Academic Paper

Latent Graph Representations for Critical View of Safety Assessment
Document Type
Periodical
Source
IEEE Transactions on Medical Imaging (IEEE Trans. Med. Imaging). 43(3):1247-1258, Mar. 2024
Subject
Bioengineering
Computing and Processing
Surgery
Annotations
Task analysis
Visualization
Image segmentation
Image reconstruction
Semantics
Scene graphs
representation learning
surgical scene understanding
critical view of safety
Language
ISSN
0278-0062
1558-254X
Abstract
Assessing the critical view of safety (CVS) in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior work has approached this task by including semantic segmentation as an intermediate step, using the predicted segmentation masks to predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information (object location, class information, geometric relations) to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
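As a rough illustration of the explicit semantic part of such a graph (object location, class, geometric relations), the following minimal Python sketch builds a fully connected scene graph from bounding boxes. It is not the authors' implementation: the names `Node`, `iou`, `center`, and `build_scene_graph`, the example class labels, and the choice of edge features (center offset and IoU) are all assumptions made for illustration. The learned visual features and the graph neural network described in the abstract are omitted.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: one detected structure."""
    label: str   # class name (illustrative, not the paper's label set)
    box: tuple   # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point (cx, cy) of a box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def build_scene_graph(nodes):
    """Return directed edges (i, j) -> geometric relation features.

    Assumed features: offset from the center of node i to the center
    of node j, and the IoU of the two boxes.
    """
    edges = {}
    for i, j in permutations(range(len(nodes)), 2):
        (cxi, cyi), (cxj, cyj) = center(nodes[i].box), center(nodes[j].box)
        edges[(i, j)] = {
            "dx": cxj - cxi,
            "dy": cyj - cyi,
            "iou": iou(nodes[i].box, nodes[j].box),
        }
    return edges


# Toy example with made-up boxes and labels.
nodes = [
    Node("cystic_duct", (0, 0, 10, 10)),
    Node("cystic_artery", (5, 0, 15, 10)),
    Node("gallbladder", (20, 20, 30, 30)),
]
edges = build_scene_graph(nodes)
```

In a full pipeline of the kind the abstract describes, per-node visual features would be attached alongside these semantic attributes and the resulting graph passed to a graph neural network; those steps are left out here.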