Journal Article

Mutually Improved Endoscopic Image Synthesis and Landmark Detection in Unpaired Image-to-Image Translation
Document Type
Periodical
Source
IEEE Journal of Biomedical and Health Informatics, 26(1):127-138, Jan. 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Task analysis
Surgery
Valves
Maintenance engineering
Training
Semantics
Generative adversarial networks
surgical simulation
surgical training
CycleGAN
landmark localization
landmark detection
mitral valve repair
Language
English
ISSN
2168-2194 (Print)
2168-2208 (Electronic)
Abstract
The CycleGAN framework allows for unsupervised image-to-image translation of unpaired data. In a scenario of surgical training on a physical surgical simulator, this method can be used to transform endoscopic images of phantoms into images which more closely resemble the intra-operative appearance of the same surgical target structure. This can be viewed as a novel augmented reality approach, which we coined Hyperrealism in previous work. In this use case, it is of paramount importance to display objects such as needles, sutures, or instruments consistently in both domains while altering the style to a more tissue-like appearance. Segmentation of these objects would allow for a direct transfer; however, contouring these partly tiny and thin foreground objects is cumbersome and possibly inaccurate. Instead, we propose to use landmark detection at the points where sutures pass into the tissue. This objective is directly incorporated into a CycleGAN framework by treating the performance of pre-trained detector models as an additional optimization goal. We show that a task defined on these sparse landmark labels improves the consistency of synthesis by the generator network in both domains. Compared to a baseline CycleGAN architecture, our proposed extension (DetCycleGAN) improves mean precision (PPV) by $+61.32$, mean sensitivity (TPR) by $+37.91$, and mean $F_1$ score by $+0.4743$. Furthermore, we show that through dataset fusion, the generated intra-operative images can be leveraged as additional training data for the detection network itself.
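
The following is a minimal sketch (in PyTorch) of the mechanism the abstract describes: a frozen, pre-trained landmark detector contributes an additional loss term to the CycleGAN generator objective, so that synthesized images keep sutures and landmarks consistent. All names here (detection_consistency_loss, lambda_det, the commented training-loop lines) are hypothetical illustrations under that assumption, not the paper's actual implementation; the exact loss formulation in DetCycleGAN may differ.

import torch.nn.functional as F

def detection_consistency_loss(generator, detector, real_img, target_heatmaps):
    # Translate the phantom image to the intra-operative style.
    fake_img = generator(real_img)
    # Run the frozen, pre-trained landmark detector on the synthesized image.
    pred_heatmaps = detector(fake_img)
    # Penalize the generator if the annotated landmarks are no longer detected.
    return F.mse_loss(pred_heatmaps, target_heatmaps)

# Usage inside a standard CycleGAN training loop (adversarial and cycle losses
# computed as usual; the detector is kept frozen):
#   detector.requires_grad_(False); detector.eval()
#   loss_det = detection_consistency_loss(G_phantom2or, detector,
#                                         phantom_batch, heatmap_batch)
#   loss_G = loss_gan + lambda_cyc * loss_cycle + lambda_det * loss_det

Because the detector's weights are frozen, gradients from loss_det flow only into the generator, pushing it to preserve landmark positions while restyling the image.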