Post-surgical Endometriosis Segmentation in Laparoscopic Videos

preprint OA: green CC0
AI-generated summary by claude@2026-06, 2026-06-12

This paper presents a system trained to segment dark endometrial implants in laparoscopic videos, annotating detected regions and providing a summary for improved video browsing.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-12 · read from full text

This demo paper studies automated segmentation of one specific visual appearance of endometriosis, dark endometrial implants, in laparoscopic surgery videos. It uses a custom single-class dataset refined from GLENDA annotations and augmented by rotation, blurring, perspective transformation, desaturation, and object tracking. The authors adapt Mask R-CNN via transfer learning with a ResNet-101 backbone and report a best mask segmentation performance of 0.642 mAP at an IoU threshold of 0.5 (0.324 mAP averaged over IoU thresholds 0.50–0.95 in 0.05 steps), reached after 29 of 50 training epochs. A video-processing pipeline then analyzes each frame to extract bounding boxes, pixel masks, and labels, and produces annotated output videos with a frame-by-frame confidence summary bar; the extracted data can also be stored in JSON format for later browsing. The paper explicitly presents a feasibility showcase rather than a detailed dataset and training study or a fully developed user interface. Endometriosis is the central topic: the system segments dark endometrial implants in laparoscopic videos for post-surgical analysis.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.
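The two reported figures differ only in the overlap threshold at which a predicted mask counts as a correct detection. As a rough illustration, the sketch below computes intersection over union (IoU) for two small binary masks and lists the ten thresholds from 0.50 to 0.95 in 0.05 steps. The helper names `mask_iou` and `coco_thresholds` and the toy masks are assumptions made for illustration; they are not taken from the paper or its tool, and a full mAP computation would additionally rank detections by confidence.

```python
def mask_iou(pred, truth):
    """Intersection over union of two equally sized binary masks (lists of 0/1 rows)."""
    inter = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


def coco_thresholds():
    """IoU thresholds 0.50, 0.55, ..., 0.95 (ten values)."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


# Toy 4x4 masks (hypothetical data, not from the dataset).
truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]  # 6 pixels
pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]   # 4 pixels

iou = mask_iou(pred, truth)  # intersection 4, union 6
# Thresholds at which this prediction would still count as a match.
matched_at = [t for t in coco_thresholds() if iou >= t]
```

With these toy masks the prediction counts as a match at the lenient 0.5 threshold but fails the stricter ones, which is why a score averaged over 0.50–0.95 is typically lower than the score at 0.5 alone.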

Abstract

Endometriosis is a common women's condition exhibiting a manifold visual appearance in various body-internal locations. Having such properties makes its identification very difficult and error-prone, at least for laymen and non-specialized medical practitioners. In an attempt to provide assistance to gynecologic physicians treating endometriosis, this demo paper describes a system that is trained to segment one frequently occurring visual appearance of endometriosis, namely dark endometrial implants. The system is capable of analyzing laparoscopic surgery videos, annotating identified implant regions with multi-colored overlays and displaying a detection summary for improved video browsing.
Full text · extracted from oa-pdf
Authors
Andreas Leibetseder, Klaus Schoeffmann, Institute of Information Technology, Klagenfurt University, Klagenfurt, Austria, [aleibets,ks]@itec.aau.at
Jörg Keckstein, Medical Faculty, Ulm University, Ulm, Germany, [email protected]
Simon Keckstein, University Hospital, Ludwig-Maximilians-University, Munich, Germany, [email protected]

Index Terms: Endometriosis, Lesion Segmentation, Mask R-CNN

I. INTRODUCTION

Endoscopic surgical procedures are well established, particularly in gynecology. The exact diagnosis of various diseases takes place via an endoscopic camera system that is inserted into the abdominal cavity through a small port. The endoscopic image is made available to the surgeon on monitors. The exploration of the abdominal cavity, and especially of the inner genital tract, is very informative and helpful for a correct diagnosis and therapy in the case of painful conditions or pathological findings. One condition commonly treated this way is termed endometriosis, which refers to the abnormal growth of uterine-like tissue outside of the uterus and is diagnosed among women of child-bearing age. Affected patients exhibit lesions of varying severity, often in various locations.
Complete identification and recording of all foci and their therapy (removal) is essential for improving the symptoms and quality of life of the patient. Two systems are mainly used to classify the disease: the revised American Society for Reproductive Medicine (rASRM) score [1] and the Enzian classification [2], [3]. The rASRM classification is particularly applicable to the recording of all intraperitoneal lesions, whereas the Enzian classification covers deep endometriosis. Classification is primarily carried out by the surgeon's visual assessment, with the two systems complementing each other in quantifying a patient's overall condition. Complete detection of endometriosis can be limited in the partially inaccessible area of the pelvis and the large area of the peritoneum, and is made more difficult by the differing color and appearance of the respective endometrial lesions.

Due to these various manifestations of endometriosis, good training and great attention are required from the surgeon during diagnosis. A lack of experience, possibly combined with time pressure from a long operation list, carries the risk of incomplete recording of the disease. This has substantial consequences for further treatment and the patient's well-being. Misdiagnosis of the disease should be prevented as far as possible while, at the same time, sharpening the visual perception of all lesions, especially for doctors in training. This could be supported intra- or post-operatively with the help of image segmentation.

With deep learning already heavily employed in medical imaging, it can naturally be regarded as an opportunity not only to improve the aforementioned educational training but also to facilitate post-surgical analysis. To demonstrate the feasibility of such a goal, this work focuses on the object segmentation of a specific visual appearance of endometriosis: dark endometrial implants.
Figure 1 depicts four examples taken from a custom-created ground truth dataset (footnote 1) that includes region-based annotations of such pathological areas. These annotations show that, although the indicated regions appear distinctly different from their immediate surroundings, they look quite similar to other non-pathological areas such as spots of blood or dark vessels. The dataset exclusively contains single-class implant annotations and is used to adapt and train the state-of-the-art deep object segmentation network Mask R-CNN [4], a region-based convolutional neural network capable of producing pixel masks for detected objects in addition to the bounding boxes generated by an incorporated region proposal network (cf. Faster R-CNN [5]).

[Fig. 1: Examples of dark endometrial implants, panels (a) to (d)]

Overall, we formulate our contributions as follows:
• Adapting Mask R-CNN and providing a model for binary segmentation of endometrial implants.
• Local and temporal visualization of endometrial implants in laparoscopic surgery videos.
• Providing the tool source code as well as pre-trained models for academic purposes (footnote 2).

This demonstration highlights partial results of an ongoing, more thorough study on the subject of endometriosis segmentation. As such, the following sections intentionally focus on describing the tool and its features rather than portraying the dataset creation and training approach in great detail.

Footnote 1: https://tinyurl.com/ENIDDS
Footnote 2: https://tinyurl.com/EndoSegTool
arXiv:2510.13899v1 [cs.CV] 14 Oct 2025

II. ENDOMETRIOSIS SEGMENTATION TOOL

The endometriosis segmentation tool can generally be described as an ensemble of technologies, combined into a series of scripts for analyzing post-surgical video archives. These scripts are used for creating annotated output videos as well as a configurable amount of metadata, which can, for instance, be incorporated into potential interactive systems.
As mentioned above, this demo should be regarded as a showcase for highlighting the feasibility of endometriosis segmentation; therefore, we reserve building a fully-fledged user interface for future versions of the tool. In the following sections we describe its architecture, usage, hardware-specific runtime analysis and implementation details.

A. Architecture

The system's overall architecture comprises three main steps: dataset creation, model training and video analysis (model application).

We custom-create a single-class lesion dataset by refining parts of the more extensive, multi-class Gynecologic Laparoscopy Endometriosis Dataset (GLENDA) [6]. The collected base dataset comprises over 350 region-based endometrial implant annotations for 160 frames taken from more than 100 patient cases exhibiting endometriosis. In order to improve the trained segmentation model, we augment this dataset by applying various techniques including rotation, blurring, perspective transformation, desaturation as well as object tracking. For the subsequent training step we divide the resulting datasets into two different splits used for training, validation and testing.

As mentioned above, for model training we adapt the state-of-the-art object segmentation network Mask R-CNN via transfer learning to a single output label. As a backbone network we employ ResNet-101 [7] together with an overall multi-task loss function incorporating class (log loss), bounding box (smooth L1 loss) and mask segmentation (binary cross-entropy loss) predictions as described in [4], [8]. Training is conducted for 50 epochs using a learning rate of 0.001 and stochastic gradient descent as the optimizer. The best-performing model in terms of mean average precision (mAP) for mask segmentation, as employed in the MS COCO detection [9] evaluations, is achieved after 29 epochs using rotation as well as cropping for augmentation: 0.642 mAP at a threshold of 0.5 mask overlap (0.324 mAP for a threshold range of 0.50 to 0.95 with 0.05 steps). This model, together with other well-performing models from both splits, is made available for download (footnote 3). Finally, we utilize such a model in our system for detecting pathologically suspicious regions with a confidence threshold of 0.50 or above.

[Fig. 2: Video Processing Pipeline.]
[Fig. 3: Video at two different points in time, raw (top row) and analyzed (bottom row); panels (a) to (d).]

The employed core processing pipeline is depicted in Figure 2: first, a user provides the tool with a raw surgery video, which is then analyzed frame by frame, extracting bounding boxes, masks and labels. Whenever results are found, the tool uses the determined segmentation masks to produce annotated frames as well as an overall detection summary in the form of an indication bar, as depicted in Figure 3. This bar indicates frame-by-frame detections over time, colored by detection confidence (yellow to dark red); values for multiple detections are averaged. Both the segmentation results and the indication bar are integrated into the final video output, with the current video position additionally marked by a green horizontal bar. This way, viewers of such annotated output videos are provided at any point in time with an overview of potentially important sections. All extracted data can additionally be stored in JSON format to facilitate integration into future interactive video browsing systems.

Footnote 3: https://tinyurl.com/ENIDDS

B. Hardware and Runtime Analysis

For implementation, training and evaluation we used a workstation with the following specifications: Intel Core i7-5820K CPU @ 3.30GHz x 6, 32 GiB DDR3 @ 1333 MHz, Nvidia GeForce GTX 1080. On such a machine, model training required approximately 2h to complete.

TABLE I: Processing time comparison of 16:9 resolutions.
resolution    avg in ms
640×360       153
1280×720      158
1920×1080     170
3840×2160     207
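Per-frame timings such as those in Table I translate into whole-video processing times once frame rate and duration are fixed. The sketch below is a back-of-the-envelope calculation that assumes every frame is analyzed and that the per-frame time is constant; the names `TABLE_I_MS`, `processing_seconds` and `as_h_m` are illustrative and not part of the tool.

```python
# Average per-frame processing times from Table I, in milliseconds.
TABLE_I_MS = {
    "640x360": 153,
    "1280x720": 158,
    "1920x1080": 170,
    "3840x2160": 207,
}


def processing_seconds(ms_per_frame, fps, video_seconds):
    """Total processing time in seconds, assuming every frame is analyzed."""
    frames = fps * video_seconds
    return ms_per_frame * frames / 1000


def as_h_m(seconds):
    """Format a duration as hours and minutes, truncated to whole minutes."""
    minutes = int(seconds // 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"


# One hour (3600 s) of 25 fps video at each resolution.
estimates = {
    res: as_h_m(processing_seconds(ms, 25, 3600))
    for res, ms in TABLE_I_MS.items()
}
```

For the 1920×1080 entry this gives 15300 s, or 4h15m, for one hour of 25 fps video. Because the estimate is linear in the per-frame time, the gap between the smallest and largest resolution in Table I stays modest compared with the sixfold difference in linear image size.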
The tool has been implemented using Linux Ubuntu 18.x, but has also been successfully tested on Windows 10 systems. Given the exclusive utilization of cross-platform technologies (cf. Section II-C), it is assumed to be compatible with macOS as well.

Concerning runtime performance, when using GPU processing the system requires an average of approximately 150–250 ms of processing time per frame for most videos, as outlined in Table I. Although it clearly grows with larger resolutions, the processing time essentially depends on resizing the input images, since the model's input is resized to fit a restricted pixel range, i.e. 800 pixels for the shortest and 1333 pixels for the longest image side. Hence, assuming a per-frame performance of 170 ms, we can approximately estimate the overall time required to process an hour of video produced by an endoscope recording in HD resolution at 25 frames per second:

$$\frac{170 \times 25 \times 60 \times 60}{1000} = 15300\,\mathrm{s} = 4\,\mathrm{h}\,15\,\mathrm{min}.$$

C. Installation and Usage

The tool requires working installations of OpenCV (footnote 4), Python 3.x (footnote 5), FFmpeg (footnote 6) and Detectron2 (footnote 7). All further requirements can simply be installed by running:

$ pip install -r requirements.txt

In its most basic use case, analyzing a single video, the tool can be executed by running:

$ python demo.py -i -m -o

The tool is also capable of multi-video and multi-model processing, and a detailed description of all available options can be produced by running the script with the '-h' flag.

III. CONCLUSION

We present a tool for segmenting and annotating endometrial implants in laparoscopic videos. Approaching this problem by combining video object tracking with state-of-the-art image segmentation, we achieve qualitatively good results that can be regarded as a first step towards an interactive post-surgical video archive browser, which could be of great assistance for treatment planning as well as clinical education.
Finally, this work provides valuable insights into the feasibility of applying object detection methods, developed with traditional machine learning for real-world objects, to a practical medical use case.

ACKNOWLEDGMENTS

This work was funded by the FWF Austrian Science Fund under grant P 32010-N38.

REFERENCES

[1] M. Canis, J. Donnez, D. Guzick, J. Halme, J. Rock, R. Schenken, and M. Vernon, "Revised American Society for Reproductive Medicine classification of endometriosis: 1996," Fertility and Sterility, vol. 67, no. 5, pp. 817–821, 1997.
[2] J. Keckstein, U. Ulrich, M. Possover, K. Schweppe et al., "Enzian-Klassifikation der tief infiltrierenden Endometriose," Zentralblatt für Gynäkologie, vol. 125, p. 291, 2003.
[3] J. Keckstein and G. Hudelist, "Classification of DIE including bowel endometriosis: from r-ASRM to #Enzian-classification," Best Practice & Research Clinical Obstetrics & Gynaecology, 2020.
[4] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, "Mask R-CNN," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, 2020. [Online]. Available: https://doi.org/10.1109/TPAMI.2018.2844175
[5] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, June 2017. [Online]. Available: https://doi.org/10.1109/TPAMI.2016.2577031
[6] A. Leibetseder, S. Kletz, K. Schoeffmann, S. Keckstein, and J. Keckstein, "GLENDA: gynecologic laparoscopy endometriosis dataset," in MultiMedia Modeling - 26th International Conference, MMM 2020, Daejeon, South Korea, January 5-8, 2020, Proceedings, Part II, ser. Lecture Notes in Computer Science, Y. M. Ro, W. Cheng, J. Kim, W. Chu, P. Cui, J. Choi, M. Hu, and W. D. Neve, Eds., vol. 11962. Springer, 2020, pp. 439–450. [Online]. Available: https://doi.org/10.1007/978-3-030-37734-2_36
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[8] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[9] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.

Footnote 4: OpenCV 4.x, https://opencv.org
Footnote 5: Python 3.x, https://www.python.org
Footnote 6: https://ffmpeg.org
Footnote 7: https://github.com/facebookresearch/detectron2
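The detection summary described in Section II-A (keep detections with a confidence of at least 0.50, average the confidences per frame, color the bar from yellow to dark red, optionally export JSON) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the tool's implementation: the input format, the function names, the exact RGB endpoints of the color ramp, the linear interpolation and the JSON layout are all hypothetical.

```python
import json

CONF_THRESHOLD = 0.50    # confidence threshold stated in the paper
YELLOW = (255, 255, 0)   # assumed RGB value for the lowest kept confidence
DARK_RED = (139, 0, 0)   # assumed RGB value for confidence 1.0


def frame_confidence(scores, threshold=CONF_THRESHOLD):
    """Average the scores that pass the threshold; None if no detection remains."""
    kept = [s for s in scores if s >= threshold]
    return sum(kept) / len(kept) if kept else None


def confidence_color(conf, threshold=CONF_THRESHOLD):
    """Linearly interpolate from yellow (at the threshold) to dark red (at 1.0)."""
    t = (conf - threshold) / (1.0 - threshold)
    return tuple(round(a + t * (b - a)) for a, b in zip(YELLOW, DARK_RED))


def summarize(per_frame_scores):
    """Build one summary-bar entry per frame from raw detection scores."""
    bar = []
    for index, scores in enumerate(per_frame_scores):
        conf = frame_confidence(scores)
        bar.append({
            "frame": index,
            "confidence": conf,
            "color": confidence_color(conf) if conf is not None else None,
        })
    return bar


# Hypothetical detection scores for four consecutive frames.
scores = [[], [0.3], [0.5], [0.8, 1.0, 0.2]]
bar = summarize(scores)
as_json = json.dumps(bar)  # tuples become JSON arrays, None becomes null
```

Frames without any detection above the threshold are given no color here, which is one possible reading of "whenever results are found"; a real implementation would then draw one bar column per entry and overlay the position marker.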

Outcome instruments

rASRM, Enzian

Condition tags

endometriosis

Source provenance

openalex
last seen: 2026-06-04T00:00:01.174412+00:00
License: CC0 · commercial use OK