Research progress on correlative prediction factors and prediction models of endometriosis associated ovarian carcinoma

review · OA: gold, CC0 · 3 in-corpus citations
AI-generated summary by claude@2026-06, 2026-06-09

This review analyzes 7 studies identifying clinical and serological factors like age and CA125, and models such as logistic regression, for predicting endometriosis-associated ovarian cancer risk.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-14 · read from full text

This paper reviews evidence and synthesizes research on correlative prediction factors and prediction models for endometriosis-associated ovarian cancer (EAOC), drawing from an exhaustive literature search (CNKI, PubMed, and Web of Science) covering 1989 to 2023 and applying inclusion criteria limited to English research articles; only 7 papers were ultimately included. It summarizes epidemiologic findings that women with endometriosis have a higher risk of ovarian cancer (including increased risks by histologic subtype), and it catalogs proposed risk factors such as older age, dysmenorrhea/abnormal menstruation, postmenopausal status, longer disease duration, infertility, hormone-related factors, elevated biomarkers (CA125, HE4, ROMA), and ultrasound features. The review also notes specific caveats for interpreting EAOC incidence and risk estimates, including possible underestimation due to specimen sampling omissions, tumor “burnout,” and strict pathological diagnostic criteria. This paper is centrally about endometriosis: it focuses on prediction factors and models for endometriosis-associated ovarian carcinoma, with the aim of identifying high-risk endometriosis patients.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Abstract

BACKGROUND: Endometriosis (EMs) is a common benign disease in women of childbearing age, with a malignant transformation rate of about 1%. Endometriosis-associated ovarian cancer (EAOC), which usually arises in the ovaries, is a serious threat to women's health. Early identification of EMs patients at high risk of malignant transformation is therefore important for the prevention and treatment of EAOC. However, specific and sensitive predictive factors are still lacking. In recent years, researchers in China and abroad have used traditional statistical methods and machine learning to explore predictive factors and prediction models for EAOC. This paper reviews and evaluates diagnostic and prediction models for EAOC. METHODS: Studies were identified by searching CNKI, PubMed, and the Web of Science Core Collection (WOSCC) up to 2023, and the quality of the clinical studies that met the inclusion criteria was evaluated. The predictive factors and prediction models reported in these studies were analyzed and summarized. RESULTS: After screening, 7 relevant studies were included. Predictive factors included age, menstruation, menopausal status, disease course, endometriosis-associated infertility, history of estrogen-only therapy during menopause, serological indexes (human epididymis protein 4 [HE4], carbohydrate antigen 125 [CA125], and the Risk of Ovarian Malignancy Algorithm [ROMA]), and ultrasound features (cyst shape, internal structure, and blood flow signal). Prediction models included the alignment diagram (nomogram), multivariate logistic regression, the Gail model, the gradient boosting decision tree (GBDT), and Lasso-logistic regression. CONCLUSION: The models agreed well with observed outcomes and showed good sensitivity and specificity.
The relevant predictive factors and prediction models are summarized to provide a reference and new ideas for research on EAOC prediction models, with the goal of developing standardized long-term management strategies for groups at high risk of EAOC and enabling earlier diagnosis of EAOC.

Section 5

The main limitation of this study is the small number of included studies, which has two causes. First, because the diagnostic criteria for this disease are strict, the number of reported cases is lower than the true incidence, so few clinical cases are available for analysis. Second, the clinical application of prediction models is still exploratory and related studies are scarce, which limits the number of references. We believe that, as clinicians pay increasing attention to this disease and clinical prediction models continue to develop, more relevant research will become available.

Intro

Endometriosis (EMs) is a disease in which endometrial tissue (glands and stroma) grows outside the uterus. It is common in women of childbearing age and is reported to affect about 10% of women of reproductive age worldwide. [ 1 ] Although EMs is morphologically benign, it shares biological characteristics with malignant tumors, including implantation growth, invasion, distant metastasis, and a high recurrence rate. The ovary is the most frequently affected site: 80% of malignant EMs lesions occur in the ovaries, [ 2 ] and these are called endometriosis-associated ovarian cancer (EAOC). Worldwide, the diagnosis of EMs is generally delayed, taking an average of 7.5 years from symptom onset, [ 1 ] which in turn delays the diagnosis of EAOC. In addition, ovarian cancer has an insidious onset with no specific early symptoms, and there is currently no sensitive, specific means of early screening, which further delays the diagnosis of EAOC. As the number of patients with EMs increases, so does the number of patients with EAOC. Ovarian cancer, the gynecological malignancy with the highest fatality rate, is a serious threat to women’s lives and health. [ 3 ] Early identification of high-risk EMs patients is therefore of great significance for the prevention and early treatment of EAOC. With the advent of the big-data era, artificial intelligence has been widely applied in the life sciences and medicine. Beyond traditional statistical methods, researchers in China and abroad have begun to use machine learning to analyze clinical data in order to diagnose diseases, evaluate conditions, and predict prognosis, and they have established many prediction models. A large number of studies [ 4 , 5 ] have established and validated prediction models to evaluate the condition and prognosis of EMs.
These models provide a basis for early diagnosis, treatment selection, and long-term follow-up. Some researchers have also explored prediction models for malignant transformation of ovarian endometriosis (OE). This paper reviews relevant studies on EAOC prediction models and summarizes the associated risk factors and models. It aims to provide a reference and new ideas for research on prediction models in the field of EAOC, so that long-term management plans can be made for high-risk groups and EAOC can be diagnosed as early as possible.

Author

Writing – original draft: Jing Liu. Writing – review & editing: Yu Ma, Wen Jiang, Ping Xie.

Methods

We conducted an exhaustive online bibliographic search of the following databases: CNKI, PubMed, and the Web of Science Core Collection (WOSCC). Articles published between January 1, 1989 and December 1, 2023 were considered. Using relevant search terms identified through PubMed, we constructed the following search string (TS = topic field):

(((((((((((((((((((TS = (Neoplasm, Ovarian)) OR TS = (Ovarian Neoplasm)) OR TS = (Neoplasms, Ovarian)) OR TS = (Ovary Neoplasms)) OR TS = (Neoplasm, Ovary)) OR TS = (Neoplasms, Ovary)) OR TS = (Ovary Neoplasm)) OR TS = (Ovary Cancer)) OR TS = (Cancer, Ovary)) OR TS = (Cancers, Ovary)) OR TS = (Ovary Cancers)) OR TS = (Cancer of Ovary)) OR TS = (Cancer of the Ovary)) OR TS = (Ovarian Cancer)) OR TS = (Cancer, Ovarian)) OR TS = (Cancers, Ovarian)) OR TS = (Ovarian Cancers)) AND TS = (endometriosis)) OR TS = (endometriosis associated ovarian cancer)) AND (((TS = (risk prediction)) OR TS = (prediction models)) OR TS = (prediction factors)) OR TS = (prediction)

Inclusion criteria: publications in English; document type limited to research articles. Depending on the characteristics of each database, subject terms, titles, abstracts, and full texts were searched, and the results were screened manually. Exclusion criteria: duplicate publications; popular-science newspapers and books; guidelines, expert consensus statements, reviews, conference notices, news reports, case reports, experience exchanges, and outcome literature.

Results

The preliminary search returned 274 records; after 267 non-research or irrelevant papers were excluded, 7 studies were finally included.

In recent years, a growing number of epidemiological studies have shown that EAOC is closely related to EMs. A meta-analysis [ 6 ] of 13 case-control studies, comprising 13,226 healthy women and 7911 patients with ovarian cancer, found that women with a history of EMs were 1.5 times more likely to develop ovarian cancer than healthy women, and that their risks of clear cell carcinoma and endometrioid adenocarcinoma were 3.05 and 2.04 times those of healthy women, respectively. A cohort study of 45,790 EMs patients [ 7 ] showed that patients with OE had a 23-fold higher incidence of ovarian cancer than healthy women. Another prospective cohort study of 102,025 patients with EMs found that the incidence of ovarian cancer was 1.81 to 2.14 times higher than in healthy women. [ 8 ] A 14-year study found that the incidence of ovarian cancer ranged from 1.90% to 18.70% in patients with EMs and from 0.77% to 0.89% in those without EMs. [ 9 ] Many other studies have likewise shown that patients with EMs have a higher risk of ovarian malignancy than healthy women. [ 10 , 11 ] The main pathological types of EAOC are clear cell carcinoma and endometrioid adenocarcinoma; low-grade serous carcinoma is uncommon. [ 12 ] OE is generally believed to be the origin of EAOC, and the core of the relevant pathogenesis hypothesis is that endometrial cells carrying oncogene mutations enter the pelvic cavity with menstrual blood and implant in the ovary. [ 13 – 16 ] It remains controversial whether gene mutation or malignant transformation occurs in the eutopic endometrium or only after the endometriotic tissue has settled in the ovary. Atypical endometriosis is considered a precancerous lesion of EMs. [ 17 ] However, the true incidence of EAOC may be higher than the figures reported in the existing literature because of incomplete specimen sampling, the “burnout effect” of tumor tissue, and the strict pathological diagnostic criteria for EAOC.

Age is an independent risk factor for EAOC. Most studies find that older EMs patients have a higher risk of malignant transformation and that the risk of EAOC is positively correlated with age. A retrospective cohort study of 5945 patients with EMs found that patients aged > 50 years at first diagnosis of EMs had a higher risk of malignant transformation than women without EMs or patients aged < 30 years, and that the incidence of EAOC increased with age. [ 18 ] One study reported a median age of 52 years in EAOC patients. [ 19 ] In another study, the mean age of EAOC patients was 53.6 years, significantly higher than that of patients with benign EMs (39.2 years). [ 20 ]

Early menarche, abnormal menstruation, and dysmenorrhea are risk factors for EAOC. According to the retrograde menstruation hypothesis, early menarche and short menstrual cycles in EMs patients increase the chance that ectopic endometrium invades the ovarian epithelium, thereby increasing the incidence of EAOC. Concurrent dysmenorrhea is also a risk factor for EAOC. A study [ 21 ] of 156 ovarian cancer patients showed that dysmenorrhea was more common in EAOC patients than in non-EAOC patients (35.14% vs 7.56%). Patients with an adnexal mass and dysmenorrhea should be watched for EAOC, and attention should be paid to changes in the pattern of pain in patients with EMs.
The 2022 clinical guideline of the European Society of Human Reproduction and Embryology (ESHRE) states that clinicians should be alert to malignant transformation in EMs patients whose dysmenorrhea turns into chronic pelvic pain. [ 22 ]

Menopausal status is an independent risk factor for EAOC. [ 23 ] Even if EMs-related symptoms disappear after menopause, the risk of EAOC persists [ 24 ] and may even increase. [ 25 ] EMs tissue that persists or recurs after menopause is estrogen-dependent, and in postmenopausal women obesity and estrogen therapy may activate latent ectopic endometrial lesions. [ 26 , 27 ] A cohort study of 6398 EMs patients showed an 8.68-fold increased risk of ovarian cancer in postmenopausal patients with EMs (95% CI 4.12–15.3). [ 28 ] Oxholm’s study reached a similar conclusion. [ 29 ]

A long disease course is also a high-risk factor for EAOC, and patients whose EMs has lasted ≥ 10 years should be monitored for EAOC. Data from the Swedish National Cancer Center show that the incidence of malignant change increases with the duration of EMs. [ 30 ] An analysis of data from 64,000 women showed that the duration of EMs was positively associated with the risk of malignant transformation. [ 14 , 31 ]

Endometriosis-associated infertility is a risk factor for malignant transformation of EMs. Infertility caused by EMs results from multiple factors, and malignant transformation of EMs is mainly related to specific immune responses and a hormonal environment characterized by excess estrogen and progesterone deficiency. [ 32 ] A retrospective cohort study with a median follow-up of 18.8 years included 12,193 infertile patients.
Compared with the general population, these infertile patients had a significantly higher incidence of ovarian cancer (SIR 1.98, 95% CI 1.4–2.6), and patients with primary infertility and EMs had the highest incidence of malignancy (SIR 4.19, 95% CI 2.0–7.7). [ 33 ]

High estrogen levels are a risk factor for malignant transformation of EMs, [ 34 ] and estrogen-only therapy in postmenopausal EMs patients increases the risk of malignancy. On the one hand, most EMs patients have progesterone resistance, [ 35 ] which amplifies the effect of estrogen, and excess estrogen raises vascular endothelial growth factor levels and stimulates tumor growth. [ 36 ] On the other hand, peritoneal macrophages in EMs patients are increased in both number and activity, and high estrogen levels, acting through these macrophages, can promote the growth and malignant transformation of ectopic endometrium. [ 37 ]

Elevated CA125, HE4, and ROMA are risk factors for EAOC. CA125 has high sensitivity but low specificity. [ 38 ] One study showed that CA125 levels in EAOC patients mostly ranged from 201 to 1000 U/mL. [ 39 ] Another study showed that, in patients with non-serous ovarian cancer, the median latency from a mildly elevated CA125 level (35 < CA125 < 65 U/mL) to the diagnosis of ovarian cancer was 3.8 years. [ 40 ] Note, however, that CA125 can also be elevated by menstruation, pelvic inflammation, pregnancy, and some benign ovarian teratomas, so these confounders should be ruled out during examination. HE4 is highly specific for ovarian cancer and is not affected by the menstrual cycle, and it is significantly elevated in EAOC. [ 41 ] ROMA is an ovarian cancer risk assessment algorithm that combines CA125 and HE4 with the patient’s menopausal status.
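As a minimal sketch of how ROMA combines the two markers with menopausal status, the Python below computes a ROMA percentage. The coefficients are the ones commonly published with the original algorithm (HE4 in pmol/L, CA125 in U/mL); they are an assumption here, because cutoffs and calibration depend on the assay and laboratory, and the studies reviewed may have used different values.

```python
import math

# Coefficients as commonly published for the original ROMA algorithm
# (HE4 in pmol/L, CA125 in U/mL). Assumed for illustration: check them,
# and any cutoff, against the documentation of the assay actually used.
ROMA_COEFFICIENTS = {
    # status: (intercept, HE4 coefficient, CA125 coefficient)
    "premenopausal": (-12.0, 2.38, 0.0626),
    "postmenopausal": (-8.09, 1.04, 0.732),
}


def roma_percent(he4_pmol_l: float, ca125_u_ml: float, status: str) -> float:
    """Return the ROMA value (%) for the given marker levels and menopausal status."""
    intercept, b_he4, b_ca125 = ROMA_COEFFICIENTS[status]
    # Predictive index: a linear combination of the log-transformed markers.
    pi = intercept + b_he4 * math.log(he4_pmol_l) + b_ca125 * math.log(ca125_u_ml)
    # Logistic transform to a probability, expressed as a percentage.
    return 100.0 * math.exp(pi) / (1.0 + math.exp(pi))


if __name__ == "__main__":
    for status in ("premenopausal", "postmenopausal"):
        print(status, round(roma_percent(50, 35, status), 1))
```

With HE4 = 50 pmol/L and CA125 = 35 U/mL, the sketch gives about 7.8% for a premenopausal and about 19.5% for a postmenopausal woman, which shows how strongly menopausal status shifts the score. Any decision cutoff has to come from the assay and population in use.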
The combination of CA125 and HE4 had a sensitivity (90.47%) and specificity (97.62%) for diagnosing epithelial ovarian cancer [ 42 ] that were significantly higher than those of either marker alone. [ 43 ] One study showed that HE4 and ROMA levels were higher in EAOC patients than in OE patients. [ 44 ]

Ultrasound is the preferred imaging method for EMs. A large diameter (≥ 8 cm), solid or papillary structures inside the cyst, and abundant peripheral blood flow signals are risk factors for EAOC. [ 45 ] One study showed that the incidence of ovarian cancer in patients with OE measuring 6 to 9 cm and ≥ 9 cm in diameter was 35% and 65%, respectively. [ 46 ] Another retrospective study of 508 ovarian cancer patients showed that a cyst diameter ≥ 10 cm was a high-risk factor for EAOC. [ 47 ] The risk of EAOC increases with cyst diameter. [ 20 ] On ultrasound, OE usually shows fine punctate echoes within the cyst, whereas EAOC often shows solid components or papillae within the cyst and abundant peripheral blood flow.

The predictive factors used for model construction are the risk factors identified in the analyses summarized above. In addition to these, other risk factors for EAOC reported in the literature include BMI, nonalcoholic fatty liver disease, [ 48 ] surgical history, Mirena placement, family history of malignant tumors, and a history of dioxin exposure.

The alignment diagram (nomogram) is a quantitative graphical tool that integrates multiple predictors, which are usually screened by multivariable logistic regression. It draws scaled line segments on a single plane, in fixed proportion, to represent the functional relationship between several independent variables and the outcome. Each predictor value is assigned a score, and the sum of the scores gives a total that corresponds to the patient’s predicted risk of the outcome.
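Because a nomogram is a drawing of a fitted logistic model, its arithmetic can be sketched directly. The Python below uses a hypothetical model (invented predictor names and coefficients, not taken from any study reviewed here) and one common scaling convention: the predictor with the widest effect range spans 0 to 100 points, each patient's points are summed, and the total is mapped back through the logistic function to a predicted risk.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Predictor:
    name: str
    beta: float  # log-odds coefficient from a fitted logistic model
    lo: float    # lowest value drawn on this predictor's axis
    hi: float    # highest value drawn on this predictor's axis

    def ref(self) -> float:
        # The axis end that contributes the least risk scores 0 points.
        return self.lo if self.beta >= 0 else self.hi


def points_per_logit(predictors: List[Predictor]) -> float:
    """Scale so the predictor with the widest effect range spans 0 to 100 points."""
    widest = max(abs(p.beta) * (p.hi - p.lo) for p in predictors)
    return 100.0 / widest


def score(predictors: List[Predictor], values: Dict[str, float],
          intercept: float) -> Tuple[float, float]:
    """Return (total points, predicted probability) for one patient."""
    scale = points_per_logit(predictors)
    total = sum(p.beta * (values[p.name] - p.ref()) * scale for p in predictors)
    # Convert points back to the linear predictor, then apply the logistic function.
    baseline = intercept + sum(p.beta * p.ref() for p in predictors)
    lp = baseline + total / scale
    return total, 1.0 / (1.0 + math.exp(-lp))


# Hypothetical model, for illustration only (coefficients are invented).
model = [
    Predictor("age_ge_45", 0.9, 0, 1),
    Predictor("cyst_diameter_cm", 0.25, 2, 15),
    Predictor("he4_ge_150", 2.1, 0, 1),
]
patient = {"age_ge_45": 1, "cyst_diameter_cm": 10, "he4_ge_150": 1}
total, risk = score(model, patient, intercept=-5.0)
print(f"{total:.1f} points -> predicted risk {risk:.3f}")
```

Because the points are a linear rescaling of the log-odds, the risk read from the total-points axis equals the probability from the underlying logistic model (about 0.62 for this example patient); the chart only changes how the calculation is presented.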
The nomogram is a common clinical prediction model; in essence, it is a graphical representation of a multivariable logistic regression. Its main advantages are that continuous variables need not be categorized and that multiple predictors can be shown in a single graph. However, if a nomogram is published at poor size or resolution, risk probabilities read from it may be inaccurate, and the more predictors a model includes, the harder its nomogram is to interpret.

The study by Na Yan [ 49 ] included 357 OE patients (45 with EAOC). The researchers first screened the patients’ clinical data with univariate logistic regression, then applied multivariate logistic regression to identify independent risk factors for malignant transformation of OE, which were used as predictors to construct a nomogram. This is a common workflow for building prediction models. In this study, the predictors ranked by predictive power in descending order were: HE4 (≥ 150 pmol/L), ROMA (≥ 11.4%), solid component of the cyst, papillae on the cyst wall, blood flow signal, cyst wall thickness, menopausal status, maximum cyst diameter (≥ 10 cm), age (≥ 45 years), abnormal menstruation, and disease course (≥ 10 months). These patients formed the training set, and 206 EMs patients were included as the validation set. In evaluating the model, the study reported C-indexes of 0.904 (95% CI 0.867–0.941) for the training set and 0.912 (95% CI 0.871–0.950) for the validation set, which the authors interpreted as high consistency between predicted and actual outcomes. The AUC of the model was 0.982 (95% CI 0.977–0.988), indicating strong predictive ability.
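The C-index reported for these models measures discrimination: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. For a binary outcome such as EAOC it equals the area under the ROC curve. A minimal Python sketch on hypothetical predictions:

```python
from typing import Sequence


def c_index(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance index for a binary outcome (equal to the ROC AUC).

    Counts, over all (event, non-event) pairs, how often the event case has the
    higher predicted risk; tied predictions count as half a concordant pair.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = EAOC), for illustration only.
risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.4, 0.05]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
print(round(c_index(risks, outcomes), 3))  # 13 of 15 pairs concordant -> 0.867
```

This pairwise version is O(n²) and is meant only to show the definition; for larger data sets a rank-based (Mann–Whitney) computation gives the same value faster.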
However, Na Yan’s study is cross-sectional and therefore cannot establish the temporal or causal relationships underlying malignant transformation, so its results need to be verified in cohort studies.

The study by Yuanyuan Li [ 50 ] included 90 patients with ovarian cancer, 23 of whom had EAOC. Menopausal status, coexisting uterine fibroids or polyps, dysmenorrhea or irregular bleeding, and the ROMA index were selected as predictors to construct a nomogram. The Brier score was used to evaluate the discrimination and calibration of the model; the Brier score was 0.108 and the AUC was 0.896, indicating good predictive ability and accuracy. The study also assessed clinical utility with decision curve analysis (DCA), in which the threshold probabilities (Pt) ranged from 0.1 to 0.7, suggesting that the model has good clinical applicability and can be used to predict EAOC among ovarian cancer patients. Judging from the independent risk factors in this study, EAOC patients, compared with non-EAOC patients, were more often premenopausal and more often had dysmenorrhea and irregular vaginal bleeding. This fits with “the dualistic model of ovarian carcinoma”: EAOC is mostly type I ovarian cancer, with weak invasive and metastatic potential, and because patients seek medical care for EMs-related symptoms, it tends to be diagnosed earlier.

In another study, Yuanyuan Wang [ 51 ] constructed a nomogram to predict the risk of EMs malignancy in perimenopausal women. The study included 412 patients with perimenopausal EMs, 42 of whom had EAOC. The independent risk factors identified were menopausal status, progressive dysmenorrhea, history of estrogen-only therapy during menopause, infertility, and EO. The AUC of the model was 0.856 (95% CI 0.829–0.882) in the training set and 0.892 (95% CI 0.850–0.934) in the validation set.
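Li's study summarized overall accuracy with the Brier score and clinical usefulness with decision curve analysis over threshold probabilities Pt. The sketch below shows both calculations on hypothetical predictions. The net-benefit formula is the standard decision-curve one, and "treat all" is the usual reference strategy; the other reference, "treat none", has a net benefit of 0 at every threshold.

```python
from typing import Sequence


def brier_score(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome (0 is perfect)."""
    return sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / len(risks)


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], pt: float) -> float:
    """Net benefit of intervening when predicted risk >= pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(risks)
    tp = sum(1 for p, y in zip(risks, outcomes) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(risks, outcomes) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(outcomes: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene in every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical predictions, for illustration only.
risks = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.4, 0.05]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
print("Brier:", round(brier_score(risks, outcomes), 4))
for pt in (0.1, 0.3, 0.5, 0.7):
    model_nb = net_benefit(risks, outcomes, pt)
    all_nb = net_benefit_treat_all(outcomes, pt)
    print(f"Pt={pt:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A model is usually considered clinically useful at a given Pt when its net benefit exceeds both reference strategies.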
In Wang’s study, calibration was evaluated with the Hosmer–Lemeshow goodness-of-fit test (χ2 = 11.777, P = .161), indicating good agreement between predicted and observed risk.

Regression analysis is a common method for clinical predictive modeling. It models the relationship between the dependent variable (outcome) and the independent variables (predictors). Multivariate logistic regression is a classic type of regression model that is usually applied to data from longitudinal studies, in which the dependent variable is measured repeatedly on the same individual at different time points, or to nested data. Multivariate logistic regression analysis can also show the relationships between predictors.

MR relaxometry can measure differences in iron concentration between OE and EAOC cyst fluid to distinguish benign from malignant lesions. The transverse relaxation rate R2, the reciprocal of the transverse relaxation time, is highly accurate in diagnosing EAOC, with a reported cutoff of 12.1 (sensitivity 86%, specificity 94%). [ 52 ] However, MR relaxometry is not widely available, and some medical institutions cannot perform it. Kawahara [ 53 ] used multivariate logistic regression to analyze the correlation between R2 and EAOC risk factors in 142 patients with ovarian cysts (OE 95 cases, EAOC 32 cases) and derived a formula for predicting R2. The study then included another 105 patients with ovarian cysts (OE 54, EAOC 51) who had not undergone MR relaxometry and calculated their R2 values with this formula. The AUC of the multivariate logistic regression model was 0.816, and the cutoff for malignant transformation of EMs was 18.70 (sensitivity 83.2%, specificity 76.4%). Age (HR: 17.20, 95% CI 3.84–77.16), CRP (HR: 6.76, 95% CI 1.58–28.89), and R2 (HR: 8.25, 95% CI 2.13–32.02) were independent risk factors.
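The Hosmer–Lemeshow test reported for Wang's nomogram compares observed and expected event counts across groups of patients sorted by predicted risk. Below is a minimal standard-library sketch. Grouping into 10 roughly equal groups (df = 8) is the usual default and is an assumption here, as is the closed-form chi-square tail, which only covers even degrees of freedom (`scipy.stats.chi2.sf` handles the general case). Assuming 10 groups, χ2 = 11.777 reproduces the reported P of about .161.

```python
import math
import random
from typing import List, Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form valid only for even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(risks: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> Tuple[float, int, float]:
    """Hosmer-Lemeshow statistic, df (= groups - 2) and P value.

    Patients are sorted by predicted risk and cut into `groups` groups of
    near-equal size; each group adds (O - E)^2 / (n * pbar * (1 - pbar)).
    """
    pairs: List[Tuple[float, int]] = sorted(zip(risks, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        pbar = expected / size
        stat += (observed - expected) ** 2 / (size * pbar * (1 - pbar))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Reported statistic, assuming 10 groups (df = 8).
print(round(chi2_sf_even_df(11.777, 8), 3))

# Simulated, well-calibrated predictions (hypothetical data).
rng = random.Random(0)
risks = [rng.uniform(0.02, 0.98) for _ in range(500)]
outcomes = [1 if rng.random() < p else 0 for p in risks]
print(hosmer_lemeshow(risks, outcomes))
```

A large P value means there is no evidence of miscalibration rather than proof of good calibration, and the result also depends on sample size and the number of groups.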
The Gail model was developed by Gail [ 54 ] in 1989. It was originally built from 2852 white women with breast cancer and 3146 women without breast cancer, whose clinical data were analyzed statistically to select risk factors and calculate the absolute risk of breast cancer. After years of refinement, its predictors were settled as age, race, age at menarche, age at first delivery, personal history of breast disease, family history of breast cancer, and number of breast biopsies. The Gail model estimates the 5-year or lifetime risk of breast cancer, and women with a 5-year risk ≥ 1.67% are considered high-risk. It is the breast cancer risk assessment model recommended by the National Cancer Institute (NCI); it is easy to use, has good predictive power, and is one of the most widely used breast cancer risk assessment models in clinical practice.

Wenting Wang [ 55 ] built a 5-year absolute risk prediction model for malignant transformation of EMs based on the Gail model. The study included 444 patients with ovarian cancer, 350 without EMs and 94 with EMs, who were divided into a training set and a validation set at a 2:1 ratio. A decision tree was used to screen predictors, and the validation set was then used to evaluate the decision tree model. The final decision tree output 9 combinations of 4 selected risk factors: menstrual disorder, tumor diameter, CA125, and FSH. The sensitivity was 0.83, the specificity 0.75, the accuracy l.62, the accuracy 0.74, F1 = 0.74, E/O = 1.45, and the AUC 0.81, indicating good predictive ability. The 9 output combinations were also defined as a new set of variables, and logistic regression analysis was performed on them.
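The performance figures reported for Wenting Wang's decision tree model come from a 2x2 confusion matrix at a fixed cutoff, plus the expected/observed (E/O) ratio commonly used to assess Gail-type models. The sketch below computes them from hypothetical counts and predictions; it is illustrative only and does not reproduce the study's data.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 metrics for a binary classifier at a fixed cutoff."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def expected_observed_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """E/O: cases the model expects (sum of predicted risks) over cases observed."""
    return sum(risks) / sum(outcomes)


# Hypothetical counts and predictions, for illustration only.
print(confusion_metrics(tp=40, fp=20, fn=10, tn=60))
print(expected_observed_ratio([0.5, 0.25, 0.25, 0.5], [1, 0, 0, 0]))  # 1.5 expected vs 1 observed
```

An E/O ratio above 1 means the model predicts more cases than were observed (overestimation); a ratio below 1 means underestimation.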
In that logistic analysis, the OR was highest, at 38.20 (95% CI 36.15–40.73), when menstrual disorder, tumor diameter ≥ 10 cm, CA125 ≥ 200 U/mL, and FSH ≥ 10 mIU/mL were all present. Compared with multivariable regression analysis, the decision tree model better reflects interactions between risk factors. After the risk factors were selected, the Gail model was used to estimate absolute risk. Each patient’s absolute risk of EMs malignancy is the sum of the risk values associated with exposure to the various factors, and the formula is:

$$r(\alpha,\tau,\chi)=\sum_{j=\alpha}^{\alpha+\tau-1} h_{1j}\,rr_j\,\exp\Big[-\sum_{l=\alpha}^{j-1}\big(h_{1l}\,rr_l+h_{2l}\big)\Big]$$

where $r(\alpha,\tau,\chi)$ is the risk of EMs malignancy over the next $\tau$ years for a woman of age $\alpha$ exposed to factor $\chi$; $h_{1j}$ is the baseline risk of EMs malignancy for women in age group $j$; $h_{2j}$ is the competing risk of death from causes other than EMs malignancy for women in age group $j$; and $rr_j$ is the relative risk of exposure to factor $\chi$ in age group $j$. The term $\exp[-\sum_{l=\alpha}^{j-1}(h_{1l}\,rr_l+h_{2l})]$ can be approximated as 1, so that $r(\alpha,\tau,\chi)\approx\sum_{j} h_{1j}\,rr_j$. Using this formula, the 5-year risk in the training set was 0.026% in the non-EMs group and 0.223% in the EMs group, with interquartile ranges of 0.129% and 0.249%, respectively. The optimal cutoff from the ROC curve was 0.100%, which can be used to separate low-risk from high-risk patients (low risk < 0.100% < high risk). The AUC of the model was 0.783, indicating good predictive ability. The Gail model is an empirical model that focuses mainly on epidemiological risk factors such as family history, age at menarche, and age at first delivery, without considering genetic susceptibility to malignant tumors. As genetic testing technology matures, some models, such as BRCAPRO and BOADICEA, have begun to incorporate genetic test results, which is a new direction for malignant tumor prediction models.
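The Gail-type absolute risk can be evaluated directly: each interval contributes $h_{1j}\,rr_j$, discounted by the probability of reaching that interval free of both the outcome and competing death. The sketch below assumes one-year intervals and invented hazards and relative risks (hypothetical values, not those of Wenting Wang's model), and compares the full sum with the version in which the survival term is approximated as 1.

```python
import math
from typing import Sequence


def absolute_risk(h1: Sequence[float], h2: Sequence[float], rr: Sequence[float],
                  start: int, years: int) -> float:
    """Discrete Gail-type absolute risk over `years` one-year intervals from index `start`.

    h1[j]: baseline hazard of the outcome (here, EMs malignancy) in interval j
    h2[j]: hazard of competing death from other causes in interval j
    rr[j]: relative risk for the woman's risk-factor profile in interval j
    """
    risk = 0.0
    cumulative = 0.0  # sum over earlier intervals of (h1 * rr + h2)
    for j in range(start, start + years):
        # Hazard in interval j, weighted by the chance of reaching it event-free.
        risk += h1[j] * rr[j] * math.exp(-cumulative)
        cumulative += h1[j] * rr[j] + h2[j]
    return risk


def absolute_risk_approx(h1: Sequence[float], rr: Sequence[float],
                         start: int, years: int) -> float:
    """Same sum with the survival term exp(...) approximated as 1."""
    return sum(h1[j] * rr[j] for j in range(start, start + years))


def risk_group(risk: float, cutoff: float = 0.001) -> str:
    """Split at a cutoff such as 0.100% (0.001 as a proportion)."""
    return "high" if risk >= cutoff else "low"


# Hypothetical yearly hazards and relative risks, for illustration only.
h1 = [0.0002] * 10
h2 = [0.003] * 10
rr = [3.0] * 10
exact = absolute_risk(h1, h2, rr, start=0, years=5)
approx = absolute_risk_approx(h1, rr, start=0, years=5)
print(f"5-year risk: {exact:.4%} (approximation {approx:.4%}), group: {risk_group(exact)}")
```

With small hazards the survival term stays close to 1, so the approximation barely changes the result (about 0.298% vs 0.300% here); with larger hazards or longer horizons it overstates the risk.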
The gradient boosting decision tree (GBDT), also called the multiple additive regression tree (MART), builds a more powerful model by combining many simple decision trees through successive gradient boosting steps. Each iteration adds a new tree fitted to the residuals between the current model’s predictions and the true values. Unlike the classification tree used to screen risk factors in Wenting Wang’s [ 55 ] study, GBDT uses regression trees. In logistic regression, the logistic (inverse logit) function converts a linear combination of the independent variables into a value between 0 and 1, the probability that an event occurs. It therefore suits binary outcomes and is widely used because of its strong interpretability. A decision tree, by contrast, is trained by splitting on features hierarchically, repeatedly dividing the data into rectangular regions, which makes it a better choice for nonlinear data.

Chao’s study [ 56 ] included 6809 EMs patients, 125 of whom (1.84%) were in the EAOC group. A total of 94 variables were collected, including not only common risk factors for EAOC but also pathology data that had rarely been included in the past. Using multivariate logistic regression, the investigators selected the following factors as positively associated with EAOC: age, family history of endometrial cancer, history of benign cyst surgery, Mirena use, preoperative CA125, maximum cyst diameter, and peritoneal endometriosis. Dysmenorrhea, preoperative GnRHa use, and leiomyoma or adenomyoma were negatively correlated with malignant transformation of EMs. In the training set, the model’s AUC was 0.903 (95% CI 0.857–0.948), its sensitivity 89.2%, and its specificity 82.3%.
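The boosting loop described above can be sketched in a few dozen lines of Python. This is a simplified illustration, not Chao's model: it uses one-split regression trees (stumps), log loss for a binary outcome, and the mean residual as each leaf's value, whereas production libraries use deeper trees and a line-search or Newton step for the leaves. The data are a hypothetical toy set.

```python
import math
from typing import Callable, List, Sequence

Row = Sequence[float]


def fit_stump(X: List[Row], r: List[float]) -> Callable[[Row], float]:
    """Least-squares regression stump: the single split (feature, threshold)
    that best fits the residuals r, predicting the mean residual on each side."""
    best = None
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        values = sorted(set(col))
        for a, b in zip(values, values[1:]):
            thr = (a + b) / 2
            left = [ri for v, ri in zip(col, r) if v <= thr]
            right = [ri for v, ri in zip(col, r) if v > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no feature varies: fall back to a single leaf
        mean = sum(r) / len(r)
        return lambda row: mean
    _, f, thr, lm, rm = best
    return lambda row: lm if row[f] <= thr else rm


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_gbdt(X: List[Row], y: List[int], n_trees: int = 50,
             learning_rate: float = 0.1) -> Callable[[Row], float]:
    """Gradient boosting with log loss: each round fits a stump to the residuals
    y - p (the negative gradient) and adds a shrunken copy to the score."""
    base = sum(y) / len(y)
    f0 = math.log(base / (1 - base))  # start from the log-odds of the base rate
    trees = []
    scores = [f0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        scores = [s + learning_rate * tree(row) for s, row in zip(scores, X)]

    def predict_proba(row: Row) -> float:
        return sigmoid(f0 + learning_rate * sum(t(row) for t in trees))

    return predict_proba


# Hypothetical toy data: [age, cyst diameter in cm] and outcome (1 = EAOC).
X = [[40, 3], [45, 4], [52, 11], [38, 2], [55, 12], [60, 9], [35, 5], [50, 10]]
y = [0, 0, 1, 0, 1, 1, 0, 1]
model = fit_gbdt(X, y)
print(round(model([58, 12]), 3), round(model([36, 3]), 3))
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one large step; this trades more trees for better generalization.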
In Chao’s test dataset, the AUC was 0.891 (95% CI 0.821–0.960), the sensitivity 88.9%, and the specificity 76.7%. The Hosmer–Lemeshow goodness-of-fit test gave P = 0.4 and the C-index was 0.891, indicating good accuracy. The study also analyzed the patient data with a GBDT model validated by 10-fold cross-validation: the training data were divided into 10 equal parts, each containing the same proportion of EAOC cases; in each round, 9 parts were used for training and the remaining part for testing, until every part had served once as the test set. The maximum AUC of the model was 0.9417 (95% CI 0.914–0.969), with a sensitivity of 86.8% and a specificity of 86.7%, superior to logistic regression. The minimum AUC still exceeded 0.79, indicating good model stability. The top 10 variables in the model with optimal parameters were taken as independent predictors; ranked from strongest to weakest influence, they were CA125, CA19-9, cyst diameter, disease course, age, weight, BMI, menarche, endometrial lesions on ultrasonography, and dysmenorrhea. An additional 1788 patients were included to validate the GBDT model; in this cohort the prevalence of EAOC was 1.01%, and the model’s sensitivity was 94.4% and its specificity 73.8%.

Lasso-logistic regression is a penalized form of regression: it simplifies the model by adding a penalty term that continually shrinks the coefficients. This mitigates collinearity and overfitting and improves model accuracy, and because some coefficients are shrunk exactly to 0, it can also be used to select variables. Machine learning has a strong advantage in analyzing and processing multidimensional, complex data. Random Forest (RF), Extreme Gradient Boosting (XGB), Support Vector Classifier (SVC), K-Nearest Neighbor (KN), and Multi-Layer Perceptron (MLP) are classic machine learning approaches.
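The stratified 10-fold cross-validation used for Chao's GBDT model can be sketched as an index splitter: cases are dealt out class by class so that every fold keeps about the same EAOC proportion, and each fold serves once as the test set while the other nine train the model. The helper below is hypothetical and only produces the splits; model fitting and scoring are left out.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[int], k: int = 10,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Indices of each class are shuffled and dealt round-robin, so every fold keeps
    roughly the class proportions of the full data set. Each fold is the test set
    exactly once; the other k - 1 folds form the training set.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped, to balance fold sizes
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(offset + pos) % k].append(i)
        offset += len(idx)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Hypothetical cohort: 100 patients, 20 with the outcome.
labels = [1] * 20 + [0] * 80
for train, test in stratified_k_fold(labels):
    print(len(train), len(test), sum(labels[i] for i in test))  # 90 10 2 in each fold
```

Stratification matters most when the outcome is rare, as with EAOC (about 1.8% in Chao's cohort): unstratified folds could contain few or no cases, making fold-level AUCs unstable or undefined.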
Bilin’s study [ 57 ] compared regression models with machine learning predictive models in handling diverse, complex clinical data. The study included 1137 patients with ovarian lesions and used Lasso-logistic regression to screen clinical data, selecting menopausal status, AFP, ROMA_pro, ROMA_post, and CA125 as predictors. In the resulting Lasso-logistic regression formula, P denotes the probability of ovarian cancer and 1 − P the probability of benign ovarian disease. The AUC of this model was 0.946 (95% CI 0.922–0.971), and the AUCs in both internal and external validation exceeded 0.75. The study also built 5 machine learning models: RF, XGB, SVC, KN, and MLP. Their AUCs, from highest to lowest, were RFM (0.968), SVCM (0.951), XGBM (0.909), KNM (0.893), and MLPM (0.821), with F1 scores of 0.855, 0.893, 0.909, 0.935, and 0.922, respectively. All models showed strong predictive ability. The RF model, which had the best predictive power, was further refined and optimized to distinguish benign, borderline, and malignant ovarian cysts; its AUCs for internal vs external validation were 0.927 vs 0.838, 0.742 vs 0.527, and 0.961 vs 0.921. The top 10 predictors in the RF model were ROMA_post, HE4, ROMA_pro, CA125, AFP, CA19-9, CEA, age, MLR, and RBC. RF is an ensemble learning method that combines multiple decision trees. Its predictors include not only tumor markers but also variables such as age, MLR, and RBC, and it is better suited than traditional statistical methods to a wide variety of clinical data.
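The Lasso-logistic screening step used in Bilin's study can be illustrated with a generic sketch; it is not the study's fitted model. The Python below fits an L1-penalized logistic regression by proximal gradient descent (ISTA), one standard way to solve this problem. The simulated data, penalty values, and step size are assumptions for illustration. The soft-thresholding step is what sets weak coefficients exactly to zero.

```python
import math
import random
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink z toward 0, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                   step: float = 0.5, iters: int = 500) -> Tuple[float, List[float]]:
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|). The intercept b is not penalized.
    Features are assumed to be standardized so that one lam treats them comparably.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        grad_b, grad_w = 0.0, [0.0] * p
        for row, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted minus observed
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step on w, then soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return b, w


# Simulated data (hypothetical): only the first of three standardized features matters.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(150)]
y = [1 if rng.random() < 1 / (1 + math.exp(-2 * row[0])) else 0 for row in X]
for lam in (0.01, 0.1, 1.0):
    b, w = lasso_logistic(X, y, lam)
    print(lam, [round(v, 3) for v in w])
```

Raising `lam` drives more coefficients to exactly zero, which is how the method screens predictors; in practice `lam` is usually chosen by cross-validation.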

Discussion

Studies on EAOC prediction models are still few, and most are retrospective, so they may be biased by factors such as patient selection, missing clinical data and pathological misdiagnosis. Nevertheless, EAOC prediction models built on clinical predictors have begun to take shape: the existing models show good predictive performance in internal and external validation, agree closely with observed outcomes, and have some clinical utility. The occurrence of EAOC is related to gene mutation, immune response, oxidative stress, hormonal changes, environment and many other factors. Beyond common clinical indicators, many molecular indicators have emerged in recent years, such as the gut microbiota [58] and plasma metabolites. Future research should explore new molecular markers, broaden the range of predictive factors and establish corresponding prediction models. In addition, the many existing prediction models for other malignant tumors can serve as templates for future EAOC models. Ultimately, the purpose of this work is to identify and stratify patients at high risk of EAOC according to model predictions, so that corresponding long-term management plans can be formulated to achieve prevention, early diagnosis and treatment of EAOC.

License: CC0 · commercial use OK