Machine learning-based preliminary screening tool for clinical pregnancy prediction: towards management of IVF/ICSI stages.

doi:10.1080/07853890.2025.2582245

2025 · doi:10.1080/07853890.2025.2582245 · PMID:41243616 · PMC12624961

OA: gold publisher-OA-unknown

Results

From June 2016 to December 2021, a total of 1,989 adult female patients underwent IVF/ICSI treatment at the First Affiliated Hospital of Xiamen University. Based on the inclusion and exclusion criteria, 1,062 patients were included in the final analysis (Figure 1). The overall pregnancy success rate was 62.5%. The median age of these patients was 31 years (interquartile range: 28–34 years), and the median BMI was 21.80 kg/m² (interquartile range: 19.84–24.33 kg/m²). Of the 250 patients included in the temporal validation cohort, 145 (58.0%) experienced a successful pregnancy. Table 1 summarizes the demographic and treatment-related characteristics of the patients in the training, internal validation, and temporal validation cohorts. With the exception of DOR and endometrial thickness on HCG day, there were no significant differences in patient characteristics between the two groups (p > 0.05). In our study, 38% of patients received assisted reproductive treatment for pelvic and fallopian tube factors. Additionally, 192 patients experienced unexplained infertility, while 312 had multiple infertility factors. In the training cohort, we observed some differences between the two groups in the use of the GnRH antagonist protocol and the early follicular phase long-acting GnRH agonist long protocol (p < 0.05).

Figure 1. The overall workflow of the entire study.

Table 1. Baseline characteristics in the entire, training, internal validation, and temporal validation cohorts. Data are shown as median (interquartile range) or number (percentage). Abbreviations: BMI, Body Mass Index; FSH, Follicle-stimulating hormone; LH, Luteinizing hormone; E2, Estradiol; P, Progesterone; AMH, Anti-Mullerian hormone; AFC, Antral follicle count; Gn, Gonadotropin; HCG, human chorionic gonadotropin; PCOS, Polycystic ovary syndrome; DOR, diminished ovarian reserve; EMS, endometriosis. P-value1 > 0.05 indicates no significant difference between the training and internal validation cohorts.

The variables with p < 0.2 in the univariate logistic regression analysis were entered into LASSO regression, resulting in the inclusion of female age, BMI, primary infertility, FSH, AFC, and E2 in the model (Table 2). Six different ML algorithms were employed to train the models on the training cohort. The ROC and PRC curves are shown in Supplementary Figure 1, and the optimal hyperparameters along with the evaluation metrics for each model are presented in Supplementary Table 1.

Table 2. The results of features incorporated into models in the training cohort. Abbreviations: LASSO, least absolute shrinkage and selection operator; OR, Odds Ratio; CI, Confidence Interval; VIF, Variance Inflation Factor; BMI, Body Mass Index; FSH, Follicle-stimulating hormone; E2, Estradiol; P, Progesterone; AFC, Antral follicle count; Gn, Gonadotropin Factor; HCG, human chorionic gonadotropin.

For the internal validation cohort, the ROC and PRC curves are displayed in Figure 2. The AUROC of the XGB model was 0.617 (0.551–0.683), the best discriminative ability among the models, while the AUPRC was 0.763 (0.702–0.815), demonstrating good predictive performance despite data imbalance. Based on evaluation metrics such as sensitivity, specificity, and accuracy at the selected thresholds (Table 3), the XGB model outperformed the other models overall. The F1 score, a commonly used metric for assessing model performance, especially under class imbalance, was highest for the XGB model at 0.687. Supplementary Figure 2 also illustrated that the model exhibited superior calibration. In the decision curve analysis, the XGB model also yielded more net benefit than the treat-all or treat-none strategies (Supplementary Figure 3A). In the temporal validation cohort, although the NB model had the highest AUROC of 0.629 (0.560–0.697), the XGB model achieved superior sensitivity of 0.652 and F1 score of 0.662 (Table 4).
Ultimately, the XGB model was selected as the best predictor for clinical pregnancy.

Figure 2. ROC curve and PRC curve for pregnancy for the internal validation cohort. ROC curves (a, c) and PRC curves (b, d) for Model 1 and Model 2. Abbreviations: ROC, receiver operating characteristic; PRC, precision-recall curve; LR, logistic regression; XGB, extreme gradient boosting; RFC, random forest classifier; SVM, support vector machine; LGBM, Light Gradient Boosting Machine.

Table 3. Performance of six machine learning models built during the pre-treatment and treatment phases, respectively, in the internal validation cohort. Abbreviations: AUROC, Area Under the Receiver Operating Characteristic curve; AUPRC, Area Under the Precision-Recall Curve; PPV, Positive Predictive Value; NPV, Negative Predictive Value; LR, Logistic regression; XGB, eXtreme Gradient Boosting; RFC, Random Forest; SVM, Support Vector Machine; NB, Naive Bayes; LGBM, Light Gradient Boosting Machine.

Table 4. Performance of six machine learning models built during the pre-treatment and treatment phases, respectively, in the temporal validation cohort. Abbreviations: AUROC, Area Under the Receiver Operating Characteristic curve; AUPRC, Area Under the Precision-Recall Curve; PPV, Positive Predictive Value; NPV, Negative Predictive Value; LR, Logistic regression; XGB, eXtreme Gradient Boosting; RFC, Random Forest; SVM, Support Vector Machine; NB, Naive Bayes; LGBM, Light Gradient Boosting Machine.

On the internal validation cohort, ROC and PRC curves were plotted, and evaluation metrics such as sensitivity, specificity, and accuracy were calculated based on the selected thresholds (Table 3). The results indicated that the XGB model achieved the highest AUROC of 0.652 (95% CI: 0.590–0.714), an AUPRC of 0.737 (95% CI: 0.671–0.793), and a sensitivity of 0.66, with an F1 score of 0.695. Supplementary Figure 2 also showed that the model exhibited superior calibration.
In the decision curve analysis, the XGB model also yielded more net benefit than the treat-all or treat-none strategies at threshold probabilities of 0.45–0.8 (Supplementary Figure 3B). In the temporal external validation cohort, the XGB model continued to show better performance, with the highest F1 score of 0.702, as presented in Table 4. Consequently, the XGB model was identified as the most effective model for predicting pregnancy success.

The SHAP summary plot was generated from the final XGBoost models. Figure 3 illustrates the contribution of each feature to the predictions of the two models. In Model 1, the relevant features, in descending order of importance, were: female age, BMI, AFC, E2, FSH, and primary infertility. In Model 2, the relevant features, also in descending order of importance, were: P on HCG day, female age, E2 on HCG day, Gn average dosage, AFC, endometrial thickness on HCG day, E2, and the number of oocyte-retrieval procedures.

Figure 3. SHAP summary plot of Model 1 (a) and Model 2 (b). Features are ranked from top to bottom in order of importance, based on the length of the light blue bars representing the mean Shapley value. Each point represents one sample, with red indicating a high raw value for the feature and blue a low value. The X-axis at the bottom shows the Shapley value contribution for each feature. Abbreviations: BMI, Body Mass Index; FSH, Follicle-stimulating hormone; E2, Estradiol; P, Progesterone; AFC, Antral follicle count; Gn, Gonadotropin Factor; HCG, human chorionic gonadotropin; SHAP, SHapley Additive exPlanations.

Finally, Model 1 and Model 2, both constructed with the XGB algorithm, were implemented in Web applications, which can be accessed at https://preivf-predictor.streamlit.app/ and https://ivf-predictor.streamlit.app/ .
The tool automatically predicts the likelihood of successful pregnancy once the feature values required by the model are entered. The Web calculator also provides an interpretation of the model predictions, as shown in Figure 4.

Figure 4. Clinical application of the Web calculator based on Model 1 (a) and Model 2 (b). The page automatically displays the probability of pregnancy success when the actual values required by the model are entered. It also displays a force plot for each patient, with the blue features on the right side being those that push the prediction towards the ‘non-pregnant’ category, and the red features on the left side being those that push the prediction towards the ‘pregnant’ category, which can help to develop a strategy to improve the success of the pregnancy. Abbreviations: BMI, Body Mass Index; FSH, Follicle-stimulating hormone; E2, Estradiol; P, Progesterone; AFC, Antral follicle count; Gn, Gonadotropin Factor; HCG, human chorionic gonadotropin; SHAP, SHapley Additive exPlanations.

Supplementary Figure 4 illustrates the results of the subgroup analysis for the two models across the training, internal validation, and temporal external validation cohorts. For the DOR population, both models exhibited relatively low performance, which may be attributed to an insufficient sample size (n = 50). Models 1 and 2 also demonstrated suboptimal performance in the GnRH antagonist protocol and the early follicular phase long-acting GnRH agonist long protocol populations, respectively. As shown in Supplementary Figure 5, the RCS results indicated a nonlinear relationship between age and pregnancy outcomes, with higher pregnancy success rates noted among patients aged 25 to 31 years. A linear relationship was identified between the other characteristics and outcomes.
Further analysis of the threshold effects on pregnancy outcomes revealed that FSH was consistently negatively correlated with pregnancy success. When the AFC exceeded 13 and endometrial thickness surpassed 12, the pregnancy success rate increased. Conversely, when Gn average dosage was greater than 204 IU/day, E2 levels on HCG day exceeded 2866 pg/ml, and P levels on HCG day were above 0.75 ng/ml, the pregnancy failure rate increased.
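
The decision-curve comparisons reported in this section (model versus treat-all versus treat-none) rest on the net benefit at a chosen threshold probability. The sketch below shows the standard net-benefit calculation on toy data. The function names, labels, and probabilities are hypothetical and are not taken from the study, whose code is not part of the source.

```python
# Minimal sketch of decision-curve net benefit (toy data, hypothetical names).
# Net benefit = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability.

def net_benefit(labels, probs, pt):
    """Net benefit of intervening on every patient with predicted probability >= pt."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

def net_benefit_treat_all(labels, pt):
    """Net benefit of intervening on every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

# Toy outcomes (1 = clinical pregnancy) and toy predicted probabilities.
labels = [1, 1, 0, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

for pt in (0.45, 0.5, 0.8):
    model_nb = net_benefit(labels, probs, pt)
    all_nb = net_benefit_treat_all(labels, pt)
    # The treat-none strategy has a net benefit of 0 by definition.
    print(f"pt={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}  treat-none=0.000")
```

A model is useful at a given threshold only when its net benefit exceeds both reference strategies, which is the comparison summarized by the 0.45–0.8 range quoted above.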

Materials

This study was a retrospective analysis of a dataset comprising 1,989 female patients who underwent IVF/ICSI treatment at the First Affiliated Hospital of Xiamen University between June 2016 and December 2021. The inclusion criteria were (i) female age ≥ 18 years; (ii) at least one available embryo of good morphological quality; and (iii) if the patient underwent multiple cycles of IVF/ICSI treatment, only the last recorded cycle was selected. The exclusion criteria were: (i) missing demographic information or incomplete cycle management data; (ii) cancellation of IVF before oocyte retrieval; (iii) women from whom no oocytes were retrieved or whose oocytes were not successfully fertilized during the treatment period; and (iv) patients who had undergone or were undergoing frozen embryo transfer. Additionally, we collected a dataset of patients who underwent IVF/ICSI treatment at the hospital from January to December 2022 as the temporal validation cohort. The study received approval from the institutional review board (IRB) of the First Affiliated Hospital of Xiamen University (approval number KY2023-038).

Candidate predictors were selected based on their association with pregnancy outcomes, as identified in previous studies, while also considering the feasibility of clinical application. Relevant information regarding patient characteristics, basic testing data, and treatment details prior to embryo or blastocyst transfer was collected. General demographic information included female age, body mass index (BMI), type of infertility (primary or secondary), and duration of infertility. Causes of infertility encompassed pelvic and fallopian tube factors, polycystic ovary syndrome (PCOS), diminished ovarian reserve (DOR), endometriosis (EMS), and unexplained infertility. Pre-treatment assessments included AMH, FSH, estradiol (E2), progesterone (P), luteinizing hormone (LH), and antral follicle count (AFC).
Taking the individual patient’s condition into account, the doctor and patient jointly determined the treatment protocol before entering the treatment phase. The options were the GnRH antagonist protocol, the ultra-long GnRH agonist protocol, the GnRH agonist long protocol, and the early-follicular phase long-acting GnRH agonist long protocol. Following the administration of ovulation-stimulating drugs, the healthcare provider recorded the patient’s gonadotropin factor (Gn) average dosage, E2 on HCG day, P on HCG day, LH on HCG day, endometrial thickness on HCG day, and the average estrogen level per follicle. The endometrial morphology on the day of hCG administration was categorized as pattern A, B, or C according to the Gonen classification system [ 10 ]. This information helped assess the maturity and quality of the follicles to determine the optimal time for oocyte retrieval.

The oocyte retrieval rate was defined as the number of oocytes retrieved divided by the number of follicles larger than 10 mm on the hCG day. The dominant follicle rate was determined by dividing the number of follicles larger than 14 mm by the total number of follicles larger than 10 mm on the hCG day. MII refers to the stage of oocyte maturation at which the oocyte is arrested at metaphase II, indicating that it has completed the first meiotic division and is considered mature and competent for fertilization. The presence of 2PN was a marker of normal fertilization, and the rate and quality of its development determine the number of embryos available on day 3 after in vitro fertilization. To minimize the influence of embryo quality, our data included variables such as the number of available embryos on day 3 and the number of blastocysts formed on days 5/6. Number of embryos available on day 3: reflects the developmental potential of cleavage-stage embryos.
Blastocysts formed on days 5/6: reflects the blastocyst formation capacity of embryos and is a key indicator of embryonic developmental potential.

The primary outcome of this study was clinical pregnancy, defined as a positive pregnancy test followed by an intrauterine gestational sac observed on ultrasound [ 11 ].

First, we removed features with more than 20% missing data. Next, we randomly divided the dataset into training and internal validation cohorts in a 7:3 ratio. Finally, we imputed the missing values of continuous variables using the K-Nearest Neighbors (KNN) method; imputation was performed after the split to prevent data leakage. The normality of quantitative variables was assessed using the Shapiro-Wilk test. Quantitative variables were expressed as mean ± standard deviation or median (interquartile range). Continuous variables that met the criteria for normal distribution were compared between groups using Student’s t-test, while those with non-normal distributions were analyzed using the Wilcoxon rank-sum test. Categorical variables, presented as frequencies (percentages), were compared using the chi-square test or Fisher’s exact test.

The training set was used to train the model and tune hyperparameters, while the internal validation set was used to verify the model’s performance. Univariate logistic regression analysis was conducted to select candidate variables (p < 0.2). Subsequently, LASSO regression was applied to identify potential risk variables with non-zero coefficients. Significant predictors associated with pregnancy outcomes were determined through multivariate logistic regression analysis, in conjunction with findings from previous studies. Z-score normalization was applied to all continuous variables, and one-hot encoding was performed for all categorical variables. Statistical analyses were conducted using R and Python, with statistical significance defined as two-sided p < 0.05.
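
The preprocessing order described above (split first, then impute and normalize) can be sketched as follows. This is a minimal standard-library illustration, not the study's pipeline: the toy rows, the function names, the choice of k, and the decision to use only complete training rows as KNN references are all assumptions made for the example.

```python
# Minimal sketch: split 7:3, then fit imputation and z-scoring on training data only.
# All names and values are hypothetical; columns are [age, BMI, AFC].
import math
import random

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_impute(row, reference_rows, k=3):
    """Fill None entries with the mean of the k nearest complete reference rows.
    Distances use raw units here for brevity; real data would be scaled first."""
    observed = [i for i, v in enumerate(row) if v is not None]
    def distance(ref):
        return math.sqrt(sum((row[i] - ref[i]) ** 2 for i in observed))
    neighbours = sorted(reference_rows, key=distance)[:k]
    return [v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)]

def fit_zscore(train_rows):
    """Column means and population standard deviations from the training rows."""
    columns = list(zip(*train_rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return means, stds

def apply_zscore(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

rows = [
    [31, 21.8, 12], [28, 19.8, 15], [34, 24.3, 8], [29, None, 14],
    [36, 23.1, 6], [27, 20.5, 18], [33, 22.0, None], [30, 21.0, 13],
    [25, 19.9, 20], [38, 25.2, 5],
]
train_raw, valid_raw = split_rows(rows)
# Reference rows come from the training cohort only, so nothing leaks from validation.
complete_train = [r for r in train_raw if None not in r]
train = [knn_impute(r, complete_train) for r in train_raw]
valid = [knn_impute(r, complete_train) for r in valid_raw]
means, stds = fit_zscore(train)
train_z = apply_zscore(train, means, stds)
valid_z = apply_zscore(valid, means, stds)
```

The design point is that every statistic applied to the validation rows (neighbours, means, standard deviations) is learned from the training rows alone.
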
Models were developed on the training cohort in two phases, pre-treatment and treatment, used respectively to predict whether the patient would go on to receive treatment and whether she would achieve a successful pregnancy. Six algorithms were employed to construct the optimal model: Logistic Regression (LR), XGBoost (XGB), Random Forest Classifier (RFC), Support Vector Machine (SVM), Naive Bayes (NB), and LightGBM (LGBM). Hyperparameters were identified using randomized search with 10-fold cross-validation.

The model developed for the pre-treatment phase incorporates only baseline demographic variables (e.g. maternal age and BMI) and baseline endocrine parameters (e.g. antral follicle count, basal FSH, AMH, and basal progesterone) to provide patients with an estimate of post-treatment pregnancy success probability during the initial consultation. It also aimed to assist couples and clinicians in making informed decisions, thereby enhancing the quality of decision-making, feasibility, and acceptability for the patient.

In the treatment phase, the model retains all pre-treatment variables and additionally incorporates variables collected during treatment that are associated with pregnancy outcomes, such as drug dosage, hormone levels on HCG day, and embryo status. This model allowed patients to estimate the likelihood of a successful pregnancy prior to embryo transfer and assisted clinicians in implementing interventions to improve pregnancy success rates.

In the internal validation cohort, the Area Under the Receiver Operating Characteristic curve (AUROC) was used to evaluate the model’s ability to distinguish between positive and negative samples. The Area Under the Precision-Recall Curve (AUPRC) offers a more informative assessment of model performance when the distribution of positive and negative samples is imbalanced.
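
One way to read the AUROC defined above is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. The function name and the toy labels and scores are hypothetical, and the approach is an illustration rather than the study's evaluation code.

```python
# Minimal sketch: AUROC as the fraction of (positive, negative) pairs ranked correctly.
# Ties count as half a correct pair. Toy data only.

def auroc(labels, scores):
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 1, 0, 0]          # 1 = clinical pregnancy
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(labels, scores), 3))
```

An uninformative model scores 0.5 on AUROC regardless of class balance, whereas its expected precision equals the positive-class prevalence, so AUPRC values are best compared against that prevalence rather than against 0.5.
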
Additional evaluation metrics, such as sensitivity and specificity, were calculated based on the optimal threshold derived from the receiver operating characteristic (ROC) curve. The F1 score was used to gauge the overall performance of the model. Furthermore, the calibration curve was used to assess whether the model’s probabilistic predictions align with actual observations; a Brier score closer to 0 indicated better calibration. Decision curve analysis (DCA) was used to compare the net benefits of different models under different treatment decisions.

We collected a temporal external validation cohort in 2022 that met the inclusion and exclusion criteria, and calculated the same evaluation metrics on it to further assess the performance of our model and its generalization ability.

The optimal model was chosen based on the evaluation results, and its interpretability was examined through the SHapley Additive exPlanations (SHAP) approach, visualized using a summary plot. This summary plot combined a beeswarm chart with a feature-importance bar chart to illustrate the distribution of SHAP values for each feature, thereby clarifying each feature’s contribution to the model’s predictions. The models and their SHAP values were implemented in a web application built on the Python-based Streamlit framework, allowing users to access the website online for free and enhancing clinical utility.

We conducted subgroup analyses within each cohort based on infertility factors and treatment regimens to determine whether the predictive utility of the two models remained consistent across patient populations with varying treatment approaches. The population was categorized into six groups: pelvic and fallopian tube factors group, PCOS group, DOR group, EMS group, unexplained infertility group, and multiple infertility factors group.
The treatment regimens were classified into four categories: GnRH antagonist protocol group, ultra-long GnRH agonist protocol group, GnRH agonist long protocol group, and early-follicular phase long-acting GnRH agonist long protocol group. For characteristics that significantly affected the model, such as female age, FSH, AFC, Gn average dosage, E2 on HCG day, and P on HCG day, restricted cubic spline (RCS) curves were used to visualize their relationship with pregnancy outcomes.
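
The threshold-based metrics named in this section (sensitivity, specificity, F1 score, Brier score) can be illustrated with a short standard-library sketch. The source does not state how the optimal ROC threshold was chosen, so the use of Youden's J statistic (sensitivity + specificity − 1) here is an assumption, as are the function names and the toy data.

```python
# Minimal sketch of threshold selection and threshold-based metrics (toy data).
# Assumption: the "optimal" threshold maximizes Youden's J; the source does not specify this.

def confusion(labels, probs, threshold):
    """Counts of true/false positives and negatives at a given threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    return tp, fp, fn, tn

def metrics(labels, probs, threshold):
    tp, fp, fn, tn = confusion(labels, probs, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

def youden_threshold(labels, probs):
    """Candidate thresholds are the observed probabilities; pick the one with the largest J."""
    def j(threshold):
        m = metrics(labels, probs, threshold)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(probs)), key=j)

def brier_score(labels, probs):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

labels = [1, 1, 0, 1, 1, 0, 0, 0]    # 1 = clinical pregnancy
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

best = youden_threshold(labels, probs)
result = metrics(labels, probs, best)
brier = brier_score(labels, probs)
print(best, result, round(brier, 5))
```

Because sensitivity, specificity, and F1 all depend on the chosen threshold, they are only comparable across models when the threshold rule is the same for each.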

Discussion

In this study, we developed two XGB models as a first step towards predicting the probability of pregnancy during both the pre-treatment and treatment periods. The pre-treatment model uses clinical variables that can be collected during initial consultations to enable early-stage prediction. The in-treatment model integrates treatment metrics to guide therapeutic optimization. This dual-phase architecture establishes a patient-centric predictive framework supporting personalized treatment adaptation at key clinical decision points. Within the infertility care pathway, including male-factor infertility, AI models are poised to integrate female and male predictors; evidence of covert sperm dysfunction underscores the need to incorporate male functional metrics when available [ 12 ].

Compared to previous models based on conventional statistical methods, such as restricted cubic spline regression applied to IVF-only CLBR prediction [ 8 , 9 ], our study presents several key advancements. We incorporated both IVF and ICSI cycles to enhance generalizability, employed a dual-phase framework for dynamic risk assessment, conducted rigorous performance evaluation across multiple metrics, and implemented the models as online calculators to facilitate clinical application. Additionally, we used SHAP to interpret machine learning outputs and applied RCS curves to explore nonlinear associations and threshold effects between key variables and pregnancy outcomes.

Age, a critical factor affecting oocyte quality and ovarian reserve, is well recognized as a strong predictor of ART success [ 6 , 13 , 14 ]. The RCS analysis revealed a nonlinear relationship between age and pregnancy probability, with the highest success observed in women aged 25–30 years, followed by a decline after 30. This pattern is consistent with findings from a large retrospective study (N = 7,243 women; 16,782 cycles) identifying 25–30 years as the optimal age range for autologous oocyte use [ 15 ].
Furthermore, pregnancy rates declined with decreasing age below 25 years, despite this group typically being considered ideal candidates for IVF/ICSI. This unexpected trend may be attributable to the higher incidence of whole-chromosome nondisjunction in young women [ 16 ], which increases aneuploidy in their oocytes and, consequently, lowers implantation and live-birth rates [ 17 ]. On the other hand, occult reproductive-tract infections or antisperm antibodies in males can impair sperm DNA integrity via oxidative-stress pathways [ 18 ]. More critically, even when conventional semen parameters (concentration, motility, and morphology) fall within reference ranges, covert subtle defects, including diminished mitochondrial membrane potential and aberrant chromatin packaging, suffice to precipitate fertilisation failure in oocyte-donation cycles [ 12 ].

Previous studies have documented that women classified as obese (BMI ≥ 30 kg/m²) had lower live birth rates after IVF than women of normal weight (BMI 18.5–24.9 kg/m²) who underwent ART assessment [ 19 ]. In this study, female BMI was positively associated with pregnancy achievement rates, which may be attributed to the fact that most patients included had BMI values within the normal range. Therefore, appropriate weight management is critical in optimizing pregnancy achievement rates and pregnancy outcomes. Nutraceuticals may complement metabolic/endocrine optimization along this pathway: inositols show reproductive/endocrine benefits [ 20 ], myo-inositol ± melatonin improves metabolic and thyroid profiles in mid-life women [ 21 ], and betaine-containing combinations exhibit anti-inflammatory/endocrine modulation [ 22 ]. Vitamin D status also intersects with reproductive and metabolic features in PCOS, including endometrial receptivity [ 23 ].

Upon entering the treatment phase, several additional variables become relevant.
Our treatment-stage prediction models include covariates such as E2 level on HCG day, Gn average dosage, endometrial thickness on HCG day, progesterone level on HCG day, and number of oocytes retrieved, all of which have demonstrated strong associations with pregnancy outcomes [ 10 , 24–26 ]. Serum progesterone concentrations are recognized as significant predictors of pregnancy rates for fresh treatment cycles [ 27 ]. Most studies report that elevated progesterone levels on HCG day negatively impact clinical outcomes [ 28–30 ], although the exact threshold varies between 0.8 ng/ml and 2.0 ng/ml. In our study, clinical pregnancy rates notably changed when progesterone exceeded 0.75 ng/ml. Differences among studies likely stem from variations in measurement techniques and ovarian responsiveness, highlighting the importance of individual clinical factors.

In antagonist ovarian stimulation protocols, the use of a GnRHa trigger and embryo cryopreservation is an effective approach to prevent ovarian hyperstimulation syndrome (OHSS). For high-risk patients, administration of a double-dose GnRH antagonist before the hCG trigger has been shown to significantly reduce E2 and progesterone levels without compromising pregnancy rates, offering a pharmacologic approach to threshold management [ 31 ]. For patients planning fresh embryo transfer, administering a double-dose GnRH antagonist before the hCG trigger remains an effective strategy.

While previous research confirms positive correlations between follicular maturation, E2 levels on HCG day, and conception probability [ 25 , 32 ], our data paradoxically identify elevated E2 as predictive of reduced pregnancy rates.
This apparent contradiction may reflect an endocrine imbalance beyond optimal thresholds: excessively high E2 could create a suboptimal uterine environment [ 33 ]. When E2 is supraphysiologic, oocyte/embryo cryopreservation is a rational option; closed-system oocyte vitrification yields recipient outcomes comparable to fresh oocytes, supporting proactive fertility preservation when needed [ 34 ]. When E2 exceeds optimal levels and suggests impaired endometrial receptivity, a freeze-all approach may be advantageous; Frozen Embryo Transfer (FET) has been shown to achieve similar cumulative live birth rates to fresh transfer but with higher neonatal birthweight and reduced risk of low birthweight [ 36 ]. Perinatal counseling should also note that ART offspring may have a slightly increased risk of minor congenital heart defects, although evidence for major defects remains limited, supporting individualized counseling [ 20 , 37 ]. Beyond endocrine markers, thyroid autoimmunity may impair oocyte quality and pregnancy outcomes, suggesting that future predictive models could incorporate immune-related biomarkers [ 38 ]. From a clinical workflow perspective, standardized reproductive counseling alongside individualized stimulation and transfer strategies can align choices with patient values and safety and improve patient understanding, decision-making, and outcomes [ 35 ].

The impact of Gn dosage on ART outcomes is multifactorial. Crucially, our analysis reveals that pregnancy rates decrease beyond an average daily Gn threshold of 203.27 IU, demonstrating that higher doses do not universally improve success.
These results substantiate the need for individualized ovarian stimulation protocols incorporating comprehensive medical monitoring to optimize pregnancy success.

The limitations of this study should be noted. First, to ensure the data quality and consistency of the model input variables and to avoid bias introduced by missing data, some key male factors and some possible confounding variables, such as dietary habits, smoking status, and drinking habits, were not included in this study. Future AI iterations should incorporate male functional metrics (e.g. oxidative-stress or immunologic markers) where available [ 12 , 18 ] and pre-treatment nutraceutical profiles [ 20 , 22 , 23 ]. Second, this study was retrospective and is therefore subject to selection bias and other biases inherent to a retrospective design. Third, this study was conducted at a single center and only included fresh transfer cycles with at least one transferable embryo. The performance of the model may therefore decline in settings with different geographical characteristics and clinical strategies, and future work will need to incorporate data from additional centers to corroborate the model's generalizability. Finally, even though the overall performance of the model is likely limited, from the individual patient vantage point, the research team considers it to be a reasonable clinical decision support tool.

Conclusions

This study developed and validated two machine learning models to predict clinical pregnancy outcomes across the pre-treatment and treatment phases of IVF/ICSI. By integrating SHAP for model interpretability and deploying the models as user-friendly online calculators, we provide a practical and accessible tool to support individualized clinical decision-making. This dual-phase framework addresses key limitations of existing models and offers new insights into optimizing reproductive strategies and improving ART success in real-world clinical settings.

Introduction

With ongoing global environmental changes and evolving lifestyle patterns, infertility has emerged as an increasingly prominent public health concern, affecting a growing proportion of couples of reproductive age [ 1 , 2 ]. It is estimated that approximately 8% to 12% of couples worldwide are affected by infertility [ 3 ]. Assisted reproductive technology (ART), a pivotal medical intervention, encompasses a range of techniques, including in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI), that offer effective solutions for couples experiencing infertility [ 4 ]. In recent years, the global application of ART has expanded substantially, resulting in the birth of more than seven million children worldwide through these technologies [ 5 ].

To enhance the success rates of ART, numerous studies have focused on developing predictive models for pregnancy outcomes [ 6 , 7 ]. However, most existing models are limited to a single stage of the treatment process and fail to provide a comprehensive prediction across the entire IVF cycle, from the pre-treatment phase through to embryo transfer. This limitation restricts their capacity to offer holistic prognostic evaluations and to guide personalized clinical decision-making [ 8 ]. Some studies have attempted to construct risk prediction models specifically for the pre-treatment phase, aiming to identify patients with higher pregnancy potential, thereby supporting individualized clinical decisions, reducing unnecessary interventions, and improving the efficiency of medical resource allocation [ 9 ]. Nonetheless, these models have predominantly relied on traditional regression methodologies, which assume linear relationships and lack the capacity to capture complex, nonlinear interactions among variables. Furthermore, many models exclude ICSI-related data, thereby constraining their generalizability and clinical applicability [ 9 ].

In this study, we developed two predictive models for pregnancy outcomes based on real-world clinical data from IVF/ICSI treatment cycles, encompassing both the pre-treatment and treatment phases. A range of widely used machine learning algorithms was employed to evaluate model performance, while the SHapley Additive exPlanations (SHAP) framework was used to enhance model interpretability, thereby increasing clinical transparency and user trust. Additionally, both models were deployed as interactive online calculators to facilitate clinical application and to support individualized reproductive management and risk stratification in ART practice.

License: publisher-OA-unknown · commercial use NOT OK · attribution required