Optimizing predictive features using machine learning for early miscarriage risk following single vitrified-warmed blastocyst transfer.

doi:10.3389/fendo.2025.1557667


2025 · doi:10.3389/fendo.2025.1557667 · PMID:40309447 · PMC12040701

OA: gold CC-BY-4.0


Intro

Early miscarriage, defined as the spontaneous loss of pregnancy before 12 weeks and 6 days of gestation, is a common complication in in vitro fertilization (IVF) pregnancies, affecting 10-15% of cases, with 80% occurring in the first trimester ( 1 , 2 ). In single vitrified-warmed blastocyst transfer (SVBT), early miscarriage risk remains significant due to the multifactorial nature of its causes, including genetic abnormalities, uterine factors, hormonal imbalances, and environmental influences ( 1 , 3 , 4 ). Accurate predictive models are essential for identifying at-risk pregnancies and optimizing clinical decision-making.

Machine learning (ML) offers a powerful approach for early miscarriage risk prediction, as it can process large, heterogeneous datasets and capture complex, non-linear relationships between multiple clinical, embryological, and demographic variables ( 5 , 6 ). Traditional statistical models, such as logistic regression and LASSO regression, have been widely applied in reproductive medicine but have demonstrated only moderate predictive performance, with AUC values ranging from 0.615 to 0.660 ( 6 ). These models often assume linear relationships among predictors and may struggle with the intricate interdependencies inherent in reproductive outcomes. In contrast, ML techniques, particularly ensemble learning methods that combine the predictions of multiple models, can improve predictive accuracy, robustness, and generalizability ( 7 ). Common ensemble techniques include Voting, Stacking, and Boosting: Voting aggregates predictions from several models to produce a final result, Stacking uses the outputs of multiple models as inputs for a meta-model, and Boosting builds models sequentially, with each iteration correcting errors from the previous one ( 7 ).
Ensemble methods have been successfully applied in various medical domains, such as cancer risk prediction, where Stacking models improved the classification of tumor types ( 8 ), and in diagnosing cardiovascular diseases, where Boosting algorithms enhanced early detection of heart conditions ( 9 ). These successes underscore ensemble learning’s ability to integrate diverse features, reduce bias and variance, and improve stability when applied to complex clinical datasets. Given the multifactorial nature of early miscarriage, ensemble methods are particularly suited for capturing the intricate relationships between predictive factors and delivering reliable predictions ( 10 , 11 ). SVBT is widely used in assisted reproductive technologies (ART) due to its ability to reduce the risks associated with multiple pregnancies, such as preterm birth and low birth weight. Despite its advantages, including improved timing flexibility and enhanced endometrial synchronization ( 12 ), the risk of early miscarriage following SVBT remains a concern. Recent advancements in ML have demonstrated its efficacy in medical diagnosis and risk assessment, including applications in obstetrics and gynecology. For example, ML models have been used to predict pregnancy complications and implantation failure ( 13 – 15 ). To date, no predictive models have been specifically developed for early miscarriage in SVBT cycles. This study aims to develop and evaluate machine learning models to predict early miscarriage risk following SVBT by conducting a comparative analysis of multiple approaches, including Logistic Regression, Random Forest, Gradient Boosting, and ensemble methods, to identify the most effective predictive model. Additionally, we integrate ensemble learning techniques, specifically the Voting Classifier and Stacking Classifier, to leverage the strengths of multiple models, further enhancing predictive accuracy and supporting more informed clinical decision-making.

Results

Of the 1,664 SVBT cycles analyzed, significant differences were observed between the miscarriage and non-miscarriage groups. Advanced maternal and paternal ages were associated with higher miscarriage risk, with median maternal and paternal ages of 33 and 35 years in the miscarriage group compared to 32 and 33 years in the non-miscarriage group (p<0.001). The miscarriage group also exhibited a thinner endometrium (9.3 mm vs. 9.5 mm, p=0.031), delayed blastocyst development, with fewer blastocysts formed by day 5 (57.5% vs. 64.9%, p=0.017), and poorer inner cell mass (ICM) quality (grade C: 14.0% vs. 8.4%, p<0.01). Additionally, ovarian-related infertility was more prevalent in the miscarriage group (30.8% vs. 21.5%, p=0.001). These findings highlight the multifactorial nature of miscarriage risk, emphasizing the importance of integrating parental, embryonic, and endometrial factors into predictive models ( Table 1 ).

Table 1. Baseline characteristics of 1,664 SVBT cycles stratified by early miscarriage outcomes. *: p<0.05. **: p<0.01. ***: p<0.001.

Mutual information (MI) analysis identified maternal age, paternal age, number of oocytes retrieved, and endometrial thickness as the top predictive features for early miscarriage ( Figure 2 ). Other important features included Gn duration, total Gn dose, infertility duration, BMI, previous gravidity, and basal FSH. These findings align with clinical evidence, emphasizing the role of parental demographics, ovarian response, and endometrial conditions in determining pregnancy outcomes. The selected features were subsequently used for model development to enhance prediction accuracy.

Figure 2. Top 10 features ranked by mutual information scores for predicting early miscarriage.

Recursive feature elimination (RFE) combined with a random forest classifier selected the following 10 top-ranked features for model construction: maternal age, paternal age, BMI, basal FSH, basal LH, infertility duration, trigger day estradiol, total Gn dose, number of oocytes retrieved, and endometrial thickness. Combining the MI and RFE results, the features carried forward to model construction were the union of the two top-10 lists (12 features): maternal age, paternal age, previous gravidity, total Gn dose, number of oocytes retrieved, endometrial thickness, Gn duration, infertility duration, BMI, basal FSH, basal LH, and trigger day estradiol.

Receiver Operating Characteristic (ROC) analysis on the training set demonstrated superior performance of ensemble models compared to individual classifiers. Repeated 10-fold cross-validation confirmed the robustness of these results, ensuring reliable performance across different data subsets. Detailed comparisons of all models are provided in Figure 3 .

Figure 3. ROC curves evaluating the performance of eight machine learning models with 10-fold cross-validation on the training set for predicting early miscarriage. (A) Logistic Regression; (B) Random Forest classifier; (C) Extra Trees classifier; (D) Gradient Boosting classifier; (E) XGBoost classifier; (F) KNeighbors classifier; (G) CatBoost classifier; (H) AdaBoost classifier.

Table 2 presents the average performance of these algorithms across six key metrics: Area Under the Curve (AUC), accuracy, recall, precision, F1-score, and specificity. Based on these metrics, the Gradient Boosting Classifier and the CatBoost Classifier emerged as the top-performing individual models. The Gradient Boosting Classifier leads in terms of AUC (0.831), accuracy (0.777), recall (0.649), and F1-score (0.744). The CatBoost Classifier also exhibits strong performance, particularly in AUC (0.819), accuracy (0.754), precision (0.894), and specificity (0.932) ( Table 2 ). These characteristics make both models robust and reliable for predicting early miscarriage following SVBT.

Table 2. Performance comparison of different machine learning models on the testing set. *: Ensemble model.

To enhance the performance of early miscarriage prediction models, this study employed ensemble learning methods by constructing two ensemble models based on the Gradient Boosting Classifier and the CatBoost Classifier: a Voting Classifier and a Stacking Classifier. In the training set, the ROC curves of the Voting Classifier and the Stacking Classifier demonstrate comparable performance to those of the individual Gradient Boosting Classifier and CatBoost Classifier ( Figure 4 ).

Figure 4. ROC curves comparing the performance of four machine learning models on the training data.

In the evaluation of classifiers for predicting early miscarriage, the Voting Classifier demonstrated superior performance, leading in key metrics including AUC (0.836) and accuracy (0.780). This model also achieved the highest precision (0.914) and specificity (0.942) among the classifiers. The Gradient Boosting Classifier also showed robust performance across various metrics. It ranked second in both AUC (0.831) and accuracy (0.777), indicating its strong capability to distinguish between cases. Additionally, it demonstrated good recall (0.649) and F1 score (0.744), reflecting its balanced performance in identifying true positives while maintaining a lower rate of false negatives. Its precision and specificity were also high at 0.871 and 0.904, respectively, reinforcing its applicability in diverse clinical environments where both sensitivity and precision are crucial ( Table 2 ).
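The threshold-based metrics reported above (accuracy, recall, precision, F1 score, specificity) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' evaluation code. The confusion counts are hypothetical, because the paper does not report them. The final lines only check that the reported Gradient Boosting precision (0.871) and recall (0.649) are consistent with the reported F1 score (0.744).

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion matrix; the positive class is early miscarriage."""
    tp: int  # miscarriages predicted as miscarriage
    fp: int  # non-miscarriages predicted as miscarriage
    tn: int  # non-miscarriages predicted as non-miscarriage
    fn: int  # miscarriages predicted as non-miscarriage


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the threshold-based metrics from confusion counts."""
    recall = c.tp / (c.tp + c.fn)        # also called sensitivity
    precision = c.tp / (c.tp + c.fp)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=60, fp=10, tn=90, fn=40)
metrics = classification_metrics(counts)

# Consistency check using the values reported for Gradient Boosting.
reported_f1 = round(f1_from(precision=0.871, recall=0.649), 3)
print(metrics)
print(reported_f1)
```

AUC is not included here because it is computed from the ranking of predicted probabilities rather than from a single confusion matrix.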

Discussion

The study demonstrates that ensemble machine learning models, particularly the Voting Classifier and Gradient Boosting Classifier, significantly improve the prediction of early miscarriage following SVBT cycles. The Voting Classifier achieved the highest performance metrics, with an AUC of 0.836, accuracy of 0.780, and precision of 0.914, underscoring its robustness and clinical applicability. The Gradient Boosting Classifier also exhibited strong predictive capability (AUC = 0.831, accuracy = 0.777), effectively capturing complex, non-linear interactions among features, such as parental age, endometrial thickness, ovarian response, and blastocyst quality. These findings underscore the effectiveness of ensemble methods in capturing the multifactorial nature of early miscarriage risk. Previous studies on early miscarriage prediction have primarily relied on traditional statistical models, such as logistic regression and LASSO regression, which have typically demonstrated only moderate predictive performance, with AUC values ranging from 0.615 to 0.660 ( 6 ). These methods often struggle to capture the complex, non-linear relationships among predictive variables, thereby limiting their accuracy and generalizability. Ensemble learning models offer a promising alternative by integrating multiple algorithms, leading to improved predictive accuracy and robustness. Prior research has highlighted the advantages of ensemble methods in pregnancy-related predictions, including naturally conceived pregnancies ( 10 ). However, these models have not been specifically optimized for SVBT cycles, which involve unique physiological factors such as endometrial synchronization and blastocyst vitrification. The superior performance of our proposed methodology can be attributed to several key factors. First, ensemble learning methods aggregate predictions from multiple models, reducing individual model biases and enhancing generalization. 
The Voting Classifier, in particular, leverages the strengths of multiple base models, producing more stable and accurate predictions. Second, Gradient Boosting enhances feature importance by iteratively improving weak learners, making it highly effective in handling the intricate dependencies among clinical and embryonic factors. Unlike traditional statistical models that assume linear relationships, boosting techniques dynamically refine decision boundaries, leading to superior classification performance. Additionally, advanced feature selection techniques, including Mutual Information (MI) and Recursive Feature Elimination (RFE), were incorporated to improve model interpretability and efficiency. By systematically removing irrelevant or redundant features, our models focus on the most clinically meaningful predictors, such as maternal age, endometrial thickness, and embryo quality, thereby enhancing both predictive accuracy and generalizability. This study addresses a critical gap by demonstrating that ensemble learning models significantly improve early miscarriage risk prediction in SVBT cycles, achieving higher accuracy and reliability compared to traditional approaches. These findings establish a new benchmark for predictive modeling in ART and highlight the potential of machine learning in enhancing personalized risk assessment and clinical decision-making. The findings of this study hold significant implications for clinical practice. The enhanced predictive accuracy of ensemble machine learning models offers the potential for more personalized care in ART. Clinicians can leverage these models to identify pregnancies at high risk of early miscarriage, enabling closer monitoring and tailored counseling, while patients with lower risk might benefit from reduced interventions. 
Integrating these machine learning models into electronic medical record (EMR) systems could further streamline risk assessment, providing real-time, data-driven support for clinical decision-making. Beyond their immediate clinical utility, these findings also pave the way for future research to investigate additional predictors of early miscarriage, including genetic, molecular, and lifestyle factors, to further refine and enhance model performance. This study has several notable strengths. First, it is the first to develop ensemble learning models specifically designed for predicting early miscarriage in SVBT cycles, addressing a critical gap in the literature. Second, the study utilized a large and well-documented dataset from two reproductive medicine centers, which enhances both the reliability and generalizability of the findings. Third, the use of rigorous validation techniques, such as repeated stratified 10-fold cross-validation, ensured robust model performance and reduced the risk of overfitting. Despite these strengths, several limitations should be acknowledged. The retrospective design may introduce biases related to data collection and patient selection. The absence of preimplantation genetic testing in the dataset restricts the ability to account for chromosomal abnormalities, a major contributor to early miscarriage. Furthermore, the dataset lacked sociodemographic information, such as socioeconomic status and education level, which are known to influence pregnancy outcomes. Lastly, while the models were specifically developed for SVBT cycles, their generalizability to other ART procedures or naturally conceived pregnancies remains to be validated in future studies.

Conclusions

The study underscores the potential of ensemble machine learning models, particularly the Voting Classifier and Gradient Boosting Classifier, to significantly enhance the prediction of early miscarriage following SVBT. With the continued evolution of machine learning techniques, these models hold considerable promise in advancing clinical decision-making by delivering more accurate and personalized risk assessments.

Materials and Methods

This retrospective study was conducted at two reproductive medicine centers: the First Affiliated Hospital of Guangxi Medical University and the Nanning Maternity and Child Health Hospital. A total of 3,375 SVBT cycles performed between June 2016 and December 2022 were reviewed, of which 1,664 resulted in clinical pregnancies; 308 of these pregnancies ended in early miscarriage. To ensure robust model development, data were randomly divided into training (70%) and testing (30%) sets. The inclusion criteria were patients undergoing SVBT, with complete clinical and laboratory records. Figure 1 outlines the study flowchart. Both centers adhered to identical laboratory and clinical protocols, ensuring consistency in patient preparation, embryo culture, and transfer procedures.

Figure 1. Flowchart illustrating the selection process of participants for this study.

Clinical and laboratory data were extracted from the electronic medical records (EMRs) of the participating centers. To ensure data accuracy, two independent clinical data managers validated key variables, including patient demographics, clinical protocols, and pregnancy outcomes. Any inconsistencies were cross-checked with original medical records before inclusion in the study.

Model performance was evaluated using repeated stratified 10-fold cross-validation, ensuring proportional representation of early miscarriage cases across folds. This rigorous approach minimizes overfitting, enhances model reliability, and ensures reproducibility of findings.

Ovarian stimulation protocols were tailored to individual patients based on clinical parameters such as age, BMI, baseline FSH levels, and antral follicle counts ( 16 ). Triggering of ovulation was achieved with human chorionic gonadotropin (HCG) when at least one follicle reached 18 mm in diameter, and oocyte retrieval was performed 36 hours later under ultrasound guidance.
Insemination, either through conventional IVF or ICSI, was determined based on semen quality, following standard protocols at the participating centers. The blastocysts were cultured continuously in a single culture medium throughout all developmental stages and incubated under oil at 37°C in an environment containing 5% O2 and 6% CO2, with nitrogen as the balance gas. Blastocyst assessments were conducted using the Gardner scoring system ( 17 ). Fully expanded blastocysts were artificially shrunk using a laser before being cryopreserved with vitrification kits (KITAZATO). The embryos were then loaded onto a cryotop on day 5-6 post-insemination. The cryopreserved blastocysts were stored in liquid nitrogen until they were ready to be warmed. Blastocyst warming was performed using warming kits (KITAZATO) once the endometrium achieved adequate thickness. The survival of the blastocyst was assessed by its re-expansion two hours post-warming.

Endometrial preparation for frozen embryo transfer (FET) followed four main protocols:

1. Modified Natural Cycle (NC): Ovulation was induced with HCG when the dominant follicle reached ≥18 mm, followed by luteal support with dydrogesterone or vaginal progesterone. Blastocyst transfer was performed 5 days post-ovulation.
2. Mild Stimulation Cycle (MS): For cases with insufficient follicular development, human menopausal gonadotropin (HMG) was administered to stimulate follicular growth. Ovulation was triggered with HCG, and luteal support was initiated before transfer.
3. Hormone Replacement Therapy (HRT): Endometrial preparation included estradiol valerate for endometrial proliferation, followed by progesterone for luteal support. Patients received either intramuscular or vaginal progesterone, combined with dydrogesterone, based on clinical needs.
4. GnRH Agonist Combined with HRT (GnRHa-HRT): Downregulation was achieved using triptorelin acetate (GnRH agonist) administered during the early follicular phase. Hormonal and endometrial parameters were monitored until complete downregulation was confirmed (e.g., estradiol <50 pg/mL, FSH <5 IU/L, LH <5 IU/L, endometrial thickness <5 mm). Following downregulation, estradiol valerate and progesterone were used to prepare the endometrium, with blastocyst transfer performed 6 days after initiating progesterone.

All blastocyst transfers were conducted under abdominal ultrasound guidance ( 18 ). Protocol selection was individualized based on patient characteristics and clinical indications.

The primary outcome was early miscarriage, defined as the spontaneous loss of a pregnancy before 12 weeks and 6 days of gestation.

The initial dataset comprised more than 40 features, capturing a broad range of maternal, paternal, embryonic, and clinical characteristics relevant to early miscarriage risk. A panel of three reproductive medicine experts guided the feature selection process based on clinical relevance and literature review, ultimately selecting 32 features for model development. The selected features include: maternal age, paternal age, body mass index (BMI), basal FSH, previous gravidity, infertility duration, gonadotropin (Gn) duration, total Gn dose, number of oocytes retrieved, endometrial thickness, basal LH, trigger day estradiol, blastulation time, blastocyst stage, inner cell mass (ICM), trophectoderm (TE), cleavage stage fragmentation, number of blastomeres at the cleavage stage, infertility type, previous parity, previous abortus, number of previous transfers, infertility cause, controlled ovarian hyperstimulation (COH) protocol, fertilization method, and endometrial preparation.

This research included a total of 1,664 cycles, of which 1,356 did not result in early miscarriage and 308 did. The dataset was complete with no missing values, encompassing data from individuals who underwent single vitrified-warmed blastocyst transfers.
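With 308 early miscarriages among 1,664 cycles, the positive class makes up about 18.5% of the data, so a 70/30 split should preserve that proportion in both parts. The sketch below illustrates the idea of a stratified split using only the Python standard library. It is not the authors' code: the function name, the per-class rounding rule, and the seed are assumptions made for illustration, and a library implementation may allocate samples slightly differently.

```python
import random


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Split indices into train/test, sampling each class separately
    so that both parts keep roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption of this sketch: round the test share within each class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def prevalence(labels, indices):
    """Fraction of positive labels among the given indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Class counts as stated in the text: 308 early miscarriages, 1,356 without.
labels = [1] * 308 + [0] * 1356
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(round(prevalence(labels, train_idx), 3), round(prevalence(labels, test_idx), 3))
```

Sampling within each class is what keeps the miscarriage rate nearly identical in the training and testing sets. A plain random split would only match it on average.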
To ensure compatibility with machine learning algorithms, categorical variables (e.g., endometrial preparation protocol) were encoded using Label Encoding, while continuous variables (e.g., maternal age, BMI, endometrial thickness) were standardized using Min-Max Scaling to enhance model convergence and comparability. The dataset was split using train_test_split, with 70% allocated to the training set and 30% to the testing set. The stratify=y parameter was applied to ensure stratified sampling, maintaining the class distribution consistency. Given the significant class imbalance, with more cases of non-early miscarriage than early miscarriage, we employed the SMOTETomek technique ( 19 ). This method combines SMOTE (Synthetic Minority Over-sampling Technique) and Tomek Links to balance the classes effectively. First, SMOTE generates synthetic samples for the minority class (early miscarriage) to increase its representation in the dataset, helping to prevent the model from becoming biased towards the majority class (non-miscarriage) during training. Next, Tomek Links refine the dataset by identifying and removing overlapping samples that are difficult to classify, enhancing the clarity of the decision boundary between the classes. To determine the most pertinent features for predicting early miscarriage, we used Mutual Information (MI), a statistical metric that measures the dependency between two random variables. MI quantifies how much knowing one feature reduces the uncertainty of the other, capturing all possible relationships between the features, not just linear ones. A higher mutual information value indicates a stronger dependency between the features. Recursive Feature Elimination (RFE) combined with a Random Forest classifier was employed as a feature selection technique to identify the most important features, enhancing both the predictive performance and interpretability of the model. 
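To make the mutual information score described above concrete, here is a minimal sketch for discrete variables. It computes I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))] from observed frequencies. This is an illustration under stated assumptions, not the authors' implementation: the paper does not say which estimator was used, continuous features such as maternal age would first need binning or a nearest-neighbor estimator, and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length sequences
    of discrete values, estimated from empirical frequencies."""
    n = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)

    mi = 0.0
    for (x, y), n_xy in joint_counts.items():
        p_xy = n_xy / n
        # p(x,y) / (p(x) p(y)) simplifies to n_xy * n / (n_x * n_y).
        ratio = n_xy * n / (x_counts[x] * y_counts[y])
        mi += p_xy * math.log(ratio)
    return mi


# Hypothetical toy data: a binary outcome and two binary features.
outcome = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # determines the outcome exactly
uninformative_feature = [0, 1, 0, 1]  # independent of the outcome

mi_informative = mutual_information(informative_feature, outcome)
mi_uninformative = mutual_information(uninformative_feature, outcome)
print(mi_informative, mi_uninformative)
```

A feature that fully determines a balanced binary outcome reaches the maximum of ln 2 nats, while an independent feature scores zero. Ranking features by this score gives an ordering of the kind the authors describe.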
RFE works by recursively training the model and removing the least important features in each iteration, gradually reducing the feature set size until the desired number of features is reached. In each iteration, feature importance is computed by the Random Forest classifier, which effectively captures complex nonlinear relationships between features. Based on the results from MI and RFE analyses, we selected an optimal number of features for inclusion in our model. This approach ensured that the chosen features demonstrated strong relationships and significant overall dependencies with the target variable, early miscarriage, thereby optimizing the model’s predictive power and interpretability. Eight machine learning classifiers were evaluated for their ability to predict early miscarriage, including Logistic Regression, Random Forest, Gradient Boosting, and CatBoost. Models were trained using a balanced dataset (SMOTETomek technique) to address class imbalance. Performance was assessed using repeated stratified 10-fold cross-validation, with AUC, accuracy, recall, precision, F1 score, and specificity as key metrics. Ensemble methods, including Voting and Stacking Classifiers, were constructed using top-performing models to further enhance predictive accuracy. To enhance the performance of early miscarriage prediction models, this study employed ensemble learning methods by constructing two ensemble models: a Voting Classifier and a Stacking Classifier. First, the two best-performing models were selected as base models to construct the Voting Classifier, utilizing a soft voting strategy based on predicted probabilities. The Stacking Classifier combined these two base models, with Logistic Regression serving as the meta-classifier. Both ensemble models were trained on the training data and subsequently used to make predictions and evaluations on the test data. The evaluation metrics included AUC, accuracy, recall, precision, F1 score, and specificity. 
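The soft voting strategy described above can be sketched as an average of the base models' predicted probabilities, followed by thresholding. This is a simplified illustration rather than the authors' code. The probabilities are hypothetical, and the equal weights and the 0.5 threshold are assumptions, since the paper does not state either.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of positive-class probabilities across models.

    prob_lists holds one list of probabilities per base model,
    all of the same length (one entry per sample)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal weights
    total_weight = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total_weight
        for i in range(n_samples)
    ]


def to_labels(probs, threshold=0.5):
    """Convert probabilities to class labels (1 = early miscarriage)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical predicted probabilities from the two base models.
gradient_boosting_probs = [0.80, 0.40, 0.55, 0.10]
catboost_probs = [0.60, 0.30, 0.35, 0.20]

ensemble_probs = soft_vote([gradient_boosting_probs, catboost_probs])
ensemble_labels = to_labels(ensemble_probs)
print(ensemble_probs)
print(ensemble_labels)
```

For the third sample the two base models disagree (0.55 vs. 0.35), and the averaged probability of 0.45 decides the final label. A Stacking Classifier would instead pass the two probabilities to a trained meta-classifier, here Logistic Regression, rather than averaging them.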
Statistical analysis was conducted using Python software (Version 3.12). Participant characteristics were summarized using means and standard deviations for continuous variables, and frequencies and percentages for categorical variables. T-tests were employed to compare differences between continuous variables, while chi-square tests or Fisher’s exact tests were used for categorical variables. This approach ensured a robust and accurate assessment of the data.
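As an illustration of the categorical comparison described above, the sketch below computes a Pearson chi-square test for a 2x2 table using only the Python standard library. The counts are hypothetical and not taken from the paper. The sketch applies no continuity correction, which is an assumption, since the paper does not say whether one was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns the statistic and the p-value."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are outcome groups, columns are
# presence/absence of a categorical characteristic.
chi2, p_value = chi_square_2x2(30, 70, 20, 80)
print(round(chi2, 3), round(p_value, 3))
```

When expected cell counts are small, Fisher's exact test is the usual alternative, as the authors note.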



License: CC-BY-4.0