Integrating SHAP analysis with machine learning to predict postpartum hemorrhage in vaginal births

2025 · PMC12048952

OA: gold

Methods

This retrospective multicenter cohort study was conducted in Northeast China to develop and validate predictive models for PPH in women who underwent vaginal delivery. The study cohort included women who delivered vaginally at three independent tertiary hospitals (Shengjing Hospital of China Medical University, Liaoning Maternal and Child Health Hospital, and Shenyang Women’s and Children’s Hospital) from September 2018 to December 2023. At the time of admission, all women were informed that their clinical data, excluding personally identifiable information, might be used for research purposes. Those who consented after being fully informed were included in the study. Exclusion criteria were: (1) age less than 18 years or more than 50 years; (2) gestational age at delivery less than 37 weeks or more than 42 weeks; (3) multiple births; and (4) stillbirth, neonatal death, or any induced labor performed with the intention of terminating the fetus’s life. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Shengjing Hospital (No. 2016PS344K, Date: 17/12/2016).

Using the electronic medical systems of the hospitals, data were collected on basic characteristics, obstetric history, pregnancy complications, delivery processes, and neonatal conditions to identify features for constructing predictive models. The data categories included:

1. Basic characteristics: age, ethnicity, education level, occupation (classified into three categories based on physical labor intensity: light physical labor (LPL), moderate physical labor (MPL), and heavy physical labor (HPL)), family per capita monthly income, pre-pregnancy Body Mass Index (BMI), smoking status, and alcohol consumption status.
2. Obstetric history and pregnancy complications: gravidity, parity, history of miscarriage, spontaneous abortion history, induced and medical abortion history, history of labor induction for fetal demise (induced with the intention of terminating a nonviable or deceased fetus), use of assisted reproductive technology, gestational age at delivery, gestational diabetes, pregnancy-induced hypertension (PIH, including gestational hypertension, preeclampsia, and eclampsia), anemia during pregnancy, coagulation dysfunction, uterine fibroids or adenomyosis, polyhydramnios, umbilical cord entanglement, premature rupture of membranes, placental abruption, vaginal bleeding during pregnancy, and presence of a scarred uterus.
3. Delivery process and neonatal conditions: delivery time, total duration of labor, first stage of labor time (including latent and active phases), second stage duration, third stage duration, placental retention/adhesion/implantation, instrumental assistance in delivery, cervical, vaginal, and perineal lacerations, newborn weight, and newborn length.

PPH was defined as vaginal bleeding exceeding 500 ml within 24 h after vaginal delivery, corresponding to the clinical concept of early postpartum hemorrhage. Blood loss was primarily measured using the weighing method, which calculates the difference in weight of the absorbent materials before and after blood collection. In cases of heavier bleeding, blood was collected in a container and measured using a graduated cup.

Because multicollinearity can affect predictive accuracy, pairs of features with a high correlation (correlation coefficient > 0.6) in Spearman’s correlation analysis were handled by removing the feature of the pair that had the lower correlation with the outcome. This result is illustrated in Supplementary Figure S1.
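The correlation-based filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the feature names and toy values are hypothetical, the 0.6 cut-off is applied to the absolute Spearman coefficient (the text does not say whether the sign was ignored), and Spearman's coefficient is computed as the Pearson correlation of average ranks.

```python
from itertools import combinations
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def drop_collinear(features, outcome, threshold=0.6):
    """For each pair with |rho| > threshold, drop the feature that is
    less correlated with the outcome. `features` maps name -> values."""
    kept = list(features)
    for a, b in combinations(list(features), 2):
        if a not in kept or b not in kept:
            continue
        if abs(spearman(features[a], features[b])) > threshold:
            weaker = min((a, b), key=lambda n: abs(spearman(features[n], outcome)))
            kept.remove(weaker)
    return kept

# Hypothetical toy data (six deliveries), for illustration only.
features = {
    "total_labor_min": [300, 420, 510, 260, 700, 390],
    "first_stage_min": [250, 380, 450, 230, 600, 400],
    "newborn_weight_g": [3200, 3900, 3400, 3650, 3300, 3500],
}
outcome = [0, 1, 0, 0, 1, 0]  # 1 = PPH
kept = drop_collinear(features, outcome)
print(kept)
```

In this toy set the two labor-duration features are strongly rank-correlated, so only the one more closely associated with the outcome is retained.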
Data from the derivation cohort, collected from September 2018 to December 2022 at the three independent tertiary hospitals, were divided into a training set (70%) and a validation set (30%) to guard against overfitting. An additional test dataset from admissions between January 2023 and December 2023, with the same inclusion and exclusion criteria as the derivation cohort, was used for external validation. A total of 34 features were used to develop predictive models. Missing data were handled using median imputation, a common approach for dealing with missing values in clinical datasets.

Six ML models were employed to predict PPH in women undergoing vaginal delivery: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), Gradient Boosting Decision Tree (GBDT), Gradient Boosting Machine (GBM), Adaptive Boosting (AdaBoost), and Bernoulli Naive Bayes (BNB). Common evaluation metrics, including the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, precision, recall, and F1 score, were employed to assess the reliability and performance of the models. Additionally, Decision Curve Analysis (DCA) and calibration curves were applied to validate the predictive models on both the internal validation dataset and the external validation dataset.

To ensure clinicians can accept and understand the predictive models, the SHapley Additive exPlanations (SHAP) methodology was used to calculate the contribution of each variable to the prediction, thereby explaining the output of the final model. This interpretability approach provides two types of explanations: a global explanation that describes the overall functionality of the model at the feature level, and a local explanation that shows how individual features impact the model’s output through a dependence plot.
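To make the evaluation metrics named above concrete, the following sketch computes accuracy, precision, recall, F1, AUC, and the net benefit used in decision curve analysis from a list of true labels and predicted probabilities. It is an illustration only: the labels and probabilities are invented, the 0.5 classification threshold is an assumption (the paper does not report the one used), and AUC is computed in its pairwise Mann-Whitney form rather than by any specific library routine.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        positive = prob >= threshold
        if positive and truth == 1:
            tp += 1
        elif positive:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc(y_true, y_prob):
    """Probability that a random positive outranks a random negative
    (ties count one half). Quadratic in sample size; fine for a sketch."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)

# Invented example: 3 PPH cases among 10 deliveries.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.2, 0.6, 0.1, 0.1, 0.05, 0.3, 0.2, 0.05]
metrics = classification_metrics(y_true, y_prob)
example_auc = auc(y_true, y_prob)
example_nb = net_benefit(y_true, y_prob, threshold=0.5)
print(metrics, example_auc, example_nb)
```

With a rare outcome such as PPH, accuracy can stay high even when recall is low, which is why recall, F1, and net benefit are worth reading alongside it.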
SHAP values were also used to assist in feature selection: the features of the predictive model were ranked by importance, and those with the strongest predictive power were selected for further analysis. The non-parametric method of DeLong et al. was used to compare differences in AUC, using MedCalc version 19.6 (https://www.medcalc.org). Features of the selected ML models were gradually reduced until a significant decrease in AUC occurred. To enhance the clinical utility of the model, the final predictive model was implemented as a web application using the Streamlit Python framework. This application allows users to input values for the corresponding features of the final model and returns the probability of PPH together with a force plot of the individual feature contributions.

Data analysis was conducted using Python version 3.6.5 (https://www.python.org) and SPSS statistical software version 23.0 (https://www.ibm.com/spss). Continuous variables with a skewed distribution are presented as medians with interquartile ranges and were compared using the Mann-Whitney U test or Kruskal-Wallis H test. Categorical variables are presented as numbers with percentages and were compared using the chi-square test or Fisher’s exact test. Analysis of covariance (ANCOVA) was used to adjust for confounding factors. AUCs were used to evaluate predictive efficacy. DCA was performed using R version 4.1.0 (https://www.r-project.org). A two-tailed p-value < 0.05 was considered statistically significant.
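The SHAP-based ranking step can be illustrated with a short sketch. It assumes the per-sample SHAP values have already been computed elsewhere and are available as rows of numbers, and it ranks features by mean absolute SHAP value, which is the usual global importance measure in SHAP bar plots. The feature names and values below are hypothetical, and the paper's actual stopping rule (reduce until AUC drops significantly) is not reproduced here.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, largest first.

    shap_rows: one list of per-feature SHAP values per sample.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

def top_k_features(shap_rows, feature_names, k):
    """Keep the k highest-ranked features for the reduced model."""
    ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
    return [name for name, _ in ranking[:k]]

# Hypothetical SHAP values for three samples and three features.
feature_names = ["newborn_weight", "second_stage", "maternal_age"]
shap_rows = [
    [0.8, -0.3, 0.1],
    [-0.6, 0.5, -0.1],
    [1.0, 0.1, 0.0],
]
ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
selected = top_k_features(shap_rows, feature_names, k=2)
print(ranking)
print(selected)
```

Taking absolute values before averaging matters: a feature that pushes some patients strongly towards PPH and others strongly away from it would otherwise average out to near zero and look unimportant.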

Results

This retrospective study included a total of 30,745 parturients for the identification of the predictive model cohort. During the study period from September 2018 to December 2022, 27,389 parturients were admitted to the obstetrics departments of the three hospitals, with 2,556 excluded based on the study’s exclusion criteria. The remaining 24,833 parturients who met the inclusion criteria were randomly assigned to separate training and internal validation groups (see Table 1). Additionally, in the external validation cohort admitted from January 2023 to December 2023, 257 parturients were excluded, resulting in 3,099 parturients included (Supplementary Table 1). Study design details are shown in Fig. 1.

Table 1 Comparison of clinical characteristics and outcomes between non-PPH and PPH in the training and internal validation cohort

Characteristic | Non-PPH (n = 23,210) | PPH (n = 1,623) | P
Age (years) | 29.0 [27.0, 32.0] | 30.0 [28.0, 32.0] | < 0.001*
Ethnicity | | | < 0.001*
  Han | 20,848 (89.8%) | 1,374 (84.7%) |
  Manchu | 1,693 (7.3%) | 179 (11.0%) |
  Other ethnic groups | 669 (2.9%) | 70 (4.3%) |
Educational attainment | | | < 0.001*
  High school or below | 8,142 (35.1%) | 493 (30.4%) |
  Bachelor's degree | 12,713 (54.8%) | 926 (57.1%) |
  Postgraduate or higher | 2,355 (10.1%) | 204 (12.5%) |
Occupation | | | < 0.001*
  Unemployed | 10,441 (45.0%) | 932 (57.4%) |
  Light physical labor | 2,573 (11.1%) | 252 (15.5%) |
  Moderate physical labor | 9,591 (41.3%) | 420 (25.9%) |
  Heavy physical labor | 605 (2.6%) | 19 (1.2%) |
Family per capita monthly income (10,000 yuan) | | | 0.887
  5.0 | 1,293 (5.6%) | 97 (6.0%) |
BMI (kg/m^2) | 19.8 [18.4, 22.4] | 20.4 [18.5, 23.3] | < 0.001*
Smoking | | | 0.036
  No | 23,128 (99.6%) | 1,612 (99.3%) |
  Yes | 82 (0.4%) | 11 (0.7%) |
Alcohol consumption | | | 0.927
  No | 23,153 (99.8%) | 1,620 (99.8%) |
  Yes | 57 (0.2%) | 3 (0.2%) |
Pregnancy history | 1 [1, 2] | 1 [1, 2] | < 0.001*
Parity (number of deliveries) | 0 [0, 1] | 0 [0, 1] | < 0.001*
Assisted reproductive technology | | | 0.299
  No | 22,595 (97.4%) | 1,589 (97.9%) |
  Yes | 615 (2.6%) | 34 (2.1%) |
Delivery (weeks) | 39.5 [39.0, 40.2] | 40.0 [39.1, 40.3] | < 0.001*
GDM | | | < 0.001*
  No | 20,207 (87.1%) | 1,300 (80.1%) |
  Yes | 3,003 (12.9%) | 323 (19.9%) |
PIH | | | < 0.001*
  No | 21,649 (93.3%) | 1,385 (85.3%) |
  Yes | 1,561 (6.7%) | 238 (14.7%) |
Anemia | | | < 0.001*
  No | 19,381 (83.5%) | 1,189 (73.3%) |
  Yes | 3,829 (16.5%) | 434 (26.7%) |
Coagulation disorder | | | < 0.001*
  No | 23,084 (99.5%) | 1,588 (97.8%) |
  Yes | 126 (0.5%) | 35 (2.2%) |
Uterine fibroids/adenomyosis | | | < 0.001*
  No | 22,595 (97.4%) | 1,501 (92.5%) |
  Yes | 615 (2.6%) | 122 (7.5%) |
Polyhydramnios | | | < 0.001*
  No | 20,967 (90.3%) | 1,399 (86.2%) |
  Yes | 2,243 (9.7%) | 224 (13.8%) |
Umbilical cord entanglement | | | 0.973
  No | 16,254 (70.0%) | 1,143 (70.4%) |
  Yes | 6,956 (30.0%) | 480 (29.6%) |
Premature rupture of membranes | | | < 0.001*
  No | 18,247 (78.6%) | 1,051 (64.8%) |
  Yes | 4,963 (21.4%) | 572 (35.2%) |
Placental abruption | | | < 0.001*
  No | 23,180 (99.9%) | 1,605 (98.9%) |
  Yes | 30 (0.1%) | 18 (1.1%) |
Vaginal bleeding during pregnancy | | | 0.867
  No | 22,030 (94.9%) | 1,546 (95.3%) |
  Yes | 1,180 (5.1%) | 77 (4.7%) |
Scarred uterus | | | < 0.001*
  No | 22,980 (99.0%) | 1,593 (98.2%) |
  Yes | 230 (1.0%) | 30 (1.8%) |
Time of delivery | | | < 0.001*
  0 o'clock | 781 (3.4%) | 70 (4.3%) |
  1 o'clock | 793 (3.4%) | 51 (3.1%) |
  2 o'clock | 747 (3.2%) | 48 (3.0%) |
  3 o'clock | 809 (3.5%) | 46 (2.8%) |
  4 o'clock | 855 (3.7%) | 39 (2.4%) |
  5 o'clock | 840 (3.6%) | 46 (2.8%) |
  6 o'clock | 954 (4.1%) | 60 (3.7%) |
  7 o'clock | 936 (4.0%) | 57 (3.5%) |
  8 o'clock | 661 (2.8%) | 36 (2.2%) |
  9 o'clock | 915 (3.9%) | 53 (3.3%) |
  10 o'clock | 964 (4.2%) | 58 (3.6%) |
  11 o'clock | 958 (4.1%) | 69 (4.3%) |
  12 o'clock | 1,112 (4.8%) | 70 (4.3%) |
  13 o'clock | 1,221 (5.3%) | 101 (6.2%) |
  14 o'clock | 1,286 (5.5%) | 91 (5.6%) |
  15 o'clock | 1,450 (6.2%) | 120 (7.4%) |
  16 o'clock | 1,155 (5.0%) | 87 (5.4%) |
  17 o'clock | 1,070 (4.6%) | 81 (5.0%) |
  18 o'clock | 1,059 (4.6%) | 71 (4.4%) |
  19 o'clock | 1,031 (4.4%) | 74 (4.6%) |
  20 o'clock | 965 (4.2%) | 79 (4.9%) |
  21 o'clock | 951 (4.1%) | 63 (3.9%) |
  22 o'clock | 871 (3.8%) | 72 (4.4%) |
  23 o'clock | 826 (3.6%) | 81 (5.0%) |
First stage of labor - Latent phase | 194.0 [124.0, 296.0] | 281.0 [181.0, 416.0] | < 0.001*
First stage of labor - Active phase | 72.0 [43.0, 119.0] | 108.0 [62.0, 205.5] | < 0.001*
Second stage of labor | 28.0 [15.0, 50.0] | 49.0 [26.0, 93.0] | < 0.001*
Third stage of labor | 5.0 [3.0, 7.0] | 5.0 [4.0, 10.0] | < 0.001*
Newborn weight (grams) | 3365.0 [3120.0, 3620.0] | 3610.0 [3400.0, 3930.0] | < 0.001*
Newborn length (centimeters) | 51.0 [50.0, 52.0] | 52.0 [50.0, 53.0] | < 0.001*
Analgesia during labor | | | < 0.001*
  No | 19,593 (84.4%) | 1,177 (72.5%) |
  Yes | 3,617 (15.6%) | 446 (27.5%) |
Placenta accreta spectrum | | | < 0.001*
  No | 22,440 (96.7%) | 1,458 (89.8%) |
  Yes | 770 (3.3%) | 165 (10.2%) |
Instrumental assistance in delivery | | | < 0.001*
  No | 22,704 (97.8%) | 1,524 (93.9%) |
  Yes | 506 (2.2%) | 99 (6.1%) |
Lacerations of the cervix, vagina, or perineum | | | < 0.001*
  No | 21,139 (91.1%) | 1,398 (86.1%) |
  Yes | 2,071 (8.9%) | 225 (13.9%) |

Continuous values were presented as median [interquartile range]. Categorical values were presented as number (percentage). PPH: postpartum hemorrhage; BMI: body mass index; GDM: gestational diabetes mellitus; PIH: pregnancy-induced hypertension. *: P < 0.05

Fig. 1 Patient Selection Criteria Flowchart

Data collected during pregnancy and within 24 h after delivery were used to generate six ML models to predict the likelihood of PPH in parturients during the perinatal period. Among the six models, the XGBoost model (AUC = 0.997, CI: 0.997–0.998) demonstrated the best predictive performance for PPH, followed by the LGBM model (AUC = 0.980, CI: 0.977–0.984) and the GBDT model (AUC = 0.966, CI: 0.960–0.972). The ROC curves for all six ML models with all features included are shown in Fig. 2, with predictive values detailed in Table 2.

Fig. 2 Comparison of ROC results of different machine learning models.
XGBoost: eXtreme Gradient Boosting; LGBM: Light Gradient Boosting Machine; GBDT: Gradient Boosting Decision Tree; GBM: Gradient Boosting Machine; Ada: Adaptive Boosting; BNB: Bernoulli Naive Bayes

Table 2 Performance of the ML models for PPH prediction

Model | Accuracy | Precision | Recall | F1 score
XGBoost | 0.99 | 1.00 | 0.78 | 0.87
LGBM | 0.97 | 0.98 | 0.50 | 0.66
GBDT | 0.97 | 0.99 | 0.58 | 0.73
Ada | 0.94 | 0.74 | 0.19 | 0.30
BNB | 0.94 | 0.73 | 0.02 | 0.05

PPH: postpartum hemorrhage; XGBoost: eXtreme Gradient Boosting; LGBM: Light Gradient Boosting Machine; GBDT: Gradient Boosting Decision Tree; Ada: Adaptive Boosting; BNB: Bernoulli Naive Bayes

An initial predictive model incorporating all 34 identified risk factors was constructed, and SHAP value analysis was applied for feature selection. By plotting the SHAP values for each feature across all samples, we gained an intuitive understanding of the overall patterns in the data, which also facilitated the detection of outlier predictions. In these visualizations (Fig. 3 and Supplementary Fig. 2), each row corresponds to a different feature, with the horizontal axis representing the SHAP values. Individual data points represent samples, color-coded to indicate the magnitude of feature values: red for high and blue for low. This approach allowed us to discern the contribution of each feature to the model’s predictions and to identify any anomalies that might suggest a need for further investigation.

Fig. 3 Global model explanation of initial XGBoost model SHAP value for all risk factors. (A) SHAP summary bar plot. (B) SHAP summary dot plot

During the feature reduction process, based on feature importance ranking, the XGBoost model’s AUC and F1 score showed that the model maintained good predictive power, with no significant change in predictive ability when the number of features was reduced to 15 (Fig. 4A-B, Supplementary Table 2, Supplementary Fig. 3). Thus, the model with the feature set narrowed down to 15 features was selected as the final model.

Fig. 4 Performance of XGBoost models to predict PPH. (A) AUC of the XGBoost model with varied numbers of features. (B) F1 score of the XGBoost model with varied numbers of features. (C) Pearson correlation plot of 15 features

Multicollinearity among the 15 features was assessed to determine its potential impact on predictive accuracy. A correlation coefficient close to 0 indicates low correlation, and values below 0.8 were considered uncorrelated. Figure 4C shows that the features are largely independent of one another, suggesting that multicollinearity is not a significant issue in this model. As illustrated in the SHAP summary plot (Fig. 5), the contribution of the 15 selected features to the model was evaluated using average SHAP values, displayed in descending order. Figure 6 depicts the relationship between the actual values and SHAP values of these 15 features. SHAP values above zero correspond to a higher risk of PPH in the model’s positive class prediction.
For instance, parturients with a newborn weight ≥ 3500 g or a second stage of labor ≥ 100 min have SHAP values above zero, pushing the decision towards PPH.

Fig. 5 Global model explanation of final XGBoost model SHAP value for 15 risk factors. (A) SHAP summary bar plot. (B) SHAP summary dot plot

Fig. 6 SHAP dependence plot. Each dependence plot shows how a single feature affects the output of the prediction model, and each dot represents a single patient. The SHAP values for specific features exceeding zero push the decision towards the “PPH” class. LPL: light physical labor; MPL: moderate physical labor; HPL: heavy physical labor; PROM: premature rupture of membranes; BMI: body mass index

Local explanations analyze how specific predictions for individual patients are made by combining personalized input data. Figures 7A, 7C, and 7E show parturients who did not experience PPH within 24 h postpartum, illustrating the impact of the selected features on the model’s output. According to the predictive model, the x-axis in Fig. 7A represents the probability of the sample being predicted as non-PPH, and the y-axis represents the selected features and their corresponding values. The waterfall plot starts with the expected model output on the x-axis (E[f(X)] = -3.051). This “baseline” value of -3.051 is the average predicted probability of the test set. The combination of positive contributions (red) and negative contributions (blue) shifts the expected value output to the final model output (f(x) = -6.327). Positive SHAP values increase the probability of the sample being classified as PPH, while negative SHAP values decrease it. The force plot provides further insights through an additive force layout (Fig. 7B).

Fig. 7 Local model explanation by the SHAP method. (A) Waterfall plot of risks contributed by each feature for individual patient at low risk; (B) Waterfall plot of risks contributed by each feature for individual patient at high risk; (C) Force plot of risks contributed by each feature for individual patient at low risk; (D) Force plot of risks contributed by each feature for individual patient at high risk; (E) Evolution of risks contributed by each feature for individual patient at low risk; (F) Evolution of risks contributed by each feature for individual patient at high risk

Fig. 8 Model evaluation. (A) ROC of train cohort; (B) ROC of internal validation cohort; (C) ROC of external validation cohort; (D) calibration curve of train cohort; (E) DCA curve of train cohort; (F) calibration curve of internal validation cohort; (G) DCA curve of internal validation cohort; (H) calibration curve of external validation cohort; (I) DCA curve of external validation cohort. ROC: receiver operating characteristic curve; AUC: area under curve; DCA: decision curve analysis

Similarly, Figs. 7B, 7D, and 7F show parturients who experienced PPH within 24 h postpartum. Figure 7B highlights the features that push or pull the decision towards the PPH category and their actual measured values, indicating that the decision for this case inclines towards PPH, with a probability of 32.3%.

To verify the robustness of the model and ensure an adequate sample size, we applied internal and external validation datasets. The AUC for the internal validation dataset was 0.894 (95% CI: 0.875–0.912) and for the external validation dataset was 0.880 (95% CI: 0.855–0.905), as shown in Fig. 8. Although these AUC values are slightly lower than those observed in the training set, they still indicate strong predictive performance in both internal and external validations. The calibration curves and DCA also showed improvement in the internal and external validation datasets, addressing the imbalance in positive data seen in the test set.

The final predictive model has been implemented into a web application to facilitate practical use in clinical scenarios. By entering the actual values of the 15 features required by the model, the application predicts the risk of PPH for individual parturients.
It also displays a force plot for each parturient, indicating the features contributing to the decision on PPH: features on the right side in blue indicate factors pushing the prediction towards “non-PPH,” while features on the left side in red push the prediction towards “PPH.” This web application is accessible online at https://postpartum-hemorrhage-prediction-model6.streamlit.app/
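The waterfall-plot values quoted in this section (E[f(X)] = -3.051 and f(x) = -6.327) are negative, so they cannot be probabilities as such. A plausible reading, stated here as an assumption rather than something the paper confirms, is that they are on the log-odds scale, which is what SHAP's tree explainer reports by default for a binary XGBoost classifier. Under that assumption, the sketch below converts them to probabilities with the logistic function and shows the additivity property of SHAP values.

```python
import math

def log_odds_to_probability(log_odds):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Values quoted for the low-risk example; the log-odds interpretation
# is an assumption, not something stated in the paper.
baseline_log_odds = -3.051   # E[f(X)]
patient_log_odds = -6.327    # f(x)

# SHAP additivity: the per-feature SHAP values of one patient sum to
# the difference between that patient's output and the baseline.
total_shap = patient_log_odds - baseline_log_odds

baseline_probability = log_odds_to_probability(baseline_log_odds)
patient_probability = log_odds_to_probability(patient_log_odds)

print(f"sum of SHAP values: {total_shap:.3f}")
print(f"baseline probability: {baseline_probability:.4f}")
print(f"patient probability: {patient_probability:.5f}")
```

Read this way, the baseline corresponds to a predicted PPH probability of roughly 4.5%, and the low-risk patient to roughly 0.2%.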

Conclusion

In conclusion, our study has successfully developed an interpretable ML model capable of predicting postpartum hemorrhage (PPH) in patients undergoing vaginal delivery using readily available clinical data extracted from the hospital information system (HIS). This model represents a promising tool for early risk assessment and intervention in clinical practice. The final XGBoost model exhibited outstanding predictive performance for PPH, as validated internally and externally. Moving forward, prospective studies are essential to assess whether implementing individualized and timely treatment measures guided by our predictive model can lead to improved maternal outcomes, particularly in reducing PPH-related morbidity and mortality. This represents a crucial step towards personalized healthcare in obstetrics, potentially enhancing patient care and reducing maternal morbidity and mortality rates.

Discussion

The era of big data has revolutionized clinical healthcare, and the application of ML in disease prediction and prognosis is increasingly prevalent. This study leverages SHAP values to assist in identifying risk factors and constructs an ML model to predict the risk of PPH in women undergoing vaginal delivery. By utilizing big data to develop a diagnostic system for PPH, we can significantly enhance the accuracy of PPH diagnosis. While artificial intelligence, including ML, is making strides in obstetric disease diagnosis and treatment globally, the application of interpretable ML in clinical practice is still in its infancy. This study represents an important step towards improving the standardization of obstetric medical care and reducing maternal mortality, particularly by providing a tool for early identification of high-risk patients. Currently, three widely used postpartum hemorrhage risk assessment tools globally are the California Maternal Quality Care Collaborative (CMQCC) [ 18 ] toolkit, the Association of Women’s Health, Obstetric and Neonatal Nurses (AWHONN) [ 19 ] guidelines, and the New York State Department of Health (NYSBOH) [ 3 ] guidelines. These tools summarize and classify risk factors for PPH into low, medium, and high categories based on expert consensus. However, a comparative study of these three risk assessment scales found that they have only moderate reliability in predicting severe PPH in high-risk cesarean section groups [ 20 ]. With these tools, the incidence of PPH is significantly higher only among women classified as high risk [ 20 ]. Additionally, in the study by Dilla et al., the sensitivity of the CMQCC toolkit in predicting PPH requiring transfusion was only 22%, and the probability of severe PPH in the low-risk group was still 0.4–0.6% [ 21 ]. Therefore, adding more assessment indicators and improving modeling methods may enhance the accuracy of PPH prediction.
In 2021, Venkatesh et al. published a study utilizing ML to predict PPH [ 22 ]. This study included 152,279 childbirth cases, of which 7,279 (4.8%) experienced PPH exceeding 1000 milliliters. They included 55 risk factors and used random forest and extreme gradient boosting algorithms to develop ML models. The extreme gradient boosting algorithm achieved the best performance (AUC: 0.93; 95% CI: 0.92–0.93), followed by the random forest, demonstrating the high predictive performance of ML. However, this study primarily focused on PPH cases associated with cesarean sections, with 28% of the patients undergoing cesarean delivery, and 91% of PPH cases occurring in cesarean section patients. Akazawa’s study [ 23 ], conducted from 1995 to 2020 at the Tokyo Women’s Medical University East Center, involved 9,894 women who underwent vaginal delivery and applied eleven clinical variables to create an ML model predicting PPH, defined as blood loss > 1000 mL. The study utilized an ensemble learning approach with five ML classifiers (logistic regression, support vector machine, random forest, boosting tree, and decision tree) and a deep learning model consisting of two-layer neural networks. The deep learning model demonstrated the best performance, achieving an AUC of 0.708 for PPH prediction, with an accuracy of 0.686, false positive rate (FPR) of 0.312, and false negative rate (FNR) of 0.398. However, previous models lacked interpretability, as they did not explain the results of the ML algorithm predictions due to their black box nature. SHAP, an ML model interpretation method based on Shapley values from game theory, addresses this issue by attributing the model’s prediction to each feature, thus explaining the decision-making process of the model [ 24 ]. SHAP values quantify the impact of each feature on the model’s prediction results, aiding in understanding why the model gives specific predictions.
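The game-theoretic idea behind SHAP can be illustrated on a toy example. The sketch below computes exact Shapley values by enumerating every coalition of features, which is only feasible for a handful of features (SHAP libraries rely on faster, model-specific algorithms). The three feature names and the scoring function are hypothetical and are not taken from the study's model.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, players):
    """Exact Shapley values: weighted average of each player's marginal
    contribution over all coalitions of the other players."""
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value_fn(coalition | {player}) - value_fn(coalition))
        phi[player] = total
    return phi

def toy_risk_score(coalition):
    """Hypothetical score: two main effects plus one interaction."""
    score = 0.0
    if "newborn_weight" in coalition:
        score += 2.0
    if "prolonged_labor" in coalition:
        score += 1.0
    if "newborn_weight" in coalition and "prolonged_labor" in coalition:
        score += 1.0  # interaction is split equally between the two
    return score

players = ["newborn_weight", "prolonged_labor", "maternal_age"]
phi = shapley_values(toy_risk_score, players)
print(phi)
```

The values sum to the difference between the score with all features present and the score with none, which is the same additivity property that waterfall and force plots rely on.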
In this study, SHAP was employed in the XGBoost model for its superior predictive performance and interpretability. Personalized explanations constructed through SHAP force analysis help doctors understand why the model makes specific high-risk recommendations, enhancing understanding of the decision-making process. To further validate the contribution of risk factors to the model, SHAP feature importance and feature effects were calculated. Then 15 key variables that significantly predict PPH were identified. The most important input parameter for PPH was newborn weight, followed by stages of labor, nature of work, premature rupture of membranes, among others. The clinical significance of these variables is consistent with existing literature, emphasizing the importance of these common clinical characteristics in predicting PPH. Previous studies have demonstrated that larger neonatal birth weight is an independent risk factor for PPH [ 25 ]. Heavier infants are generally associated with uterine distension, prolonged labor, and difficult placental separation, all of which increase the risk of bleeding [ 26 ]. Prolonged labor, which has also been confirmed as a significant predictor of PPH, may indicate insufficient uterine contractions, abnormal fetal position, or difficulty in placental separation, thus increasing the likelihood of postpartum hemorrhage [ 27 – 29 ]. Interestingly, occupational factors were also identified as important predictors of PPH. Individuals engaged in moderate to heavy physical labor occupations had a lower probability of developing PPH, which may reflect the potential influence of lifestyle and socioeconomic factors. Additionally, older maternal age and obesity were found to be associated with a higher risk of PPH. Pregnant women delivering at > 40 weeks of gestation require enhanced surveillance, aligning with findings from previous research, especially for those delivering between 41 and 42 weeks [ 30 ]. 
Furthermore, we observed that certain pregnancy complications significantly increased the risk of PPH, highlighting the importance of timely preventive and management measures in clinical practice, such as infection prevention, blood pressure control, and anemia correction, to reduce the risk of PPH. Thus, SHAP analysis in this study not only helped us gain a deeper understanding of the predictive mechanisms of the ML model but also provided a bridge for applying the model in clinical practice, significantly enhancing its clinical feasibility. However, this study has several limitations. Firstly, the data were derived solely from the Shenyang area, potentially introducing significant selection bias and limiting generalizability. Secondly, while the predictive model incorporated multiple risk factors and demonstrated high overall efficacy, the restrictive selection criteria may have influenced PPH prediction, hindering objective clinical utility assessment. Thirdly, while ML techniques require ‘big data’ for predictive model construction, there are no established standards for calculating the sample size needed. Therefore, caution should be exercised in interpreting the conclusions, and further evidence is warranted to confirm these findings in diverse populations.

Introduction

Postpartum hemorrhage (PPH) is a significant global health concern that can lead to severe and potentially fatal complications for women, particularly in low-resource settings. It has been extensively studied due to its status as one of the leading causes of maternal mortality, particularly in developing countries [ 1 ]. The majority of these deaths are preventable through the establishment of clinical guidelines and policies [ 2 ], as well as the promotion of relevant research and training. In routine clinical practice, physicians typically estimate the probability of PPH by assessing clinical history, conducting physical examinations, and performing laboratory tests. However, the limited sensitivity and specificity of these assessments, combined with the low incidence of PPH, mean that traditional bleeding assessment tools [ 3 , 4 ], such as structured history taking and systematic evaluation scales, have shown low efficacy in assessing the incidence of PPH. With the advancement of medical big data and artificial intelligence, predictive models for PPH have begun to emerge, offering new opportunities for early risk assessment and intervention. However, most current studies do not differentiate between PPH following cesarean section and vaginal delivery [ 4 ]. Moreover, the selection of predictive factors in model construction is often constrained by data collection limitations and sample size [ 5 ], resulting in a lack of comprehensive assessment in most clinical predictive models. In recent years, the rise of smart medicine and artificial intelligence has highlighted the unparalleled advantages of machine learning (ML) techniques over traditional statistics [ 6 ]. ML involves fitting predictive models to data or identifying informative patterns within datasets [ 7 ], leveraging data features to establish automated data analysis processes that enhance predictive capabilities for new data. 
Scholars have applied ML algorithms to various fields to construct predictive models, such as disease prediction and diagnosis [ 8 ], prognosis or mortality prediction [ 9 ], drug interaction prediction [ 10 ], rehospitalization prediction [ 11 ], and patient care needs prediction [ 12 ], all of which have shown good predictive performance [ 13 ]. Despite its advantages, ML research in the medical field faces several challenges, including handling missing data, avoiding model overfitting, and accounting for interrelationships among dataset attributes [ 14 ]. Additionally, the “black box” issue, where model inputs and operations are not visible to users or stakeholders, complicates interpretability [ 15 ]. Due to the complexity and multi-dimensionality of its algorithmic structure, understanding ML models can be difficult for clinicians. SHapley Additive exPlanations (SHAP), a method inspired by game theory and proposed by Lundberg et al. [ 16 ], addresses this issue by assigning a value to each input feature, indicating how the feature contributes to the prediction for a specific data point. Some factors positively impact the prediction probability, while others have a negative effect [ 17 ]. This can help clinicians quantify risk factors, improving their ability to focus on and prevent them in clinical practice. This study aims to use various ML algorithms to construct an optimal PPH prediction model for vaginal delivery. Additionally, it seeks to evaluate and quantify risk factors, providing a highly reliable reference for personalized assessment and prevention of PPH in high-risk pregnant women.

Supplementary Material

Below is the link to the electronic supplementary material. Supplementary Material 1, Supplementary Material 2, Supplementary Material 3, Supplementary Material 4
