Comparison of risk prediction models for the progression of pelvic inflammatory disease patients to sepsis: Cox regression model and machine learning model.

OA: gold CC-BY-NC-ND-4.0 · 1 in-corpus citation
AI-generated summary by claude@2026-06, 2026-06-10

This study compared Cox regression and machine learning models for predicting progression to sepsis in pelvic inflammatory disease patients, assessing their predictive performance.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-10 · read from full text

This retrospective study analyzed women with pelvic inflammatory disease (PID) diagnosed between 2008 and 2019 using MIMIC-IV, aiming to model progression to sepsis and to compare a random survival forest (RSF) machine-learning approach with traditional Cox proportional hazards modeling. Sepsis was defined with Sepsis-3.0 criteria (SOFA score increase ≥2 alongside infection), and the main outcome was sepsis incidence with follow-up from ICU admission/diagnosis to discharge; multiple imputation handled missing data (excluding variables missing >21%) and the first ICU admission was used. The paper’s stated approach included internal training/validation split (7:3) for model construction and performance evaluation, with Cox regression used for feature selection, while RSF leveraged variable importance and tree-averaging for survival prediction. The paper does not explicitly discuss endometriosis or adenomyosis; it was included in the corpus via a keyword match in the upstream search index.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Full text 46,677 characters · extracted from pmc-nxml · 12 sections

Credit

Qingyi Wang: Writing – review & editing, Writing – original draft, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Jianing Sun: Writing – review & editing, Supervision, Software, Resources, Project administration, Methodology. Xiaofang Liu: Writing – review & editing, Writing – original draft, Visualization, Validation, Data curation, Conceptualization. Yunlu Ping: Writing – review & editing, Writing – original draft, Methodology, Investigation, Funding acquisition, Formal analysis. Chuwen Feng: Writing – review & editing, Writing – original draft, Software, Resources, Project administration. Fanglei Liu: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision. Xiaoling Feng: Writing – review & editing, Writing – original draft, Supervision, Project administration, Investigation, Formal analysis, Conceptualization.

Ethics

All analyses were based on content in existing databases and therefore did not require ethical approval or patient consent.

Consent

Not applicable.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Patient

Not applicable.

Results

A total of 1746 PID patients were recorded in MIMIC-IV, and 1064 of them were included in this study after excluding cases with missing values of one or more variables. These patients were randomly divided into a training set (745 patients) and a validation set (319 patients) at a ratio of 7:3. The baseline characteristics of the two datasets are presented in Table 2. Significant differences in race, age, and use of vitamin B were observed between the two sets, while no significant differences were found in other characteristics (p > 0.05). The age at admission ranged from 18 to 91 years (median: 38; IQR: 30, 50). The follow-up duration ranged from 0.01 to 28.99 days (median: 2.615; IQR: 1.2475, 4.39). Compared with patients who did not progress to sepsis, those who did were more likely to have a history of hemodialysis, pneumonia, or hormone use, decreased PLT counts, and increased WBC counts. Among the 54 sepsis patients, 21 were between 20 and 39 years old (38.89%), 14 were between 40 and 64 years old (25.92%), and 19 were over 65 years old (35.19%). Within a median follow-up period of 0.37 days, 11 cases (52.4%) progressed to sepsis.

To identify predictors of sepsis, demographic information, vital signs, laboratory tests, comorbidities, medications, and surgical history of the main cohort were analyzed using univariate Cox regression. Several potential predictors were identified with a p-value < 0.05, such as adrenaline, hemodialysis, neutrophil counts, organ transplant, PLT counts, RBC counts, hormone use, vitamin B, vitamin D, WBC counts, pneumonia, peritonitis, lupus, urinary system infection, anemia, renal failure, and age. All these variables were included in multivariate Cox proportional hazards regression, where hemodialysis, PLT counts, pneumonia, hormone use, and WBC counts were finally chosen as the predictors for the predictive model (p < 0.05) (Table 3).
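As a reading aid for the hazard ratios in Table 3, the sketch below back-calculates the log-scale standard error and Wald p-value from a reported HR and its 95% CI. It uses the multivariable pneumonia entry, HR 8.1 (2.49–26.36), and assumes the interval is a symmetric Wald interval on the log scale with a critical value of 1.96; the paper does not state how its intervals were computed, and the function name is hypothetical.

```python
import math

def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the Cox coefficient, its SE, the Wald z, and a two-sided p.

    Assumption: the reported CI is exp(beta +/- z_crit * SE).
    """
    beta = math.log(hr)  # Cox regression coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Multivariable pneumonia entry as reported in Table 3
beta, se, z, p = wald_from_hr(8.1, 2.49, 26.36)
print(f"beta={beta:.3f}  se={se:.3f}  z={z:.2f}  p={p:.4f}")
```

Under these assumptions the recovered p-value lands close to the 0.0005 reported for pneumonia, which suggests the table's HR, CI, and p-value for that row are mutually consistent.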
All variables in the model had VIF < 5 (hemodialysis, 1.049; PLT count, 1.213; sex hormone, 1.026; WBC count, 1.283; pneumonia, 1.042).

Table 2 Baseline characteristics in patients with PID.

Parameter | Overall | Train | Test | p
n | 1064 | 745 | 319 |
Suffer from sepsis (%) | 21 (2.0) | 14 (1.9) | 7 (2.2) | 0.922
Time (median [IQR]) | 2.62 [1.25, 4.39] | 2.62 [1.24, 4.37] | 2.61 [1.27, 4.51] | 0.714
Race (%) | | | | 0.010
  Asian | 62 (5.8) | 49 (6.6) | 13 (4.1) |
  Black | 330 (31.0) | 226 (30.3) | 104 (32.6) |
  Other | 210 (19.7) | 163 (21.9) | 47 (14.7) |
  White | 462 (43.4) | 307 (41.2) | 155 (48.6) |
Marital status (%) | | | | 0.092
  Divorced | 68 (6.4) | 47 (6.3) | 21 (6.6) |
  Married | 437 (41.1) | 306 (41.1) | 131 (41.1) |
  Other | 8 (0.8) | 7 (0.9) | 1 (0.3) |
  Single | 522 (49.1) | 371 (49.8) | 151 (47.3) |
  Widowed | 29 (2.7) | 14 (1.9) | 15 (4.7) |
Hemodialysis (%) | 4 (0.4) | 3 (0.4) | 1 (0.3) | 1.000
Sex hormone (%) | 14 (1.3) | 10 (1.3) | 4 (1.3) | 1.000
Adrenaline (%) | 6 (0.6) | 5 (0.7) | 1 (0.3) | 0.789
Antibiotic (%) | 667 (62.7) | 471 (63.2) | 196 (61.4) | 0.631
Anticoagulant (%) | 693 (65.1) | 476 (63.9) | 217 (68.0) | 0.220
Hypotensor (%) | 292 (27.4) | 207 (27.8) | 85 (26.6) | 0.759
Organ transplant (%) | 5 (0.5) | 5 (0.7) | 0 (0.0) | 0.328
Laparoscope (%) | 214 (20.1) | 162 (21.7) | 52 (16.3) | 0.052
Digestive system operation (%) | 396 (37.2) | 280 (37.6) | 116 (36.4) | 0.758
Hysteroscope (%) | 7 (0.7) | 6 (0.8) | 1 (0.3) | 0.620
Acquired immune deficiency syndrome (%) | 5 (0.5) | 5 (0.7) | 0 (0.0) | 0.328
Alzheimer disease (%) | 3 (0.3) | 3 (0.4) | 0 (0.0) | 0.614
Leukemia (%) | 6 (0.6) | 2 (0.3) | 4 (1.3) | 0.128
Enteritis (%) | 80 (7.5) | 49 (6.6) | 31 (9.7) | 0.098
Pneumonia (%) | 97 (9.1) | 63 (8.5) | 34 (10.7) | 0.304
Peritonitis (%) | 798 (75.0) | 548 (73.6) | 250 (78.4) | 0.113
Liver failure (%) | 8 (0.8) | 5 (0.7) | 3 (0.9) | 0.937
Liver cirrhosis (%) | 16 (1.5) | 10 (1.3) | 6 (1.9) | 0.699
Hypertension (%) | 80 (7.5) | 49 (6.6) | 31 (9.7) | 0.098
Fracture (%) | 62 (5.8) | 44 (5.9) | 18 (5.6) | 0.980
Coronary disease (%) | 37 (3.5) | 26 (3.5) | 11 (3.4) | 1.000
Hypothyroidism (%) | 129 (12.1) | 88 (11.8) | 41 (12.9) | 0.708
Thyroiditis (%) | 10 (0.9) | 10 (1.3) | 0 (0.0) | 0.083
Thyroid disease (%) | 134 (12.6) | 93 (12.5) | 41 (12.9) | 0.948
Connective tissue disease (%) | 6 (0.6) | 2 (0.3) | 4 (1.3) | 0.128
Phlebitis (%) | 14 (1.3) | 11 (1.5) | 3 (0.9) | 0.682
Systemic lupus erythematosus (%) | 12 (1.1) | 7 (0.9) | 5 (1.6) | 0.568
Rheumatoid arthritis (%) | 20 (1.9) | 12 (1.6) | 8 (2.5) | 0.459
Urinary system infection (%) | 200 (18.8) | 136 (18.3) | 64 (20.1) | 0.545
Anemia (%) | 451 (42.4) | 312 (41.9) | 139 (43.6) | 0.656
Burn (%) | 12 (1.1) | 9 (1.2) | 3 (0.9) | 0.951
Renal failure (%) | 225 (21.1) | 159 (21.3) | 66 (20.7) | 0.875
Nephritis (%) | 53 (5.0) | 37 (5.0) | 16 (5.0) | 1.000
Diabetes mellitus (%) | 164 (15.4) | 115 (15.4) | 49 (15.4) | 1.000
Gastritis (%) | 48 (4.5) | 32 (4.3) | 16 (5.0) | 0.721
Thrombophlebitis (%) | 14 (1.3) | 11 (1.5) | 3 (0.9) | 0.682
Neoplasms (%) | 311 (29.2) | 207 (27.8) | 104 (32.6) | 0.131
Vitamin A (%) | 6 (0.6) | 5 (0.7) | 1 (0.3) | 0.789
Vitamin B (%) | 15 (1.4) | 6 (0.8) | 9 (2.8) | 0.023
Vitamin D (%) | 147 (13.8) | 97 (13.0) | 50 (15.7) | 0.293
Vitamin E (%) | 4 (0.4) | 2 (0.3) | 2 (0.6) | 0.742
Hypoglycemic agents (%) | 3 (0.3) | 2 (0.3) | 1 (0.3) | 1.000
Antineoplastic drugs (%) | 45 (4.2) | 28 (3.8) | 17 (5.3) | 0.317
NSAIDs (%) | 868 (81.6) | 616 (82.7) | 252 (79.0) | 0.182
Glucocorticoids (%) | 183 (17.2) | 131 (17.6) | 52 (16.3) | 0.675
BMI (median [IQR]) | 28.40 [23.80, 34.50] | 28.40 [23.80, 34.70] | 28.50 [23.85, 34.00] | 0.786
Neutrophil count (median [IQR]) | 68.65 [59.18, 79.00] | 68.50 [59.00, 79.10] | 69.20 [59.80, 78.60] | 0.574
PLT count (median [IQR]) | 289.00 [239.00, 351.25] | 291.00 [239.00, 352.00] | 286.00 [236.00, 351.00] | 0.592
Hemoglobin (median [IQR]) | 12.40 [11.20, 13.30] | 12.40 [11.30, 13.40] | 12.30 [11.20, 13.30] | 0.631
Sodium (median [IQR]) | 139.00 [137.00, 141.00] | 139.00 [137.00, 141.00] | 139.00 [137.00, 141.00] | 0.985
Potassium (median [IQR]) | 4.00 [3.80, 4.20] | 4.00 [3.80, 4.20] | 4.00 [3.70, 4.20] | 0.473
RBC count (median [IQR]) | 4.25 [3.94, 4.56] | 4.27 [3.95, 4.58] | 4.24 [3.93, 4.54] | 0.552
WBC count (median [IQR]) | 8.10 [6.40, 10.80] | 8.10 [6.50, 10.80] | 8.00 [6.20, 10.85] | 0.527
Age (median [IQR]) | 39.00 [30.00, 51.00] | 38.00 [30.00, 50.00] | 40.00 [31.00, 53.00] | 0.037

Table 3 Univariate and multivariable analyses for the relationship between the candidate risk factors and incidence rate of sepsis in the primary cohort.

Parameter | Univariate HR (95% CI) | Univariate P | Multivariable HR (95% CI) | Multivariable P
Adrenaline | 11.06 (1.35–90.8) | 0.0253 | |
Age | 1.03 (1–1.06) | 0.0356 | |
Antibiotic | 6.77 (0.88–52.03) | 0.0660 | |
Anticoagulant | 6.22 (0.8–48.08) | 0.0800 | |
Alzheimer disease | 0 (0–Inf) | 0.9978 | |
Body mass index | 0.97 (0.9–1.04) | 0.3738 | |
Leukemia | 0 (0–Inf) | 0.9978 | |
Enteritis | 1.06 (0.14–8.2) | 0.9572 | |
Digestive system operation | 1.02 (0.34–3.05) | 0.9651 | |
Pneumonia | 12.61 (4.21–37.8) | <0.0001 | 8.1 (2.49–26.36) | 0.0005
Peritonitis | 0.34 (0.12–0.97) | 0.0437 | |
NSAIDs | 0.38 (0.13–1.15) | 0.0872 | |
Liver failure | 0.00 (0.00–Inf) | 0.9980 | |
Liver cirrhosis | 0.00 (0.00–Inf) | 0.9975 | |
Hypertension | 1.06 (0.14–8.2) | 0.9572 | |
Coronary disease | 2.33 (0.3–17.94) | 0.4161 | |
Fracture | 2.18 (0.48–9.95) | 0.3137 | |
Hemodialysis | 19.84 (2.53–155.73) | 0.0045 | 26.9 (2.93–247.32) | 0.0036
Hemoglobin | 0.81 (0.61–1.08) | 0.1524 | |
Acquired immune deficiency syndrome | 0.00 (0.00–Inf) | 0.9978 | |
Hypotensor | 2.2 (0.75–6.47) | 0.1530 | |
Hysteroscope | 0.00 (0.00–Inf) | 0.9978 | |
Hypothyroidism | 0.00 (0.00–Inf) | 0.9977 | |
Thyroid disease | 0.00 (0.00–Inf) | 0.9976 | |
Thyroiditis | 0.00 (0.00–Inf) | 0.9982 | |
Connective tissue disease | 0.00 (0.00–Inf) | 0.9978 | |
Phlebitis | 0.00 (0.00–Inf) | 0.9973 | |
Hypoglycemic agents | 0.00 (0.00–Inf) | 0.9981 | |
Antineoplastic drugs | 2.05 (0.27–15.8) | 0.4896 | |
Systemic lupus erythematosus | 8.76 (1.14–67.42) | 0.0371 | |
Laparoscope | 0.29 (0.04–2.19) | 0.2288 | |
Rheumatoid arthritis | 3.27 (0.37–28.68) | 0.2845 | |
Marital status (Married) | 0.38 (0.08–1.9) | 0.2389 | |
Marital status (Other) | 0.00 (0.00–Inf) | 0.9985 | |
Marital status (Single) | 0.33 (0.07–1.64) | 0.1754 | |
Marital status (Widowed) | 0.00 (0.00–Inf) | 0.9978 | |
Urinary system infection | 7.37 (2.45–22.21) | 0.0004 | 3.02 (0.87–10.47) | 0.0814
Neutrophil count | 1.05 (1.00–1.10) | 0.0354 | |
Organ transplant | 24.25 (5.23–112.42) | 0 | |
PLT count | 0.99 (0.98–1.00) | 0.0175 | 0.99 (0.98–0.99) | 0.0015
Potassium | 1.91 (0.64–5.68) | 0.2451 | |
Anemia | 16.22 (2.11–124.74) | 0.0074 | 7.12 (0.86–59.1) | 0.0692
Race (Black) | 0.38 (0.07–2.09) | 0.2654 | |
Race (Other) | 0.4 (0.07–2.4) | 0.315 | |
Race (White) | 0.33 (0.06–1.75) | 0.1937 | |
RBC count | 0.27 (0.1–0.73) | 0.0099 | 0.62 (0.24–1.61) | 0.3246
Sex hormone | 15.73 (3.44–71.92) | 0.0004 | 31.52 (5.18–191.92) | 0.0002
Sodium | 1.03 (0.85–1.25) | 0.7347 | |
Renal failure | 4.78 (1.65–13.87) | 0.0040 | |
Burn | 0.00 (0.00–Inf) | 0.9976 | |
Nephritis | 1.5 (0.19–11.55) | 0.6985 | |
Diabetes mellitus | 1.96 (0.61–6.33) | 0.2583 | |
Glucocorticoids | 2.05 (0.65–6.43) | 0.2192 | |
Vitamin A | 0.00 (0.00–Inf) | 0.9981 | |
Vitamin B | 10.17 (1.32–78.37) | 0.0260 | |
Vitamin D | 3.05 (0.99–9.41) | 0.0528 | |
Vitamin E | 0.00 (0.00–Inf) | 0.9981 | |
WBC count | 1.09 (1.01–1.16) | 0.0186 | 1.18 (1.07–1.30) | 0.0008
Gastritis | 1.74 (0.22–13.42) | 0.5975 | |
Thrombophlebitis | 0.00 (0.00–Inf) | 0.9973 | |
Neoplasms | 1.73 (0.59–5.04) | 0.3165 | |

A nomogram was constructed using the five variables to predict the 3-day/7-day risk of progression to sepsis in PID patients (Fig. 1). Each independent predictor was assigned a weighted score on its own scale: hemodialysis (No-Yes), PLT counts (100–1000), pneumonia (No-Yes), hormone use (No-Yes), and WBC counts (0–50). The highest possible total score was 160. A perpendicular line was drawn from each variable's value to the points axis to determine that variable's score (points). The scores for all variables were then added together to give the patient's total score (Total Points). From the total points, a vertical line was drawn downward to meet the 3/7-day probability axes, giving the likelihood of sepsis occurring within that timeframe. A higher total score corresponded to a higher probability of sepsis.

Fig. 1 Prediction nomogram for 3/7-day risk of progression to sepsis in PID patients.
For instance, a PID patient had PLT counts of 300 × 10⁹/L (78 points) and WBC counts of 11 × 10⁹/L (12 points), with hormone use (30 points) and no history of hemodialysis (0 points) or pneumonia (0 points). The patient's total points were 120, indicating a 3-day probability of approximately 10% and a 7-day probability of approximately 20%. Similarly, another PID patient had PLT counts of 500 × 10⁹/L (55 points) and WBC counts of 7 × 10⁹/L (9 points). This patient had received hemodialysis (22 points) and hormone treatment (30 points), with no history of pneumonia (0 points). The total points for this patient were 116, indicating a 3-day probability of approximately 4% and a 7-day probability of approximately 5%.

To evaluate the nomogram's performance, the AUC of the Cox regression model was calculated. The AUC for predicting progression to sepsis at 3/7 days was 0.886/0.863 in the training set (Fig. 2A) and 0.824/0.726 in the validation set (Fig. 2B). The c-index of the model was 0.8905, indicating strong predictive ability. AUCs close to 1 signify high accuracy of the nomogram. Moreover, both sets displayed smooth ROC curves, suggesting minimal risk of overfitting.

Fig. 2 ROC of predictive nomogram for 3/7-day progression to sepsis in PID patients in training set and validation set: (A) training set, (B) validation set.

Additionally, the calibration curve demonstrated that the predictive model neither significantly overestimated nor underestimated the risk, indicating good calibration (Fig. 3A and B).

Fig. 3 Calibration curve of predictive nomogram for 3/7-day progression to sepsis in PID patients in training set and validation set: (A) training set, (B) validation set. Notes: The X-axis represented the predicted risk, the Y-axis represented the actual risk, and the diagonal dotted line represented the prediction of an ideal model. The solid lines represented the performance of the nomogram; the closer a solid line was to the diagonal dotted line, the better the prediction.

The DCA of 3-day and 7-day risks in the training set and validation set is shown in Fig. 4A–D. Across a threshold probability range of 1.0%–24.0%, the nomogram identified progression to sepsis with a net clinical benefit for patients with PID. The DCA indicated that a large proportion of PID patients could benefit from the predictive model, making it a valuable tool for clinical decision-making.

Fig. 4 (A) DCA of 3-day risk, training set; (B) DCA of 3-day risk, validation set; (C) DCA of 7-day risk, training set; (D) DCA of 7-day risk, validation set. Notes: the X-axis represented the threshold probability, the Y-axis represented the net benefit (benefits minus harms), and the blue line indicated the 3/7-day risk of progression to sepsis. The Treat None line (green) indicated that no patient received the intervention, while the Treat All line (red) indicated that all patients received the intervention. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

We employed random search to estimate the OOB error for different combinations of mtry and nodesize. Fig. 5 displays the results, indicating that the lowest OOB error rate was achieved when mtry = 3 and nodesize = 15. With these values fixed, we then determined the optimal number of decision trees (ntree = 50). Using these optimal parameters, we constructed the final RSF model and ranked the importance of the variables. Fig. 6 shows that the WBC count emerged as the most important variable in predicting the outcome, followed by pneumonia.

Fig. 5 The OOB error for different combinations of mtry and nodesize.

Fig. 6 (A) The relationship between the trees and error rate. (B) Variable importance of the candidate characteristics in the RSF model.

We evaluated the performance of the RSF model by calculating the AUC for predicting whether PID patients would develop sepsis within 3 and 7 days. For the RSF model, the AUC for predicting progression to sepsis within 3/7 days was 0.939 (95% CI: 0.915, 0.962)/0.919 (95% CI: 0.874, 0.964) in the training set (Fig. 7A), and 0.712 (95% CI: 0.0565, 0.860)/0.571 (95% CI: 0.402, 0.740) in the validation set (Fig. 7B). Furthermore, the calibration curve showed that the model's predicted probabilities were in good agreement with the actual outcomes, indicating excellent calibration ability (Fig. 8A and B).

Fig. 7 (A) AUC of RSF of training data. (B) AUC of RSF of validation data.

Fig. 8 The calibration curve of the training data in RSF Model.
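The two worked nomogram examples in this section can be checked with a few lines of arithmetic. The sketch below only sums the per-variable points quoted in the text; the dictionary labels are illustrative, and the mapping from total points to a 3/7-day probability is read from Fig. 1 and is not reproduced here. The 30 points for hormone use in the first example imply that patient did have hormone use.

```python
def total_points(points_by_variable):
    """Sum the nomogram points assigned to each predictor."""
    return sum(points_by_variable.values())

# Points quoted in the text for the two example patients
patient_1 = {
    "PLT 300 x 10^9/L": 78,
    "WBC 11 x 10^9/L": 12,
    "hemodialysis: no": 0,
    "pneumonia: no": 0,
    "hormone use: yes": 30,
}
patient_2 = {
    "PLT 500 x 10^9/L": 55,
    "WBC 7 x 10^9/L": 9,
    "hemodialysis: yes": 22,
    "pneumonia: no": 0,
    "hormone use: yes": 30,
}

print(total_points(patient_1))  # total quoted in the text: 120
print(total_points(patient_2))  # total quoted in the text: 116
```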

Materials

We retrospectively analyzed patient information and clinical data for PID diagnosed between 2008 and 2019. The data were obtained from the Medical Information Mart for Intensive Care (MIMIC)-IV database (ICD-9-CM diagnosis codes: 614, 615; ICD-10-CM diagnosis codes: N7). To determine the index date of diagnosis, we looked back to the first date when the same ICD-9-CM/ICD-10-CM code was recorded by any medical specialist during the medical visit or admission. Cases with missing data were excluded from this study. All patients were observed for research purposes in the ICU. Our research team was granted permission and received certification to access the MIMIC database (certificate number: 52249934). PostgreSQL (version 14.6) was utilized for the extraction of information on PID patients, and sepsis was diagnosed based on the Sepsis-3.0 criteria [7, 29]. The primary outcome was the incidence of sepsis. According to the criteria outlined in “Sepsis 3.0,” sepsis is defined as an increase of two or more points in the SOFA score, in conjunction with diagnosed or suspected infection (Table 1). This change in the scoring system marks a crucial advancement in “Sepsis 3.0”: compared to the earlier Systemic Inflammatory Response Syndrome (SIRS) criteria, the SOFA score more accurately reflects changes in organ function [7]. The dataset offered the following risk factors: adrenaline, antibiotic, anticoagulant, hemodialysis, neutrophil count, organ transplant, platelet (PLT) counts, red blood cell (RBC) count, sex hormone, vitamin B, vitamin D, white blood cell (WBC) counts, pneumonia, peritonitis, lupus, urinary system infection, anemia, renal failure, and age. The follow-up duration ranged from admission/preliminary diagnosis of sepsis to discharge.

Table 1 Diagnostic criteria and major clinical characteristics of patients with sepsis.

Clear evidence of infection: clinical presentation, laboratory tests, imaging, or bacterial culture.

Sequential Organ Failure Assessment (SOFA) score:

System and measure | Score 0 | Score 1 | Score 2 | Score 3 | Score 4
Respiration: PaO2/FiO2, mmHg (kPa) | ≥400 (53.3) | <400 (53.3) | <300 (40) | <200 (26.7) with respiratory support | <100 (13.3) with respiratory support
Coagulation: platelets, × 10³/μL | ≥150 | <150 | <100 | <50 | <20
Liver: bilirubin, mg/dL (μmol/L) | <1.2 (20) | 1.2–1.9 (20–32) | 2.0–5.9 (33–101) | 6.0–11.9 (102–204) | >12.0 (204)
Cardiovascular | MAP ≥70 mmHg | MAP <70 mmHg | Dopamine <5 or dobutamine (any dose) a | Dopamine 5.1–15 or epinephrine ≤0.1 or norepinephrine ≤0.1 a | Dopamine >15 or epinephrine >0.1 or norepinephrine >0.1 a
Central nervous system: Glasgow Coma Scale score b | 15 | 13–14 | 10–12 | 6–9 | <6
Renal: creatinine, mg/dL (μmol/L) | <1.2 (110) | 1.2–1.9 (110–170) | 2.0–3.4 (171–299) | 3.5–4.9 (300–440) | >5.0 (440)
Renal: urine output, mL/d | | | | <500 | <200

Patients were diagnosed with sepsis if their SOFA score increased by 2 or more points at the time of definite infection. PaO2, partial pressure of oxygen; FiO2, fraction of inspired oxygen; MAP, mean arterial pressure. a Catecholamine doses are given as μg/kg/min for at least 1 h. b Glasgow Coma Scale scores range from 3 to 15; a higher score indicates better neurological function.

To protect patient privacy, all data in the database were de-identified. Therefore, informed consent was not required. The study followed recommendations for transparent reporting of multivariate prediction models for individual prognostic or diagnostic claims [30].
Data extraction was performed using structured query language. Where patients had multiple ICU admissions, data from their first admission were selected. Baseline characteristics were collected within the first 24 h of admission and encompassed the following categories: (1) demographic information: age, body weight, body mass index (BMI), race, marital status, time of admission, time of discharge, and time of progression to sepsis; (2) vital signs: blood pressure; (3) laboratory test results: WBC counts, hemoglobin, RBC counts, PLT counts, aspartate transaminase (AST), alkaline phosphatase (AKP), cholesterol, triglyceride, globulin, total protein, serum creatinine, C-reactive protein (CRP), high-density lipoprotein (HDL), low-density lipoprotein (LDL), lactic acid (LA), serum potassium (K+), and serum sodium (Na+); (4) comorbidities: diabetes, peritonitis, gastritis, nephritis, thrombophlebitis, phlebitis, acquired immunodeficiency syndrome (AIDS), renal failure, liver failure, tumor, pneumonia, and coronary heart disease; (5) surgical history: laparoscopy, hysteroscopy, digestive system operations, hemodialysis, and organ transplantation; (6) medications: antidiabetic agents, antihypertensive agents, glucocorticoids, sex hormones, non-steroidal anti-inflammatory drugs (NSAIDs), epinephrine, antibiotics, antineoplastic agents, vitamin D, vitamin E, vitamin C, vitamin B, vitamin A, and anticoagulants. Variables were extracted from the MIMIC-IV database. Any variable with more than 21% of its values missing was excluded. Multiple imputation was then applied to handle the missing values of the remaining variables. The data were split into a training set and a validation set at a ratio of 7:3. The former was used for nomogram construction and the latter for validation.
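The variable filter and the 7:3 split described above can be sketched as follows. This is an illustration in plain Python, not the authors' R code: the function names, the toy `wbc` and `crp` columns, the random seed, and the rounding rule for the training-set size are all assumptions, and the multiple-imputation step is omitted.

```python
import random

def drop_sparse_variables(rows, max_missing=0.21):
    """Keep only variables whose fraction of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = [v for v in rows[0]
            if sum(r[v] is None for r in rows) / n <= max_missing]
    return [{v: r[v] for v in keep} for r in rows]

def split_train_validation(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it at the requested fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))  # rounding rule is an assumption
    return shuffled[:n_train], shuffled[n_train:]

# Toy cohort of 1064 records: 'wbc' is about 10% missing, 'crp' is 50% missing
rows = [{"id": i,
         "wbc": 8.0 if i % 10 else None,
         "crp": None if i % 2 else 5.0}
        for i in range(1064)]

cleaned = drop_sparse_variables(rows)
train, validation = split_train_validation(cleaned)
print(sorted(cleaned[0]), len(train), len(validation))
```

With 1064 records, rounding 0.7 × 1064 gives the same 745/319 split sizes reported in the Results.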
Categorical variables were expressed as percentages (%), non-normally distributed continuous variables were represented using the median and interquartile range (P25, P75), and normally distributed continuous variables were presented as mean and standard deviation (Mean ± SD). The Chi-square test was performed for comparison of categorical data between the two groups, while the t-test and non-parametric test were employed for comparing continuous data. Cox regression was employed for feature selection. Variables with a p-value less than 0.05 in univariate analysis were included in stepwise Cox regression, and those with a p-value less than 0.05 in stepwise Cox regression were finally incorporated into the Cox proportional hazards model. The hazard ratio (HR) with its 95% confidence interval (CI) for each predictor variable was calculated via Cox proportional hazards regression. To address potential multicollinearity, we calculated the variance inflation factor (VIF) for each variable in the model and excluded variables with a VIF exceeding 5. Then, a nomogram was built using the significant predictor variables identified from the Cox proportional hazards model. This nomogram combined multiple clinical variables to estimate the probability of progression to sepsis among PID patients at 3/7 days. Each patient was scored according to the multivariate Cox proportional hazards regression, and the aggregate score gave the corresponding probability of disease onset.

Subsequently, an RSF model was developed using ML algorithms. First, bootstrapping was employed to draw N bootstrap samples from the training set, and one survival tree was grown on each sample, generating N survival trees; the samples not used in the training of a given tree are called its out-of-bag (OOB) samples. The OOB error is the metric used for evaluating the performance of the model, and a lower OOB error indicated better model performance.
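The bootstrap and out-of-bag idea described above can be illustrated without any survival-specific code. The sketch below assumes classical bootstrap sampling with replacement at the training-set size of 745 and uses 50 resamples purely as an example; it does not reproduce how randomForestSRC draws its samples, and the helper name is hypothetical.

```python
import random

def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement; return in-bag and out-of-bag indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

rng = random.Random(42)   # fixed seed so the sketch is repeatable
n = 745                   # training-set size reported in the paper
oob_fractions = []
for _ in range(50):       # one bootstrap sample per tree
    in_bag, oob = bootstrap_with_oob(n, rng)
    oob_fractions.append(len(oob) / n)

mean_oob = sum(oob_fractions) / len(oob_fractions)
print(f"mean OOB fraction: {mean_oob:.3f}")
```

On average roughly 1/e (about 37%) of the records are left out of each bootstrap sample, which is why every tree has a built-in hold-out set on which the OOB error can be computed.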
The “tune()” function was utilized to find the best parameters for the model (mtry and nodesize) by calculating the OOB error for different parameter combinations in the training set. The parameter mtry denotes the number of variables randomly sampled as candidates at each split. While a smaller value promotes diversity among the decision trees, it can also introduce significant bias. The nodesize parameter specifies the minimum number of data points required to create a terminal node in a tree. A smaller value might lead to more complex, overfitted trees, whereas a larger value may yield an overly generalized model. Through this process, we identified the combination of parameters that minimized the total error of the RSF. The learning curve was then plotted based on the selected optimal parameters to determine the optimal number of trees (ntree). The ntree parameter refers to the number of decision trees included in the random forest. Increasing the number of trees generally enhances model performance, but it may also lengthen computation times and lead to overfitting. In this way, a final RSF model was established based on the selected optimal parameters, and the variables were ranked for importance within the RSF algorithm framework.

The area under the time-dependent receiver operating characteristic (ROC) curve (time-dependent AUC) was adopted to assess the model's discrimination. The ROC curve was used to assess the effectiveness of the predictive model [31] by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) across different cut-off values or thresholds. An AUC closer to 1 indicated better accuracy, with an AUC greater than 0.7 indicating good model performance. The calibration curve was used to assess the consistency between predicted values and actual outcomes.
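To make the AUC concrete, the sketch below computes it for a plain binary outcome as the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. This is a simplification: it ignores censoring and follow-up time, so it is not the time-dependent AUC used in the paper, and the scores and labels are made-up illustrative values.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Illustrative predicted risks and outcomes (1 = progressed to sepsis)
example_scores = [0.10, 0.40, 0.35, 0.80]
example_labels = [0, 0, 1, 1]
print(auc(example_scores, example_labels))
```

Three of the four event/non-event pairs in this toy example are ordered correctly, so the AUC is 0.75. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.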
All statistical analyses were performed using R 4.2.1 (https://www.r-project.org/), utilizing specific packages such as tableone (ver. 0.13.2) for data description, survival (ver. 3.3.1) for Cox regression, rms (ver. 6.3.0) for nomogram construction, and randomForestSRC for parameter selection and RSF construction. A two-sided p-value less than 0.05 indicated statistical significance.

Background

Pelvic inflammatory disease (PID) is a syndrome that gives rise to an array of complications in women worldwide. It usually occurs when sexually transmitted pathogens spread from the lower reproductive tract to the upper reproductive organs, including the uterus and/or fallopian tubes, and possibly to adjacent pelvic organs [1, 2]. According to the U.S. Centers for Disease Control and Prevention, the lifetime prevalence of PID in the United States is estimated to be 4.4%, impacting around 2.5 million people [3]. Three groups of pathogens are isolated from the genital tract of PID patients: sexually transmitted pathogens such as Neisseria gonorrhoeae, Chlamydia trachomatis, Mycoplasma genitalium, and Trichomonas vaginalis; pathogens related to bacterial vaginosis (BV), such as Vaginal alfalfa, Sneathia, and Megasphaera; and those typically related to gastrointestinal and respiratory infection, such as Bacteroides, Escherichia coli, and Streptococcus. However, gonorrhoeae and chlamydia are identified in only a subset of PID patients [4], and timely diagnosis and treatment are needed to avoid complications such as sepsis, septic shock, and even death. Sepsis progresses rapidly and has high morbidity and mortality, presenting significant challenges to clinicians worldwide despite advanced diagnostic methods and intensive care. Even among survivors, 15%–20% may experience long-term sequelae of pelvic inflammatory disease (SPID), including chronic pelvic pain, PID relapse, infertility, and ectopic pregnancy [5, 6]. These conditions significantly impact the reproductive health and quality of life of women of childbearing age. Therefore, early identification of sepsis in PID patients is of great importance for disease prevention and effective treatment. Sepsis represents an immunity-mediated dysregulated response to infection [7], posing a significant challenge for critical care clinicians.
It can induce septic shock and multiple organ dysfunction syndrome (MODS), and is commonly triggered by severe trauma, major surgery, and infections. The morbidity of sepsis in adults is approximately 189 cases per 100,000 person-years [8, 9], and it carries a mortality rate of approximately 30% in the intensive care unit (ICU) [10, 11], with a high incidence of sequelae among survivors. Around 16% of patients experience cognitive dysfunction after surviving sepsis [12]. To confirm the diagnosis of sepsis in patients with infection or suspected infection, an increase of 2 or more points from baseline in the Sequential (Sepsis-related) Organ Failure Assessment (SOFA) score is required [13, 14]. Due to the complexity of SOFA scoring, a bedside quick SOFA (qSOFA) has been introduced to identify critical patients. If a patient meets at least 2 items in qSOFA, further assessment is necessary to identify organ failure [7]. The pathophysiology of sepsis is intricate, involving multiple processes such as inflammatory response, immune response, coagulation dysfunction, and various changes in cellular function, metabolism, and microcirculation [15]. Therefore, further understanding of sepsis and its pathogenetic mechanisms is of significant value for clinical diagnosis, treatment, and patient prognosis. Timely administration of antibiotics is crucial to the prognosis of sepsis [16]. However, due to the high heterogeneity and complexity of sepsis [17], uniform treatment is impractical [16, 18, 19]. Nonetheless, delayed treatment increases mortality in septic patients, underscoring the importance of timely prediction of the occurrence and phenotype of sepsis in PID patients for favorable outcomes. Prediction models developed through traditional regression methods or machine learning techniques can integrate numerous predictive factors and offer personalized disease prediction [20]. Currently, there is no predictive model for PID progression to sepsis.
Therefore, personalized and high-performance predictive tools are needed for effective management of such patients.

Machine learning (ML), a crucial branch of artificial intelligence, uses algorithms to uncover intricate interactions within data, and its applications in the biomedical field encompass tasks such as classification, prognosis, transcriptomics, imageomics, and drug response prediction [ 21 , 22 ]. For this study, the random survival forest (RSF) was chosen as the ML method for predicting survival outcomes. RSF, a decision tree-based algorithm, holds great promise in practical application [ 23 , 24 , 25 ]. Compared to other methods, RSF not only facilitates the identification of the most relevant features through variable importance but also effectively reduces data dimensionality [ 26 , 27 ]. In addition, the final prediction is averaged over the predictions of the individual trees, which yields more precise survival predictions [ 28 ]. Consequently, the RSF algorithm was selected to obtain more reliable and accurate predictions for intricate survival analysis problems. The aim of this study is therefore to develop and validate an RSF-based model to predict the progression of PID to sepsis in women with PID and to compare its performance with that of the traditional Cox proportional hazards model. The internal validation dataset was used to evaluate the prognostic performance of the model.
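
The Sepsis-3 screening logic summarized in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the qSOFA item thresholds (respiratory rate of 22/min or more, systolic blood pressure of 100 mmHg or less, Glasgow Coma Scale below 15) come from the Sepsis-3 consensus definition rather than from this paper, the SOFA criterion is taken as an increase of 2 or more points from baseline, and the names `BedsideVitals`, `qsofa_score`, `needs_organ_failure_workup`, and `meets_sepsis3` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BedsideVitals:
    """Hypothetical container for the three qSOFA inputs."""
    respiratory_rate: float  # breaths per minute
    systolic_bp: float       # mmHg
    gcs: int                 # Glasgow Coma Scale, 3 to 15


def qsofa_score(v: BedsideVitals) -> int:
    """Count how many of the three qSOFA items are met (0 to 3)."""
    return (
        int(v.respiratory_rate >= 22)  # tachypnea
        + int(v.systolic_bp <= 100)    # hypotension
        + int(v.gcs < 15)              # altered mentation
    )


def needs_organ_failure_workup(v: BedsideVitals) -> bool:
    """At least 2 qSOFA items should prompt assessment for organ failure."""
    return qsofa_score(v) >= 2


def meets_sepsis3(baseline_sofa: int, current_sofa: int,
                  suspected_infection: bool) -> bool:
    """Sepsis-3: infection (or suspicion of it) plus a SOFA rise of >= 2."""
    return suspected_infection and (current_sofa - baseline_sofa) >= 2
```

For example, a patient with a respiratory rate of 24/min, systolic pressure of 95 mmHg, and a GCS of 15 meets two qSOFA items and would be flagged for further assessment; the sketch is for illustration only and is not a clinical tool.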

Conclusion

In conclusion, this study presents a prognostic nomogram containing five factors (hemodialysis, PLT count, pneumonia, hormone use, and WBC count) for predicting 3-/7-day progression to sepsis in critical PID patients admitted to the ICU. The risk prediction nomogram demonstrated AUCs of 0.886/0.863 in the training set and 0.824/0.726 in the validation set, with a c-index of 0.8905, indicating the potential of this model to aid clinicians in risk stratification and decision-making for PID patients and thereby improve their clinical treatment. Furthermore, we developed a well-calibrated RSF model to evaluate the prognostic risk for adult PID patients and predict whether patients can achieve clinical stability. With further validation and modification, this model can support clinical decision-making to improve the care of hospitalized PID patients.
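
As a reading aid for the c-index reported above, the following is a minimal pure-Python sketch of Harrell's concordance index for right-censored data. It is not the authors' implementation: the function name `concordance_index` and the toy data are hypothetical, and pairs with tied times are simply skipped, which is a simplification of what statistical packages do.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- follow-up time for each patient
    events      -- 1 if sepsis was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event before time j
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [2, 3, 5, 7]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a c-index near 0.89 means that in roughly 89% of comparable patient pairs the patient who progressed earlier was assigned the higher risk.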

Discussion

The escalation of mortality and disability rates among patients with PID progressing to sepsis underscores the critical importance of timely identification of high-risk patients. This identification has significant potential for optimizing clinical management and improving patient prognosis. In this study, we established and validated a clinical predictive model using both RSF and stepwise Cox regression, aiming to predict the likelihood of PID advancing to sepsis. Our robustness analysis revealed that the predictive model demonstrated favorable discrimination, calibration ability, and clinical applicability.

The most common pathogens associated with PID are Chlamydia trachomatis and Neisseria gonorrhoeae. PID presents with various symptoms, including lower abdominal pain, discomfort, fever, increased vaginal discharge, vaginal bleeding, and dysuria. The condition encompasses several diseases, such as endometritis, salpingitis, fallopian tube abscess, tubo-ovarian abscess (TOA), and pelvic peritonitis [ 2 ]. Clinical presentations can range from no symptoms to life-threatening TOA [ 32 ]. PID patients have weakened immune defenses, making them vulnerable to bacteria, which can potentially lead to the development of sepsis. TOA is a common complication of PID that affects approximately 30% of hospitalized women with PID. If not treated promptly, TOA can lead to serious complications such as sepsis and multiple organ failure. Therefore, early recognition, therapeutic management, and proactive prevention are crucial in reducing the incidence of sepsis [ 33 ].

Treatment for acute PID with TOA poses challenges, and no consensus has been reached on the optimal strategy. The first-line approach usually involves the empirical use of broad-spectrum antibiotics [ 34 , 35 , 36 , 37 ]. However, approximately 25%–30% of patients do not respond to this treatment and require surgical intervention [ 37 , 38 ]. 
Age, WBC count, and diameter of the adnexal mass [ 39 ] are potential risk factors for pharmacological treatment failure [ 40 , 41 ]. In cases of pelvic abscesses or suspected inflammatory masses that persist or are suspected to have ruptured after 48 h of antimicrobial use, exploratory laparotomy is recommended to prevent sepsis or septic peritonitis. TOA-induced sepsis is the primary cause of high mortality in PID patients. Current diagnostic methods rely on clinical characteristics and laboratory tests, but they have limitations, particularly the possibility of misdiagnosis in patients with atypical symptoms [ 42 ]. In acute settings, computed tomography (CT) is advantageous in distinguishing acute PID from appendicitis and other inflammatory or pelvic masses. Accurate imaging techniques are therefore crucial for diagnosing late effects such as suppurative salpingitis, TOA, chronic pelvic pain, and infertility. In patients with right lower abdominal pain, CT can be used effectively to distinguish TOA from acute appendicitis. In TOA patients, the incidence of ovarian abnormalities, periovarian fat deposition, and rectal sigmoid wall thickening is significantly higher than in acute appendicitis patients, whereas the incidence of cecal wall thickening and pericecal fat deposition is significantly higher in acute appendicitis patients. Among TOA patients, the most accurate CT feature is an inflamed mass involving the ovarian veins and deep pelvic fat and extending to the contralateral pelvic fat [ 43 ]. On contrast-enhanced CT images, TOA typically presents as a multiform septated cystic mass with a thick, uniformly enhancing wall within the appendix [ 44 ]. 
Sepsis induced by PID can be due to upper reproductive tract infections, including endometritis, salpingitis, oophoritis, tubo-ovarian abscess, and pelvic peritonitis. Other causes may involve ovarian venous thrombophlebitis, aseptic necrosis of a uterine myoma, and a necrotic ovary in undiagnosed adnexal torsion [ 45 ]. Unmanaged sepsis can cause systemic vascular dilatation, peripheral vascular paralysis, and a significant decrease in effective circulatory blood volume [ 46 ], eventually progressing to tissue hypoperfusion (shock) and MODS. Timely identification of shock and early, sufficient fluid resuscitation are crucial factors that influence patient prognosis in PID-induced sepsis. The morbidity and mortality of PID-induced sepsis are directly associated with the duration and severity of hypoperfusion. Moreover, proper use of glucocorticoids, recombinant human activated protein C, and vasopressors can contribute to improved outcomes, together with strict blood glucose control and measures to prevent deep venous thrombosis and stress ulcers.

Compared to recent studies on PID and concurrent sepsis, our investigation confirmed the significance of several key clinical indicators. Recent basic and clinical research has established that systemic immune dysfunction, including increased WBC count, neutrophil infiltration, and systemic immune inflammation index [SII = (platelet count × neutrophil count)/lymphocyte count], correlates with the progression of infection and sepsis-related mortality in patients with PID and concurrent sepsis [ 47 , 48 , 49 , 50 , 51 , 52 ]. WBC count, a representative biomarker in PID, primarily serves as a short- to medium-term indicator reflecting the management of acute or chronic inflammatory responses and is commonly used for the diagnosis of sepsis. Research has underscored the clinical predictive value of cell counts in the development and prognosis of TOA [ 53 ]. 
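
The SII formula quoted above reduces to a single arithmetic expression. The sketch below assumes all three counts are given in the same unit (10⁹ cells/L); the function name and the example values are hypothetical and are not taken from the study data.

```python
def systemic_immune_inflammation_index(platelets: float,
                                       neutrophils: float,
                                       lymphocytes: float) -> float:
    """SII = (platelet count x neutrophil count) / lymphocyte count.

    All inputs are assumed to share the same unit (10^9 cells/L).
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


# Hypothetical example: PLT 250, neutrophils 6.0, lymphocytes 1.5 (all 10^9/L).
example_sii = systemic_immune_inflammation_index(250, 6.0, 1.5)  # 1000.0
```

Because lymphocytes sit in the denominator, lymphopenia raises the index even when platelet and neutrophil counts are unchanged.
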
A previous study has reported the predictive value of total and differential WBC counts for different sepsis stages [ 54 ]. Additionally, procalcitonin (PCT) or the combination of WBC and PCT has shown promising diagnostic and prognostic abilities [ 55 ]. In a prospective observational study focusing on early sepsis in elderly patients, a Modified Early Warning Score (MEWS) ≥ 3, WBC count ≥ 11 × 10⁹/L, neutrophil-to-lymphocyte ratio (NLR) ≥ 8, and monocyte distribution width (MDW) ≥ 20 showed the highest diagnostic accuracy across all age subgroups [ 56 ]. Moreover, PLT count [ 57 ] has also been identified as a potential risk factor, with the extent of thrombocytopenia in sepsis correlating with disease severity and prognosis. The precise etiology of this type of thrombocytopenia remains elusive; however, current evidence suggests that it may result from infection-induced inhibition of platelet production by megakaryocytes [ 58 ]. Patients who do not respond to conservative TOA treatment have been found to exhibit significantly higher PLT counts on admission than those treated successfully [ 59 ]. The immature platelet fraction percentage (IPF%) is considered an early cellular biomarker for predicting sepsis progression. Elevated IPF% and PCT levels are associated with sepsis, with IPF% negatively correlated with PLT count [ 54 ]. Moreover, the rate of platelet aggregation has been identified as an independent predictor of 28-day mortality. Mitochondrial dysfunction in platelets is observed early in patients with sepsis, leading to impaired PLT mitochondrial activity, membrane depolarization, and ultrastructural destruction. This results in a significant decrease in adenosine triphosphate (ATP) and mitochondrial membrane potential (MMP) but an increase in the opening of the mitochondrial permeability transition pore (mPTP), which is positively associated with the PLT aggregation rate [ 60 ]. 
Microvascular hypoperfusion is a typical pathological feature of severe sepsis and septic shock [ 61 ]. The neutrophil extracellular traps (NET)-PLT-thrombin axis is implicated in promoting sepsis-induced intravascular coagulation and microvascular dysfunction [ 62 ].

Our analysis revealed a notable association between the use of hemodialysis therapy and the susceptibility to sepsis in PID patients. Recent research has shown that hemodialysis treatment increases the occurrence of infectious complications, especially sepsis [ 63 , 64 , 65 ]. Consistently, our study indicated an elevated risk of sepsis development in PID patients who had undergone hemodialysis treatment. Hemodialysis (HD) plays a crucial role in predicting the outcomes of sepsis. The risk of sepsis in HD patients is primarily associated with bloodstream infections related to vascular access, which are commonly observed in dialysis patients using central venous catheters [ 66 , 67 ]. Vascular access makes these patients particularly vulnerable to sepsis because it allows pathogenic bacteria to enter the bloodstream. Patients undergoing hemodialysis also experience significant alterations in microbial species, metabolic pathways, antibiotic resistance, and virulence factors. Notably, specific factors such as erythromycin-resistant methylase, pyridoxine 5-phosphate oxidase, and streptomycin-acetyltransferase show substantial increases [ 64 ].

Interestingly, we found that pneumonia is a contributing factor to the onset of PID and its progression to sepsis. This observation is in line with Jamie Morgan's description that severe pneumonia can lead to acute respiratory distress syndrome requiring mechanical ventilation, as well as severe sepsis and septic shock syndrome [ 68 ]. Furthermore, our results suggest that glucocorticoids represent a risk factor for sepsis, aligning with our initial hypothesis. 
During the onset and progression of sepsis, two opposing processes, hyperinflammation and immune suppression, are invariably at work. Early administration of glucocorticoids in patients with sepsis can impede their ability to combat infection, ultimately leading to immune paralysis and uncontrolled inflammation. In brief, the five factors included in the nomogram are reliable prognostic factors for sepsis in PID patients, making them valuable tools in clinical practice. Healthcare professionals can use these factors to implement appropriate treatment measures, effectively reducing the risk of PID progressing to sepsis.

Our results revealed that in the training set, the ROC curves of the RSF model slightly outperformed those of the Cox regression model, indicating better performance of the RSF model, whereas in the validation set the opposite was observed. Compared with the Cox proportional hazards model, the RSF model allows for cross-validation using internal data to ensure high prediction accuracy, and it can capture the nonlinear effects of predictive variables and the interactions between predictors without the need for proportional hazards assumption testing [ 24 , 25 ]. Nevertheless, the RSF model is prone to overfitting, while the Cox regression model demonstrates relatively stable performance.

In addition, in Table 3, the hazard ratio (HR) for peritonitis was below 1 and statistically significant. This indicates that early detection of peritonitis is helpful for subsequent treatment. Because there is a certain lag in examination and drug treatment, patients in whom peritonitis is not detected at an early stage usually do not receive the corresponding examination and treatment. In contrast, once peritonitis is diagnosed, patients have a greater chance of receiving targeted treatment and of having reproductive system inflammation excluded, which helps prevent the occurrence and progression of PID. 
From the perspective of disease definition, there is some overlap between peritonitis and pelvic inflammatory disease, but not all PID is peritonitis, and vice versa. Therefore, the early detection and treatment of peritonitis will bring more benefits.

Limitations of this study should be noted. Because the study is retrospective and based on the MIMIC-IV database, some data loss and bias are inevitable. Additionally, the study focused only on European and American populations, potentially leading to deviations in population characteristics. Although a validation cohort (30% of the total dataset) was randomly assigned to verify the model's performance, validation with larger external cohorts is necessary to enhance the model's feasibility.
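
The random assignment of 30% of the dataset to a validation cohort can be illustrated with a short, reproducible split. This is a sketch under stated assumptions, not the authors' procedure: the paper does not say which software, seed, or stratification was used, so the function name `split_train_validation`, the seed value, and the integer patient identifiers are all hypothetical.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split patient IDs into training and validation lists.

    The seed makes the shuffle reproducible; no stratification by
    outcome is applied, which is a simplification.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)   # local generator, does not touch global state
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical cohort of 100 patients identified by integers.
train_ids, validation_ids = split_train_validation(range(100))
```

Splitting at the patient level, rather than at the admission level, keeps records from the same patient out of both sets, which matters when only the first ICU admission is meant to be used.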

Conflict of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Data included in article/supp. material/referenced in article.

Source provenance

europepmc
last seen: 2026-06-13T06:22:48.782012+00:00
unpaywall
last seen: 2026-05-21T05:10:58.409756+00:00
License: CC-BY-NC-ND-4.0