Methods
In compliance with the principles outlined in the Declaration of Helsinki, the study obtained ethical approval from the institutional review boards of the three participating centers. The approval numbers were PJ20240952 from the First Affiliated Hospital of Anhui Medical University (Centre 1), PJ20241039 from the High-tech Zone Branch of the First Affiliated Hospital of Anhui Medical University (Centre 2), and 2024-KY-88 from the Second People’s Hospital of Wuhu City (Centre 3). Because the study used a retrospective design, the requirement for informed consent was waived.
Data were collected from OC patients treated at the three participating centers between July 2012 and March 2025. The training cohort comprised patients consecutively treated at Centre 1 between January 1, 2018, and June 30, 2024. The internal validation cohort comprised patients consecutively treated at the same institution from July 1, 2012, to June 30, 2017. External validation cohort 1 comprised patients consecutively treated at Centre 2 from June 1, 2016, to June 30, 2024. External validation cohort 2 comprised patients consecutively treated at Centre 3 from October 1, 2017, to March 25, 2025. Inclusion and exclusion criteria are detailed in Supplementary Fig. 4.
This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines (Supplementary Material). A flow chart of the study design is shown in Fig. 6.
Fig. 6 The illustration of the OvcaSurvivor architectures for survival prediction and risk stratification in this study.
OS time was defined as the interval from diagnosis to death. The survival status of OC patients was used as the predictive outcome, with follow-up data collected up to March 25, 2025. OS was ascertained through the hospital’s electronic medical record system and telephone follow-up, with cross-verification performed by two independent researchers. For patients who had not experienced the event, OS was censored at the date of the last follow-up.
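The outcome definition above can be illustrated with a minimal sketch. The function name `overall_survival`, the conversion of days to months using an average month length of 30.44 days, and the handling of missing follow-up dates are assumptions made for illustration; the text does not specify them.

```python
from datetime import date

FOLLOW_UP_CUTOFF = date(2025, 3, 25)  # data cutoff stated in the text

def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (time_in_months, event) for one patient.

    event = 1 if death was observed by the cutoff, 0 if the patient is censored.
    """
    if death is not None and death <= FOLLOW_UP_CUTOFF:
        end, event = death, 1
    else:
        # Censor at the last contact, never later than the cutoff.
        end = min(last_follow_up or FOLLOW_UP_CUTOFF, FOLLOW_UP_CUTOFF)
        event = 0
    months = (end - diagnosis).days / 30.44  # assumed average month length
    return round(months, 1), event

# One observed death and one patient still alive at the cutoff.
print(overall_survival(date(2020, 1, 1), death=date(2021, 1, 1)))
print(overall_survival(date(2024, 3, 25)))
```

Each patient is thus reduced to a (time, event) pair, which is the input that survival metrics such as the C-index and Kaplan–Meier estimates require.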
Clinical data were extracted from the electronic health record system, and multiple core prognostic factors were selected, including age, FIGO stage, lymph node metastasis, menopausal status, CA125, HE4, Ki67, the Risk of Ovarian Malignancy Algorithm (ROMA) index, and other indicators. All processed clinical features were integrated into a multi-dimensional feature vector that served as the input to the model.
Detailed US scanning parameters are provided in the Supplementary Material. The US image preprocessing pipeline consisted of standardization and data augmentation. First, raw images were uniformly resized to a resolution of 512 × 512 pixels, converted to RGB three-channel format, and normalized to the [0,1] range to mitigate imaging variations across different devices. Subsequently, data augmentation methods were applied: random horizontal flipping (50% probability), rotation (±15°), brightness adjustment (±20%), and translation (±10%).
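The standardization and part of the augmentation described above can be sketched as follows. This is an illustrative NumPy version, not the authors' pipeline: the function names, the nearest-neighbour resize, and the assumption of 8-bit input are ours, and rotation and translation are omitted because they require an interpolation routine from an imaging library.

```python
import numpy as np

def resize_nearest(img, size=512):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def standardize(img, size=512):
    """Resize, convert to 3-channel RGB, and scale intensities to [0, 1]."""
    img = resize_nearest(np.asarray(img), size)
    if img.ndim == 2:  # grayscale -> replicate into three channels
        img = np.stack([img] * 3, axis=-1)
    img = img[..., :3].astype(np.float32)
    return img / 255.0  # assumes 8-bit input

def augment(img, rng):
    """Random horizontal flip (p = 0.5) and brightness scaling within +/-20%."""
    if rng.random() < 0.5:
        img = img[:, ::-1]
    factor = rng.uniform(0.8, 1.2)
    return np.clip(img * factor, 0.0, 1.0)

# Synthetic 300 x 400 grayscale frame standing in for a raw US image.
raw = (np.arange(300 * 400) % 256).astype(np.uint8).reshape(300, 400)
x = standardize(raw)
y = augment(x, np.random.default_rng(0))
print(x.shape, y.shape)
```

Augmentation would normally be applied only during training, with a fresh random draw for every image in every epoch.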
H&E-stained WSIs were scanned at 20× magnification with a DST300 scanner (Jiangsu Disite Medical Technology Co., Ltd.) and stored in TIFF format. Pixel-level background segmentation was first performed using Otsu’s thresholding method to automatically distinguish tissue regions from blank background and to remove irrelevant noise. Each preprocessed WSI was then cropped into non-overlapping 256 × 256-pixel patches at ×10 magnification (resolution: 1.0 μm/pixel), covering diverse regions such as tumor, stroma, and normal tissue to provide high-quality input for subsequent feature extraction.
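A minimal sketch of the two steps above, Otsu background segmentation followed by non-overlapping tiling, is given below on a grayscale array. The function names and the `min_tissue` cutoff (the fraction of tissue pixels a patch must contain to be kept) are assumptions for illustration; the text does not state how patches on the tissue boundary were handled, and reading the TIFF and downsampling to ×10 are not shown.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the darker class
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean of the darker class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan              # thresholds that leave one class empty
    sigma_b = (mu_total * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b))

def tissue_patches(gray, patch=256, min_tissue=0.5):
    """Yield (row, col) origins of non-overlapping patches that are mostly tissue."""
    t = otsu_threshold(gray)
    tissue = gray <= t  # H&E tissue is darker than the bright slide background
    h, w = gray.shape
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            if tissue[r:r + patch, c:c + patch].mean() >= min_tissue:
                yield r, c

# Toy slide: bright background with one dark "tissue" quadrant.
slide = np.full((512, 512), 240, dtype=np.uint8)
slide[:256, :256] = 60
print(list(tissue_patches(slide)))
```

On a real slide the threshold is usually computed on a low-resolution thumbnail, and the resulting mask is then mapped back to patch coordinates at the working magnification.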
We developed OvcaSurvivor, a multimodal deep learning framework for OC survival prediction that integrates WSI, US, and clinical data through attention-guided gated fusion. The framework first uses the pre-trained Clinical Histopathology Imaging Evaluation Foundation (CHIEF) model 25 to extract cellular-level features from WSI and adapts ResNet50 to derive structural features from US images. Then, a cross-attention module establishes fine-grained semantic correlations between US and WSI features, aligning microscopic histological details from WSI with macroscopic structural patterns in US. Concurrently, a gating unit adaptively learns the weight allocated to clinical data within the multimodal fusion to suppress noise from irrelevant modalities. Finally, the fused multimodal features are mapped to a continuous risk score via fully connected layers, where a higher score indicates a worse prognosis.
For feature learning from US, the ResNet50 network was employed. The model was initialized with ImageNet pre-trained weights and fine-tuned on our dataset to capture critical morphological features of OC, including lesion shape, size, and border clarity. The detailed procedure for hyperparameter tuning and the training configuration are provided in the Supplementary Material.
CHIEF first extracts features from each patch of the WSI, generating patch-level feature vectors. During whole-slide integration, a deep attention module combines the patch features from the same WSI and computes adaptive weights that focus on tumor core regions and suppress noise. Concurrently, encoded semantic vectors of the anatomical site are incorporated, ultimately generating whole-slide feature vectors that fuse micro-morphological characteristics with organ-specific contextual information.
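The whole-slide integration step can be illustrated with a generic attention-pooling sketch. This is not CHIEF's implementation: it is a simplified attention-based multiple-instance pooling layer, the weight matrices `w_v` and `w_a` are hypothetical stand-ins for learned parameters, and the anatomical-site encoding is omitted.

```python
import numpy as np

def attention_pool(patch_feats, w_v, w_a):
    """Aggregate (n_patches, d) patch features into one (d,) slide vector.

    w_v: (d, h) projection and w_a: (h,) scoring vector; both would be learned.
    """
    scores = np.tanh(patch_feats @ w_v) @ w_a         # one score per patch
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over patches
    slide_vec = weights @ patch_feats                 # attention-weighted average
    return slide_vec, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 16))   # 50 patches with 16-dimensional features (toy sizes)
slide_vec, weights = attention_pool(feats, rng.normal(size=(16, 8)), rng.normal(size=8))
print(slide_vec.shape, round(float(weights.sum()), 6))
```

Because the weights sum to one, patches with high scores dominate the slide vector while low-scoring patches such as background or artifacts contribute little.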
Following modality-specific feature learning, a two-stage hybrid fusion framework was developed to address the heterogeneity of multimodal data in survival prediction. The first stage, local fusion, employs cross-attention mechanisms for US-WSI feature alignment. US features serve as Query vectors to locate structural characteristics of suspicious tumor regions, while cell-level WSI features act as Key-Value pairs providing histological details. Cosine similarity between Query and Key generates dynamic attention weights, enabling the model to focus on pathologically critical regions semantically aligned with US abnormalities. Weighted WSI values are then combined with original US features via residual connections, preserving anatomical context while integrating cellular-level discriminative details guided by pathology.
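The local fusion stage described above can be sketched in a few lines. This is an illustrative version under stated assumptions: the learned Query/Key/Value projections are omitted, both modalities are assumed to share the feature dimension `d`, and the `temperature` that sharpens the softmax is a hypothetical parameter not mentioned in the text.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so that dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def cross_attention(us_feats, wsi_feats, temperature=0.1):
    """US regions (queries) attend to WSI patches (keys and values).

    us_feats: (n_us, d), wsi_feats: (n_wsi, d). Returns the fused (n_us, d)
    features and the (n_us, n_wsi) attention matrix.
    """
    sim = l2_normalize(us_feats) @ l2_normalize(wsi_feats).T  # cosine similarity
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)                   # softmax per query
    fused = us_feats + attn @ wsi_feats                       # residual connection
    return fused, attn

rng = np.random.default_rng(1)
wsi = rng.normal(size=(5, 32))   # five WSI patch features (toy sizes)
us = wsi[[2]].copy()             # a US query constructed to match WSI patch 2
fused, attn = cross_attention(us, wsi)
print(attn.round(3))
```

In this toy example the query attends most strongly to the patch it was copied from, which is the behaviour that the cross-attention matrix visualization relies on.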
The second stage, global fusion, involves clinically semantic-guided gated multimodal integration: building on locally fused visual features, a gated multimodal unit models nonlinear interactions between clinical data and imaging features. First, the combined US and WSI features undergo dimensionality reduction to retain critical information, while clinical features are nonlinearly encoded into semantic vectors, forming the basis for subsequent multimodal fusion. A gating mechanism then generates adaptive weight matrices (0–1 range via Sigmoid activation) to balance contributions: weights near 1 emphasize imaging features, while weights near 0 prioritize clinical priors. Finally, weighted multimodal features are mapped to continuous risk scores through fully connected layers, enabling joint survival prediction that integrates fine-grained intermodal interactions and holistic information complementarity.
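The global fusion stage can be sketched as a gated multimodal unit. The parameter names, the tanh encoders, and the single linear output layer are assumptions for illustration; the text specifies only that a sigmoid gate in the 0–1 range balances imaging against clinical features before fully connected layers produce the risk score.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(img_feat, clin_feat, params):
    """Fuse imaging and clinical vectors with a learned gate; return (risk, gate)."""
    h_img = np.tanh(params["W_img"] @ img_feat)     # reduced imaging representation
    h_clin = np.tanh(params["W_clin"] @ clin_feat)  # encoded clinical representation
    z = sigmoid(params["W_gate"] @ np.concatenate([img_feat, clin_feat]))
    fused = z * h_img + (1.0 - z) * h_clin          # z near 1: imaging; z near 0: clinical
    risk = float(params["w_out"] @ fused + params["b_out"])
    return risk, z

rng = np.random.default_rng(2)
params = {
    "W_img": rng.normal(size=(8, 20)),    # hypothetical sizes: 20-d imaging input
    "W_clin": rng.normal(size=(8, 6)),    # 6-d clinical input, 8-d hidden space
    "W_gate": rng.normal(size=(8, 26)),
    "w_out": rng.normal(size=8),
    "b_out": 0.0,
}
risk, gate = gated_fusion(rng.normal(size=20), rng.normal(size=6), params)
print(round(risk, 3), gate.round(2))
```

Averaging the gate values per patient gives a modality weight that can be summarized by subgroup, for example by FIGO stage.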
To validate the specific contributions of our model’s key components, we conducted an ablation study. We compared the performance of the full OvcaSurvivor model against three simplified variants: one where the cross-attention module for US-WSI fusion was replaced with simple feature concatenation (“w/o Cross-Attention”); another where the gated unit for fusing imaging and clinical data was similarly replaced (“w/o Gating Unit”); and a “Simple Fusion Baseline” where both modules were removed and all three modalities were fused via direct concatenation. This analysis was designed to isolate the performance gains attributable to our proposed fusion mechanisms versus the simple aggregation of multimodal data.
Results
Table 1 lists the clinical characteristics of the training and validation datasets. A total of 543 participants were included in this study. These participants were divided into a training cohort (n = 261, 48.07%), an internal validation cohort (n = 66, 12.15%), external validation cohort 1 (n = 157, 28.91%), and external validation cohort 2 (n = 59, 10.87%). The median age was 53.00 years (Q₁, Q₃: 45.00, 59.00) overall, with no statistically significant differences between groups (P = 0.477). Similarly, no differences were observed for lesion size (P = 0.898) or HE4 levels (P = 0.181). Statistically significant differences were found for CA125 (P = 0.035), CA19-9 (P < 0.001), AFP (P < 0.001), Ki67 (P = 0.031), and ROMA index (P = 0.031). Median overall survival (OS) varied across the groups. In the training cohort, the median OS was 34.00 months (IQR: 17.00–50.00), and the mortality rate was 39.46%. The internal validation cohort showed a median OS of 30.00 months (IQR: 17.50–43.50), with a mortality rate of 22.73%, while external validation cohort 1 had a median OS of 30.00 months (IQR: 19.00–44.00) and a mortality rate of 19.11%. In external validation cohort 2, the median OS was 48.00 months (IQR: 29.50–60.00), and the mortality rate was 45.76%. Table 2 summarizes the survival distributions in detail.
Table 1 Baseline clinical characteristics of patients in training group and validation groups

| Variables | Total (n = 543) | Training (n = 261) | Internal validation (n = 66) | External validation 1 (n = 157) | External validation 2 (n = 59) | Statistic | P |
|---|---|---|---|---|---|---|---|
| Age, M (Q₁, Q₃) | 53.00 (45.00, 59.00) | 53.00 (45.00, 58.00) | 54.00 (44.00, 58.75) | 54.00 (46.00, 60.00) | 52.00 (46.00, 60.50) | χ² = 2.49ᵃ | 0.477 |
| Lesion size (mm), M (Q₁, Q₃) | 90.00 (56.00, 121.00) | 92.00 (54.00, 120.00) | 87.00 (52.25, 126.50) | 86.00 (61.00, 123.00) | 98.00 (63.50, 124.50) | χ² = 0.59ᵃ | 0.898 |
| CA125 (U/ml), M (Q₁, Q₃) | 123.50 (31.30, 496.25) | 117.15 (20.24, 513.00) | 76.11 (19.57, 391.02) | 165.20 (47.90, 600.00) | 91.00 (50.50, 304.50) | χ² = 8.60ᵃ | 0.035 |
| CA19-9 (kU/L), M (Q₁, Q₃) | 128.00 (25.00, 253.50) | 94.00 (33.00, 247.00) | 137.50 (121.25, 317.75) | 192.00 (154.00, 348.00) | 15.00 (8.55, 21.00) | χ² = 146.74ᵃ | <0.001 |
| HE4 (pmol/L), M (Q₁, Q₃) | 141.00 (58.91, 460.55) | 169.00 (55.77, 552.00) | 128.00 (60.13, 451.92) | 164.40 (59.21, 480.60) | 115.00 (76.50, 165.50) | χ² = 4.87ᵃ | 0.181 |
| AFP (IU/mL), M (Q₁, Q₃) | 42.00 (7.00, 122.50) | 45.00 (7.00, 95.00) | 92.00 (7.00, 144.75) | 114.00 (7.00, 135.00) | 6.10 (4.45, 10.25) | χ² = 84.69ᵃ | <0.001 |
| Ki67 (%), M (Q₁, Q₃) | 40.00 (30.00, 60.00) | 40.00 (30.00, 60.00) | 40.00 (30.00, 60.00) | 40.00 (30.00, 60.00) | 30.00 (15.00, 50.00) | χ² = 8.84ᵃ | 0.031 |
| ROMA, M (Q₁, Q₃) | 0.31 (-1.60, 2.67) | 0.77 (-1.86, 3.02) | 0.02 (-1.73, 2.94) | 0.43 (-1.23, 2.64) | -0.38 (-1.41, 0.52) | χ² = 8.86ᵃ | 0.031 |
| FIGO stage, n (%) | | | | | | χ² = 121.20 | <0.001 |
| I | 103 (18.97) | 25 (9.58) | 10 (15.15) | 46 (29.30) | 22 (37.29) | | |
| II | 58 (10.68) | 15 (5.75) | 5 (7.58) | 24 (15.29) | 14 (23.73) | | |
| III | 356 (65.56) | 216 (82.76) | 51 (77.27) | 77 (49.04) | 12 (20.34) | | |
| IV | 26 (4.79) | 5 (1.92) | 0 (0.00) | 10 (6.37) | 11 (18.64) | | |
| Invasion of the fallopian tube, n (%) | | | | | | – | 0.018 |
| Absence | 375 (69.06) | 166 (63.60) | 50 (75.76) | 119 (75.80) | 40 (67.80) | | |
| Presence | 167 (30.76) | 95 (36.40) | 16 (24.24) | 38 (24.20) | 18 (30.51) | | |
| Menopause, n (%) | | | | | | χ² = 8.40 | 0.038 |
| No | 375 (69.06) | 166 (63.60) | 50 (75.76) | 119 (75.80) | 40 (67.80) | | |
| Yes | 168 (30.94) | 95 (36.40) | 16 (24.24) | 38 (24.20) | 19 (32.20) | | |
| Vaginal bleeding, n (%) | | | | | | χ² = 26.15 | <0.001 |
| Absence | 492 (90.61) | 238 (91.19) | 62 (93.94) | 149 (94.90) | 43 (72.88) | | |
| Presence | 51 (9.39) | 23 (8.81) | 4 (6.06) | 8 (5.10) | 16 (27.12) | | |
| Metastatic lymph nodes, n (%) | | | | | | χ² = 51.62 | <0.001 |
| Absence | 278 (51.20) | 123 (47.13) | 32 (48.48) | 67 (42.68) | 56 (94.92) | | |
| Presence | 265 (48.80) | 138 (52.87) | 34 (51.52) | 90 (57.32) | 3 (5.08) | | |

χ² Chi-square test, – Fisher exact test, M Median, Q₁ 1st Quartile, Q₃ 3rd Quartile.
ᵃ Kruskal–Wallis test.

Table 2 Survival data summary of ovarian cancer patients across cohorts

| Metric | Training cohort (n = 261) | Internal validation (n = 66) | External validation 1 (n = 157) | External validation 2 (n = 59) |
|---|---|---|---|---|
| Death events (n, %) | 103 (39.46%) | 15 (22.73%) | 30 (19.11%) | 27 (45.76%) |
| Censored events (n, %) | 158 (60.54%) | 51 (77.27%) | 127 (80.89%) | 32 (54.24%) |
| Median survival time (months) | 34.00 (IQR: 17.00–50.00) | 30.00 (IQR: 17.50–43.50) | 30.00 (IQR: 19.00–44.00) | 48.00 (IQR: 29.50–60.00) |
| Median follow-up time (months) | 39.00 (IQR: 17.00–53.00) | 33.00 (IQR: 16.00–43.50) | 32.00 (IQR: 19.00–42.00) | 48.00 (IQR: 29.50–60.00) |
| Survival time range (months) | 3.00–82.00 | 7.00–75.00 | 4.00–59.00 | 15.00–77.00 |

Values are presented as n (%), median (IQR), or range. IQR Interquartile range.
OvcaSurvivor achieved a C-index of 0.81 [0.77–0.85] in the internal validation cohort, 0.76 [0.68–0.84] in external validation cohort 1, and 0.70 [0.69–0.80] in external validation cohort 2, outperforming single-modality models (C-index: 0.60–0.72, Table 3). In both internal and external validation, the AUC of OvcaSurvivor for 1-, 3-, and 5-year survival prediction exceeded that of the unimodal US and WSI models (P < 0.05). Because there were no 5-year survival events in external validation cohort 1 and no 1-year observation data in external validation cohort 2, the corresponding ROC curves could not be plotted. The time-dependent ROC curves for the other models in the internal cohort, external validation cohort 1, and external validation cohort 2 are presented in Fig. 1a and Supplementary Fig. 1. DCA was used to evaluate the clinical utility of OvcaSurvivor compared with unimodal approaches (US, WSI) and baseline strategies (Fig. 1b and Supplementary Fig. 1). Using our data, we also compared OvcaSurvivor with two other WSI-based models 16, 17; OvcaSurvivor performed best (Supplementary Table 1).
Fig. 1 The time-dependent receiver operating characteristic (ROC) and decision curve analysis (DCA) of survival prediction models over 1, 3, and 5 years in internal and external validation group 1. a Time-dependent ROC comparing the predictive performance of each model for 1-year, 3-year, and 5-year survival outcomes. b DCA evaluating the clinical utility of the survival models by quantifying net benefits across different threshold probabilities for 1-year, 3-year, and 5-year survival predictions.
Table 3 Comparison of prediction performance of different methods in internal and external validation

| Model | Validation | C-index | AUC (1 year) | AUC (3 year) | AUC (5 year) |
|---|---|---|---|---|---|
| Ultrasound | Internal | 0.70 (0.65–0.75) | 0.73 (0.65–0.81) | 0.69 (0.60–0.78) | 0.65 (0.55–0.75) |
| Ultrasound | External 1 | 0.64 (0.56–0.72) | 0.68 (0.61–0.75) | 0.66 (0.58–0.74) | Na |
| Ultrasound | External 2 | 0.61 (0.57–0.70) | Na | 0.60 (0.58–0.74) | 0.64 (0.60–0.71) |
| WSI | Internal | 0.72 (0.67–0.77) | 0.77 (0.70–0.84) | 0.73 (0.64–0.82) | 0.68 (0.58–0.78) |
| WSI | External 1 | 0.66 (0.58–0.74) | 0.73 (0.66–0.80) | 0.70 (0.62–0.78) | Na |
| WSI | External 2 | 0.63 (0.59–0.72) | Na | 0.61 (0.60–0.69) | 0.62 (0.58–0.65) |
| WSI+Ultrasound | Internal | 0.79 (0.74–0.84) | 0.81 (0.73–0.89) | 0.74 (0.65–0.83) | 0.69 (0.59–0.79) |
| WSI+Ultrasound | External 1 | 0.74 (0.66–0.82) | 0.76 (0.69–0.83) | 0.70 (0.62–0.78) | Na |
| WSI+Ultrasound | External 2 | 0.62 (0.61–0.74) | Na | 0.63 (0.61–0.74) | 0.61 (0.60–0.72) |
| WSI+Ultrasound+Clinical Data | Internal | 0.81 (0.76–0.86) | 0.82 (0.74–0.90) | 0.76 (0.67–0.85) | 0.70 (0.60–0.80) |
| WSI+Ultrasound+Clinical Data | External 1 | 0.76 (0.68–0.84) | 0.78 (0.71–0.85) | 0.73 (0.65–0.81) | Na |
| WSI+Ultrasound+Clinical Data | External 2 | 0.70 (0.69–0.80) | Na | 0.62 (0.61–0.74) | 0.72 (0.64–0.77) |

Values in parentheses are confidence intervals. Unless otherwise indicated, confidence intervals are 95%. WSI Whole-Slide Images, AUC Area Under Curve, Na Not available.
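For readers unfamiliar with the C-index reported in Table 3, the following minimal sketch shows how the statistic is computed from survival times, event indicators, and risk scores. It implements Harrell's pairwise definition; the text does not state which estimator or software the authors used, so this should be read as an illustration of the metric rather than of their code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when patient i has an observed death
    (events[i] == 1) and a shorter time than patient j. It is concordant
    when patient i also has the higher risk score; ties in risk count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [5, 10, 15, 20]        # months
events = [1, 1, 0, 1]          # 1 = death, 0 = censored
risks = [0.9, 0.7, 0.4, 0.1]   # higher score = worse predicted prognosis
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the values in Table 3 should be read.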
The importance of multimodal features was visualized using a radar plot (Fig. 2a). As depicted, the WSI features exhibited the highest contribution, followed by US features and clinical data. The adaptive allocation of weights between the clinical and imaging modalities in the multimodal fusion model is shown in Fig. 2b. As the FIGO stage increases, the contribution weight of the clinical modality (blue) gradually decreases, while that of the imaging modality (orange) increases. In early stages (I–II), the model primarily relies on clinical indicators for prognostic stratification. In advanced stages (III–IV), the model raises the weights of US and WSI through the gating mechanism to capture potential heterogeneity.
Fig. 2 Modality contribution, weight distribution by FIGO stage, and cross-attention analysis. a Radar chart of the importance of the contributions of whole-slide image (WSI), ultrasound (US), and clinical modalities to the model. b Contribution of weight of clinical vs. imaging modalities by International Federation of Gynecology and Obstetrics (FIGO) stage. c Cross-attention matrix analysis between whole-slide image (WSI) and ultrasound (US) modalities. Tumor-specific regions demarcated by yellow bounding boxes exhibit prominent high attention weights.
To elucidate how OvcaSurvivor adaptively aligns WSI and US features, we visualized the cross-attention matrix between WSI patches and US image regions (Fig. 2c). At the feature level, this visualization supports the credibility of personalized survival prediction: risk score generation relies on the joint discrimination of critical cross-modality biological signatures rather than on isolated information from a single modality. To further explain the biological basis of the model’s predictions, we used gradient-weighted class activation mapping (Grad-CAM) to visualize the key decision-making regions within both the WSI and US modalities of OvcaSurvivor (Supplementary Fig. 2). In the Grad-CAM heatmaps of WSI (Supplementary Fig. 2d), high-attention regions (marked in red) were primarily concentrated in areas of dense tumor cells and at the tumor-stroma interface. Notably, strong signals were observed in regions with significant nuclear pleomorphism and an elevated nuclear-to-cytoplasmic ratio, consistent with the CHIEF network’s logic for extracting cell-level features from WSI and supporting the model’s capture of prognostic morphological features. In the Grad-CAM visualization of US images (Supplementary Fig. 2a–c), the model’s high-attention regions prioritized the solid components and irregular boundaries of ovarian lesions, both of which are critical US indicators of malignant risk according to the O-RADS classification. When the Grad-CAM heatmaps from WSI and US were analyzed jointly, the model paid more attention to the solid components of the US lesions that corresponded to the tumor infiltration zones in WSI than to other regions. This further substantiates the semantic alignment capability of the cross-attention module and provides a traceable biological interpretation for the model’s predictions.
Based on the median risk score (0.62) output by OvcaSurvivor, patients were stratified into high-risk and low-risk groups, and KM survival curves were plotted (Fig. 3). The KM survival analysis demonstrated the prognostic discriminative ability of the risk stratification: the median survival time of patients in the high-risk group was lower than that in the low-risk group. In all validation cohorts, the survival curves of the two groups (high-risk vs. low-risk) were separated, and two-sided log-rank tests confirmed differences between groups (P < 0.05). KM curve analyses were also performed for the unimodal models and the US plus WSI model. The curve separation of OvcaSurvivor was superior to that of the unimodal models in the external validation cohorts. Multivariate Cox analysis further identified the risk score as an independent risk factor, as shown in Fig. 4 and Supplementary Table 2, with an HR of 6.01 (95% CI: 3.57–10.10). To ensure the reliability of the Cox proportional hazards model used for survival analysis and risk stratification, we validated the proportional hazards assumption with the Schoenfeld residuals test, a standard statistical method for evaluating whether the hazard ratio of covariates remains constant over time. The test yielded a P value of 0.641 (>0.05), indicating no violation of the proportional hazards assumption. Based on the results of the multivariate Cox analysis, we also constructed a clinical-only model. The time-dependent ROC curves for this clinical-only model across the three groups are shown in Supplementary Fig. 3; they indicate that the clinical-only model had very limited value.
Fig. 3 Survival stratification by median risk score. Kaplan–Meier (KM) curve survival analysis of patients divided into high-risk score group and low-risk score group using median risk score as the cutoff value.
Fig. 4 Multivariate Cox regression analysis of prognostic factors. Forest plot of multivariate Cox regression.
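The stratification analysis above combines three steps: a median split of the risk score, Kaplan–Meier estimation, and a two-sided log-rank test. A minimal pure-Python sketch of these steps is shown below; the function names and the toy data are illustrative assumptions, and the authors' statistical software is not stated in this section.

```python
import math

def split_by_median(risks):
    """Label patients 1 (high risk) if above the median score, else 0."""
    ordered = sorted(risks)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if r > median else 0 for r in risks]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_p(times, events, groups):
    """Two-sided log-rank p-value for two groups labeled 0 and 1."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail probability, 1 df

# Toy cohort in which high-risk patients die earlier.
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1] * 8
risks = [0.9, 0.8, 0.85, 0.7, 0.2, 0.3, 0.1, 0.4]
groups = split_by_median(risks)
print(groups, logrank_p(times, events, groups))
```

With few deaths in a subgroup, the variance term in the log-rank statistic is small and the test loses power, which is relevant when interpreting non-significant subgroup results.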
To further validate the model’s generalizability, subgroup analyses were performed based on age, menopausal status, and FIGO stage. Age was dichotomized at the median (49 years), FIGO stage was categorized as early (I–II) or advanced (III–IV), and menopausal status was classified as premenopausal or postmenopausal. In the internal validation cohort, there were 15 deaths: 2 in patients aged ≤49 years and 13 in patients aged >49 years; 4 in early-stage (I–II) and 11 in advanced-stage (III–IV) disease; and 4 in premenopausal and 11 in postmenopausal patients. In external validation cohort 1, there were 30 deaths: 5 in patients aged ≤49 years and 25 in patients aged >49 years; 4 in early-stage (I–II) and 26 in advanced-stage (III–IV) disease; and 7 in premenopausal and 23 in postmenopausal patients. In external validation cohort 2, there were 27 deaths: 5 in patients aged ≤49 years and 22 in patients aged >49 years; 4 in stage I–II and 23 in stage III–IV disease; and 10 in premenopausal and 17 in postmenopausal patients. Within each subgroup, the C-index of OvcaSurvivor was calculated and Kaplan–Meier curves for risk stratification were plotted.
The risk score demonstrated significant stratification power in most subgroups (log-rank P < 0.05). The exceptions were postmenopausal women (P = 0.097), patients aged >49 years (P = 0.12), and early-stage cases (FIGO I–II, P = 0.099) within the internal cohort; premenopausal patients (P = 0.084) within external cohort 1; and patients aged ≤49 years (P = 0.41) and those with FIGO stage III–IV disease (P = 0.3) within external cohort 2, as detailed in Table 4 and Fig. 5.
Fig. 5 Subgroup survival analysis by age, menopausal status, and FIGO stage. Kaplan–Meier (KM) curve analysis was conducted.

Table 4 Subgroup analysis of predictive performance of OvcaSurvivor (C-index and AUC) by FIGO stage, age, and menopausal status

| Subgroup | Category | Validation | C-index | AUC (1 year) | AUC (3 years) | AUC (5 years) |
|---|---|---|---|---|---|---|
| FIGO stage | I–II | Internal | 0.74 (0.69–0.79) | 0.76 (0.73–0.79) | 0.72 (0.68–0.76) | 0.66 (0.61–0.71) |
| FIGO stage | I–II | External 1 | 0.69 (0.65–0.73) | 0.70 (0.65–0.75) | 0.68 (0.62–0.74) | Na |
| FIGO stage | I–II | External 2 | 0.69 (0.60–0.71) | Na | 0.69 (0.61–0.74) | 0.73 (0.66–0.79) |
| FIGO stage | III–IV | Internal | 0.82 (0.78–0.86) | 0.83 (0.79–0.87) | 0.78 (0.73–0.83) | 0.73 (0.66–0.80) |
| FIGO stage | III–IV | External 1 | 0.78 (0.74–0.82) | 0.79 (0.73–0.85) | 0.75 (0.68–0.82) | Na |
| FIGO stage | III–IV | External 2 | 0.67 (0.62–0.73) | Na | 0.72 (0.67–0.80) | 0.72 (0.64–0.81) |
| Age (median) | ≤49 | Internal | 0.77 (0.71–0.83) | 0.79 (0.73–0.85) | 0.76 (0.70–0.82) | 0.66 (0.61–0.77) |
| Age (median) | ≤49 | External 1 | 0.74 (0.68–0.80) | 0.76 (0.68–0.84) | 0.71 (0.64–0.78) | Na |
| Age (median) | ≤49 | External 2 | 0.72 (0.62–0.84) | Na | 0.75 (0.66–0.89) | 0.66 (0.63–0.75) |
| Age (median) | >49 | Internal | 0.81 (0.76–0.86) | 0.84 (0.79–0.89) | 0.79 (0.74–0.84) | 0.71 (0.65–0.77) |
| Age (median) | >49 | External 1 | 0.77 (0.72–0.82) | 0.79 (0.72–0.86) | 0.74 (0.66–0.82) | Na |
| Age (median) | >49 | External 2 | 0.78 (0.73–0.82) | Na | 0.75 (0.69–0.83) | 0.72 (0.67–0.81) |
| Menopausal status | Pre-menopausal | Internal | 0.79 (0.73–0.85) | 0.78 (0.71–0.85) | 0.76 (0.67–0.85) | 0.70 (0.60–0.80) |
| Menopausal status | Pre-menopausal | External 1 | 0.73 (0.64–0.82) | 0.74 (0.67–0.81) | 0.73 (0.65–0.81) | Na |
| Menopausal status | Pre-menopausal | External 2 | 0.72 (0.61–0.80) | Na | 0.72 (0.61–0.80) | 0.69 (0.63–0.79) |
| Menopausal status | Post-menopausal | Internal | 0.80 (0.75–0.85) | 0.81 (0.73–0.89) | 0.77 (0.67–0.87) | 0.70 (0.59–0.81) |
| Menopausal status | Post-menopausal | External 1 | 0.77 (0.67–0.87) | 0.78 (0.68–0.88) | 0.75 (0.66–0.84) | Na |
| Menopausal status | Post-menopausal | External 2 | 0.73 (0.62–0.81) | Na | 0.72 (0.60–0.82) | 0.68 (0.62–0.77) |

Values in parentheses are confidence intervals. Unless otherwise indicated, confidence intervals are 95%. AUC Area Under Curve, Na Not available.
To isolate the contributions of our proposed fusion components, we conducted an ablation study, with results summarized in Table 5. The full OvcaSurvivor model achieved the highest C-index in all three validation cohorts and the highest or tied-highest AUC at most time points; the exception was the 3-year AUC in external validation cohort 2, where the simple fusion model scored higher (0.64 vs. 0.62). Removing the cross-attention module or the gating unit lowered the C-index in every cohort, indicating that both components contribute positively. The simple fusion model, which lacks both mechanisms, showed the largest drop in C-index. These results suggest that the performance gains of OvcaSurvivor are driven by the proposed fusion architecture rather than solely by the aggregation of multimodal data.

Table 5 Performance comparison of ablation models

| Model variant | Validation | C-index | AUC (1 year) | AUC (3 year) | AUC (5 year) |
|---|---|---|---|---|---|
| OvcaSurvivor (Full) | Internal | 0.81 (0.76–0.86) | 0.82 (0.74–0.90) | 0.76 (0.67–0.85) | 0.70 (0.60–0.80) |
| OvcaSurvivor (Full) | External 1 | 0.76 (0.68–0.84) | 0.78 (0.71–0.85) | 0.73 (0.65–0.81) | Na |
| OvcaSurvivor (Full) | External 2 | 0.70 (0.69–0.80) | Na | 0.62 (0.61–0.74) | 0.72 (0.64–0.77) |
| w/o gating unit | Internal | 0.79 (0.74–0.84) | 0.80 (0.72–0.88) | 0.74 (0.66–0.82) | 0.70 (0.65–0.80) |
| w/o gating unit | External 1 | 0.74 (0.66–0.82) | 0.76 (0.69–0.83) | 0.71 (0.63–0.79) | Na |
| w/o gating unit | External 2 | 0.67 (0.67–0.75) | Na | 0.60 (0.61–0.74) | 0.69 (0.64–0.79) |
| w/o cross-attention | Internal | 0.78 (0.73–0.83) | 0.79 (0.71–0.87) | 0.73 (0.65–0.81) | 0.69 (0.65–0.75) |
| w/o cross-attention | External 1 | 0.73 (0.65–0.81) | 0.75 (0.68–0.82) | 0.70 (0.62–0.78) | Na |
| w/o cross-attention | External 2 | 0.66 (0.62–0.74) | Na | 0.61 (0.58–0.71) | 0.68 (0.59–0.72) |
| Simple fusion | Internal | 0.75 (0.70–0.80) | 0.76 (0.68–0.84) | 0.70 (0.62–0.78) | 0.63 (0.58–0.72) |
| Simple fusion | External 1 | 0.70 (0.62–0.78) | 0.72 (0.64–0.80) | 0.67 (0.59–0.75) | Na |
| Simple fusion | External 2 | 0.65 (0.60–0.71) | Na | 0.64 (0.61–0.74) | 0.65 (0.60–0.69) |

AUC Area Under Curve, Na Not available.
Discussion
This study presents the OvcaSurvivor framework for personalized survival prediction in patients with R0-resected OC, demonstrating that multimodal feature fusion enhances prognostic accuracy through improved risk stratification.
Accurate prognostication in OC is essential for guiding treatment planning; however, current approaches, including FIGO staging 18, biomarkers, and genomics, lack sufficient precision. Although the FIGO staging system remains the clinical cornerstone, tumor heterogeneity often results in diverse biological behaviors among patients with identical stages. Additionally, imaging modalities such as CT and MRI provide anatomical visualization of tumor morphology but have limited sensitivity for subcentimeter lesions and are subject to interpretive variability due to operator dependency. Serum biomarkers such as CA-125 and HE4 are widely used in clinical practice but suffer from limited specificity. For instance, CA-125 levels can be elevated in benign conditions such as endometriosis 19, while some ovarian malignancies present with serologic negativity. Although genomic signatures, including BRCA mutations, hold prognostic significance 20, 21, clinical implementation of sequencing technologies faces challenges such as high costs, long turnaround times, and difficulties in analyzing complex polygenic interactions that surpass the capabilities of conventional statistical methods. Our study pioneers the development of a stratified prognostic model that predicts 1-, 3-, and 5-year survival by employing DL architectures to integrate multimodal data, thereby providing enhanced decision-support tools for precision oncology.
WSI generates high-resolution digital panoramas by computationally scanning conventional histopathological slides, enabling precise visualization of nuclear morphometric features such as size, shape, and architectural patterns in OC. Elevated nuclear pleomorphism and increased nuclear-to-cytoplasmic ratios are frequently associated with aggressive tumor phenotypes. Complementing histopathological analysis, US offers real-time, non-invasive evaluation for prognostic stratification by assessing morphological characteristics of OC, including cystic-solid components, septal thickness, papillary projections, and mural nodularity, as classified by the Ovarian-Adnexal Reporting and Data System (O-RADS) 22. Malignant indicators, such as multilocular septations, predominant solid components (>80% composition), and irregular surface contours, typically predict unfavorable clinical outcomes. Hemodynamic parameters also contribute to prognostic precision; malignant ovarian neoplasms exhibit characteristic vascular patterns, including reduced resistance indices (RI 1.0), reflecting tumor angiogenesis and invasive potential. The proposed DL framework integrates WSI-derived histomorphometric profiles with US biomarkers, creating a biologically grounded prognostic model that simultaneously captures cellular-level morphological features and macroscale tissue characteristics.
Although existing DL methodologies have shown promise in predicting survival outcomes for OC patients, current approaches face significant limitations in multimodal integration and temporal resolution. For example, Camilla Nero et al. 11 applied DL solely to digitized H&E-stained WSIs to predict progression-free survival (PFS) in epithelial OC, achieving an AUC of 0.71. Similarly, Giacomo Avesani et al. 12 combined CT-based radiomic features with DL architectures for PFS prediction, but their framework was limited to unimodal or bimodal data inputs. Notably, these studies focused on fixed timepoints (e.g., 1-year PFS) and reported suboptimal discriminative performance, with AUC values ranging from 0.56 to 0.71. In contrast, our multimodal model outperformed single-modality baselines, achieving a C-index of 0.81 (95% CI: 0.77–0.85) in the internal validation cohort and 0.76 (95% CI: 0.68–0.84) in external validation cohort 1. Feature importance and Cox regression analyses revealed that WSI-derived features and the risk score were the most significant contributors to the final model. Moreover, our multimodal model predicted OS at 1, 3, and 5 years with AUC values of 0.82, 0.76, and 0.70 in internal validation, 0.78 and 0.73 (1 and 3 years) in external validation cohort 1, and 0.62 and 0.72 (3 and 5 years) in external validation cohort 2, supporting its robustness and the complementarity of the modalities over time. The ablation study further supported our hypothesis that a simple concatenation of features is insufficient to capture the complex, non-linear interactions between histopathology, ultrasound, and clinical data. The performance drop observed upon removing the cross-attention and gating modules underscores their role in achieving higher prognostic accuracy.
The adaptive weighting of multimodal features reflects stage-specific biological progression in OC. In early-stage disease (FIGO I–II), characterized by lower tumor heterogeneity and limited metastatic potential, serum biomarkers and clinical staging hold substantial prognostic value 23 , 24 , underscoring the importance of biochemical marker monitoring in these patients. As the disease advances to later stages (FIGO III–IV), the tumor microenvironment becomes markedly more complex. At this stage, US and WSI features become critical for evaluating tumor aggressiveness. The model’s autonomous adjustment of imaging modality weights through a gating mechanism mirrors clinical decision-making processes. This adaptive weighting not only improves the model’s ability to capture spatiotemporal tumor heterogeneity but also creates an interpretable link between disease stage and modality importance.
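The idea of gated modality weighting can be illustrated with a small sketch. The text above does not specify the exact form of the gate, so the softmax gate below is an assumption chosen for clarity, not a description of the OvcaSurvivor architecture. In a trained model the gate logits would be produced by learned layers from the inputs; here they are passed in by hand, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_logits):
    """Fuse per-modality embeddings with softmax gate weights.

    embeddings  : dict mapping modality name -> list of floats (equal lengths)
    gate_logits : dict mapping modality name -> unnormalized gate score
    Returns (fused_vector, weights), where weights sum to 1.
    """
    names = sorted(embeddings)
    weights = softmax([gate_logits[name] for name in names])
    dim = len(embeddings[names[0]])
    fused = [0.0] * dim
    for name, weight in zip(names, weights):
        for k, value in enumerate(embeddings[name]):
            fused[k] += weight * value  # weighted sum across modalities
    return fused, dict(zip(names, weights))


# Hypothetical two-dimensional embeddings for the three modalities.
toy_embeddings = {
    "wsi": [1.0, 0.0],
    "us": [0.0, 1.0],
    "clinical": [1.0, 1.0],
}
# A larger logit shifts weight toward that modality.
fused, weights = gated_fusion(
    toy_embeddings, {"wsi": 2.0, "us": 1.0, "clinical": 0.0}
)
```

Because the weights are normalized per patient, they can be inspected directly, which is what makes a stage-by-stage comparison of modality importance possible.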
In this study, the p-values from KM curve analysis and log-rank tests based on risk scores generated by OvcaSurvivor were greater than 0.05 in certain subgroups, indicating a reduced ability of the model to distinguish survival differences within these groups. Specifically, in the internal validation cohort, model performance was limited among postmenopausal women (P = 0.097), individuals aged over 49 years (P = 0.12), and early-stage cases (FIGO I–II, P = 0.099). Similarly, in external validation cohort 1, the model showed decreased discriminatory power in premenopausal patients (P = 0.084). Baseline characteristics such as age and menopausal status did not differ significantly between groups, so sample size is the more plausible explanation: the overall cohort was small, and subgroup stratification reduced the number of analyzable cases further, most notably for FIGO I–II patients in the internal validation cohort, which included only 15 cases. This reduction likely limited the statistical power of the log-rank test, explaining why these results did not reach significance. Notably, no significant survival differences were observed in patients aged ≤49 years (P = 0.41) or those with FIGO stage III–IV disease (P = 0.3) in external validation cohort 2. For FIGO stage III–IV patients, the absence of survival disparity between the high- and low-risk subgroups identified by OvcaSurvivor may be attributed to the inherently high mortality associated with advanced-stage malignancy. In such cases, the aggressive nature of the cancer likely overwhelms potential prognostic differences, leading to uniformly poor outcomes and, consequently, no statistically significant differences in survival. The reason for the lack of significant survival differences among patients aged ≤49 years in external validation cohort 2 remains unclear.
In summary, these subgroup analysis results do not imply that OvcaSurvivor is ineffective within these groups; rather, the limited sample sizes may have obscured the model’s true ability to differentiate survival outcomes. Future studies with larger cohorts are needed to more accurately evaluate the model’s performance in these subgroups.
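The effect of subgroup size on the log-rank test can be made concrete with a small sketch. The function below implements the standard two-group log-rank statistic with one degree of freedom; it is an illustration under simplifying assumptions, not the analysis code used in this study, and the toy data are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value) with 1 d.f."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0  # observed minus expected deaths in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined")

    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Same separation pattern, different group sizes (all deaths observed).
_, p_large = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                          [10, 11, 12, 13, 14], [1] * 5)
_, p_small = logrank_test([1, 2], [1, 1], [10, 11], [1, 1])
```

In this toy example the two groups are completely separated in both cases, yet only the larger comparison falls below the 0.05 threshold, which is consistent with the interpretation that small subgroups can mask a real survival difference.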
Our study has several limitations that warrant consideration. First, the small sample size in certain subgroups of the validation cohorts, particularly FIGO stage I–II patients, introduces a potential risk of overfitting. Further validation using larger, more diverse datasets is needed. Second, while we demonstrated good performance across external sites, we did not apply explicit stain normalization to WSIs or use domain adaptation techniques. Future work should incorporate these strategies to further solidify model robustness for broad clinical adoption.
In conclusion, by harmonizing WSI, US, and clinical data through innovative fusion strategies, this study advances postoperative prognostic tools for OC. Future work should focus on prospective validation and real-world implementation to translate these gains into patient benefit.
Introduction
Ovarian cancer (OC) presents a significant health challenge, being the leading cause of gynecologic cancer-related deaths and ranking among the top six causes of cancer mortality in women worldwide 1 . In 2024, approximately 19,680 new OC cases were diagnosed in the United States, with an estimated 12,740 deaths 2 . The five-year survival rate for women diagnosed with OC remains relatively low, with only about 49% surviving beyond this period.
For patients undergoing surgery, achieving complete resection (R0) is a crucial prognostic factor; however, significant heterogeneity in outcomes persists even within this subgroup. This highlights the urgent need for advancements in personalized prognostic tools. Numerous researchers have attempted to predict OC survival using various approaches. For instance, Wan et al. developed a predictive nomogram based on machine learning radiomics using computed tomography (CT) imaging. Their model yielded time-dependent receiver operating characteristic (ROC) curve values of 0.800, 0.673, and 0.792 for 1-year, 3-year, and 5-year survival predictions, respectively 3 . Cheng et al. developed and validated nomograms using data from the SEER database, reporting concordance indices (C-indices) of 0.751 and 0.702 in the validation cohort 4 . Additionally, Zhang et al. established a risk model and nomogram based on nine glycolysis-related genes, achieving areas under the curve (AUC) of 0.709 and 0.762 for 3- and 5-year survival predictions 5 . Despite these advances, these methods have limitations, particularly in their ability to capture all relevant predictive features. Moreover, although the International Federation of Gynecology and Obstetrics (FIGO) staging system and histopathological grading provide fundamental prognostic stratification, growing evidence reveals considerable heterogeneity in outcomes among R0-resected patients within conventional risk groups. The FIGO staging system can also introduce uncertainties and inconsistencies in prediction. This critical knowledge gap emphasizes the need for precise biomarkers capable of decoding tumor biological aggressiveness.
Deep learning (DL) technologies have demonstrated significant potential due to their robust feature-learning capabilities, enabling them to process high-dimensional data and identify potential biomarkers. Pioneering studies in OC by Chen et al. showed that DL algorithms can effectively distinguish malignant from benign ovarian tumors with high diagnostic performance 6 . Similarly, ultrasound (US) is a valuable tool for revealing OC heterogeneity by identifying key features such as cystic and solid components, nodules, calcifications, and abnormal blood flow patterns. Studies have demonstrated that DL-enabled US can outperform the average diagnostic accuracy of radiologists and rival that of expert US image readers 7 – 9 . Although some research has applied DL to predict survival in OC based on various imaging modalities 10 – 12 , these approaches often fail to fully exploit DL’s strengths in image feature learning. They tend to operate in isolation, focusing on single modalities rather than integrating multiple sources, which risks missing synergistic biological signals.
Advancements in computational pathology have demonstrated that whole-slide image (WSI) analysis using DL can extract prognostically significant morphometric patterns that are imperceptible to human observers 13 , 14 . WSIs are high-resolution images of entire tissue sections obtained through advanced microscopy scanning technology, capturing intricate details of tissue samples. DL techniques applied to WSIs have shown promise across various fields 13 , 15 . However, studies leveraging WSI for predicting OC survival remain scarce. There is an urgent need to develop novel methods that integrate multiple data sources to improve the accuracy and reliability of survival predictions.
WSI can capture cellular-level tumor-stroma interactions, while US imaging can reveal macro-scale vascular patterns. This study aims to develop a multimodal DL framework that integrates WSI, US imaging, and clinical data to enhance postoperative survival prediction accuracy and risk stratification for patients with R0-resected OC.
Supplementary Material
Revised Supplementary Data