Methods
This retrospective, single-center diagnostic accuracy study was conducted between 2023 and 2024 at Yasuj University of Medical Sciences, Iran, and is reported in accordance with the STARD 2015 guidelines for diagnostic accuracy studies [ 26 ]. The objective was to design, implement, and evaluate a rule-based clinical decision support system (CDSS) for ovarian cancer risk stratification that integrates six validated algorithms: NICE NG12, the HSE pathway, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA [ 9 – 16 ].
This study specifically assessed the accuracy of MOCRA (Multivariate Ovarian Cancer Risk Assessment), the integrated meta-algorithm, in detecting prevalent ovarian malignancy at the time of presentation; the follow-up duration for non-surgical cases does not exclude the possibility of future incident cancers.
The CDSS was designed following object-oriented programming (OOP) principles to ensure modularity, scalability, and maintainability. Unified Modeling Language (UML) diagrams were used to capture use cases, activity flows, and class architecture [ 24 ]. This structured design approach allowed transparent mapping between clinical rules and software implementation, enhancing interpretability and future extensibility.
The use of OOP architecture ensured that each diagnostic algorithm could be encoded as an independent component while maintaining interoperability within the overarching MOCRA framework.
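The component structure described above can be illustrated with a minimal sketch, written in Python for brevity (the system itself was implemented in PHP). The names `Patient`, `RiskAlgorithm`, `ExampleSymptomRule`, and `run_all`, and the toy rule inside `ExampleSymptomRule`, are hypothetical and are not taken from the MOCRA source code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Hypothetical container for a few of the patient-level inputs."""
    age: int
    postmenopausal: bool
    symptoms: set = field(default_factory=set)


class RiskAlgorithm(ABC):
    """Common interface: every algorithm returns a single risk label."""
    name: str

    @abstractmethod
    def evaluate(self, patient: Patient) -> str:
        ...


class ExampleSymptomRule(RiskAlgorithm):
    """Toy stand-in for a symptom-driven pathway (not the NICE NG12 rules)."""
    name = "example_symptom_rule"

    def evaluate(self, patient: Patient) -> str:
        if "ascites" in patient.symptoms:
            return "high"
        if patient.symptoms:
            return "intermediate"
        return "low"


def run_all(algorithms, patient):
    """Run each independent component and collect its output by name."""
    return {algo.name: algo.evaluate(patient) for algo in algorithms}
```

Under this arrangement, a further algorithm would be added by writing one more subclass, without changing the code that collects or integrates the outputs.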
Six validated diagnostic algorithms were encoded in PHP as deterministic rule-based functions. NICE NG12 and HSE pathways provided symptom-driven referral rules [ 9 , 10 ]; IOTA Simple Rules captured ultrasound morphology [ 11 , 12 ]; O-RADS v2022 applied structured sonographic scoring [ 13 , 14 ]; RMI2 combined CA-125, menopausal status, and ultrasound features [ 15 ]; and ROMA integrated CA-125, HE4, and menopausal status [ 16 ].
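For orientation, the two biomarker indices can be sketched as deterministic functions. The sketch below is in Python rather than PHP and uses the RMI 2 and ROMA formulas as commonly published; the function names, the argument names, and the RMI2 cutoff of 200 are assumptions, since the exact thresholds used in the CDSS are not stated here. ROMA cutoffs depend on the assay and menopausal status and are therefore left out.

```python
import math


def rmi2(ca125_u_ml: float, postmenopausal: bool, n_ultrasound_features: int) -> float:
    """RMI 2 = U x M x CA-125.

    U = 1 for zero or one ultrasound feature, 4 for two or more;
    M = 1 if premenopausal, 4 if postmenopausal.
    """
    u = 4 if n_ultrasound_features >= 2 else 1
    m = 4 if postmenopausal else 1
    return u * m * ca125_u_ml


def rmi2_is_high(score: float, cutoff: float = 200.0) -> bool:
    """The cutoff of 200 is an assumption, not a value reported in this paper."""
    return score >= cutoff


def roma_percent(ca125_u_ml: float, he4_pmol_l: float, postmenopausal: bool) -> float:
    """ROMA predicted probability (%) from the published predictive index."""
    if postmenopausal:
        pi = -8.09 + 1.04 * math.log(he4_pmol_l) + 0.732 * math.log(ca125_u_ml)
    else:
        pi = -12.0 + 2.38 * math.log(he4_pmol_l) + 0.0626 * math.log(ca125_u_ml)
    # Logistic transform of the predictive index, expressed as a percentage.
    return 100.0 * math.exp(pi) / (1.0 + math.exp(pi))
```

Both functions are pure and depend only on their inputs, which is what makes the outputs reproducible across repeated runs.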
A meta-algorithm, MOCRA (Multivariate Ovarian Cancer Risk Assessment), was developed to harmonize outputs. MOCRA integrates all six algorithms using deterministic logic.
Rather than employing statistical weighting or machine-learning fusion, MOCRA follows a transparent, rule-based hierarchy designed to favor early detection: any single high-risk classification elevates the overall output to high risk. Consistent low-risk outputs are retained as low risk, while discordant results are categorized as intermediate or indeterminate.
All final results were mapped into four categories: low, intermediate, high, and indeterminate risk.
The overall workflow of the study is illustrated in Fig. 1. The process began with selecting and implementing validated algorithms, which were integrated into the deterministic MOCRA framework. Patient data were stratified into the four risk categories. The CDSS was designed using an OOP conceptual model and implemented in PHP. Functional evaluation assessed system completeness and consistency. Usability testing was performed with gynecologic oncologists using the PSSUQ instrument [ 25 ]. Diagnostic accuracy was then calculated against physician-confirmed reference standards.
Fig. 1 Study workflow for CDSS development and evaluation
Patients were eligible if they presented with an adnexal mass and underwent comprehensive clinical, biochemical, and ultrasound evaluation. Cases with incomplete records, previous ovarian cancer, or missing diagnostic confirmation were excluded to ensure analytic integrity. Only patients with verifiable diagnostic outcomes were included to prevent misclassification bias.
A total of 69 patients were enrolled in this study, of whom 68 had complete data and were included in the final analysis.
The sample size reflected pragmatic constraints typical of pilot diagnostic accuracy studies and was sufficient to estimate sensitivity and specificity with acceptable precision for an exploratory model evaluation [ 27 ].
Given the low prevalence of ovarian malignancy, class imbalance was anticipated and addressed using complementary performance metrics.
The CDSS utilized a structured set of patient-level variables mapped to the six diagnostic algorithms:
Demographics: age; menopausal status (premenopausal, postmenopausal, years since menopause).
Symptoms: high-risk (ascites, pelvic mass, abdominal mass); medium-risk (bloating, abdominal pain, urinary urgency, early satiety).
Biomarkers: CA-125 (U/mL), HE4 (pmol/L).
Ultrasound features: irregular solid component, ascites, acoustic shadowing, ≥ 4 papillary projections.
Lesion characteristics: lesion type, subtype, size, category (cystic/solid), solid components, smoothness, locularity, color score (1–4), papillary projections, ascites, peritoneal nodules.
O-RADS–specific descriptors: lesion type (cyst, solid, benign lesion, undetermined mass), benign lesion subtype, margin status, spread, metastasis, peritoneum status, papillary projection count, Doppler flow.
Physician assessment: clinician’s initial risk rating (low, intermediate, high, indeterminate) and free-text notes.
This comprehensive variable set ensured that all six diagnostic algorithms were implemented consistently and without missing required inputs.
Functional testing assessed overall system stability, correctness of algorithmic implementations, and consistency of outputs.
A predefined set of test cases was used to verify that MOCRA’s classifications matched expected outcomes under diverse clinical scenarios.
The evaluation also confirmed the robustness of the data-entry interface, the integrity of database storage, and reproducibility of outputs across repeated trials.
Usability was evaluated using the Post-Study System Usability Questionnaire (PSSUQ) [ 25 ]. A total of 15 gynecologic oncologists, recruited from three medical universities (Yasuj, Tehran, Shiraz), participated following hands-on use of the CDSS.
The PSSUQ assessed three domains—system usefulness, information quality, and interface quality—with scores ranging from 1 (strongly disagree) to 7 (strongly agree).
Participants had a median of nine years of clinical experience (range 6–18) and all had prior experience in adnexal mass imaging and electronic decision-support tools, ensuring informed usability feedback.
Diagnostic accuracy was assessed using multiple performance measures against physician-confirmed reference diagnoses.
The reference standard was histopathological confirmation for surgically managed cases.
For conservatively managed patients, malignancy status was confirmed through ≥ 6 months of clinical and imaging follow-up by gynecologic oncologists, providing reliable verification of prevalent malignancy while acknowledging inherent limitations regarding long-term incident cancer detection.
Metrics used included:
Accuracy reflected the overall proportion of correctly classified cases.
Sensitivity (Recall) measured the system’s ability to detect true positives, a critical parameter in oncology where missed cases (false negatives) carry severe consequences.
Specificity indicated how well non-cancer cases were correctly excluded, minimizing unnecessary interventions.
Positive Predictive Value (PPV) expressed the probability that a patient classified as high risk was truly affected.
Negative Predictive Value (NPV) provided reassurance for patients classified as low risk.
F1-score balanced sensitivity and precision, particularly important when classes are imbalanced.
Area Under the Receiver Operating Characteristic Curve (AUC) quantified overall discriminatory performance across thresholds.
All metrics were calculated using standard formulae, and 95% confidence intervals (CIs) were estimated for sensitivity, specificity, and AUC to reflect uncertainty in the small-sample setting.
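The standard formulae referred to above can be written out as a short Python sketch. The function names are hypothetical, and the Wilson score interval is an assumed choice of confidence-interval method, since the method used is not specified; it is shown because it behaves sensibly for small samples and for proportions at 0 or 1.

```python
import math


def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard 2x2 metrics; returns NaN where a denominator is zero."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # F1 is the harmonic mean of sensitivity (recall) and PPV (precision).
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (assumed method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

With the MOCRA counts reported in Table 2 (TP = 7, TN = 59, FP = 2, FN = 0), `diagnostic_metrics` gives accuracy ≈ 0.971, specificity ≈ 0.967, PPV ≈ 0.778, and F1 = 0.875, matching the tabulated values.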
Results
We successfully encoded six validated and widely cited algorithms into the CDSS: NICE NG12, HSE, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA. Each was implemented according to published diagnostic criteria and guideline definitions [ 9 – 16 ]. Together, they formed the foundation for the integrated MOCRA framework, which harmonizes outputs to reduce missed high-risk cases and improve consistency.
The MOCRA (Multivariate Ovarian Cancer Risk Assessment) engine was implemented in PHP as a deterministic ruleset, harmonizing the outputs of six validated algorithms—NICE NG12, HSE pathways, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA—into a unified four-level risk stratification (low, intermediate, high, indeterminate). Each patient record was parsed into structured variables, including demographics, menopausal status, symptoms, serum biomarkers, and ultrasound descriptors, which were routed through algorithm-specific functions.
Step 1: Individual algorithm mapping
Each algorithm produced a categorical output according to published guidelines. These outputs were then normalized into the CDSS framework:
High risk: malignancy strongly suspected (e.g., IOTA malignant rules, O-RADS 5, ROMA-high).
Intermediate risk: indeterminate or moderate suspicion (e.g., O-RADS 3–4, inconclusive IOTA with abnormal biomarkers).
Low risk: all-clear outputs with no red flags (e.g., O-RADS 1–2, ROMA-low, RMI2 below threshold).
Indeterminate: missing or conflicting data that prevented classification.
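The normalization step can be sketched as follows, in Python rather than the PHP used for the system. The function names are hypothetical. Treating O-RADS 0 (incomplete evaluation) as indeterminate, and treating an RMI2 value above its threshold as high risk, are assumptions; the examples above only state the O-RADS 1–5 groupings, ROMA-high, ROMA-low, and RMI2 below threshold.

```python
def normalize_orads(category):
    """Map an O-RADS category to the four-level CDSS scale."""
    if category is None or category == 0:
        # Missing score, or O-RADS 0 (incomplete evaluation): assumed indeterminate.
        return "indeterminate"
    if category in (1, 2):
        return "low"
    if category in (3, 4):
        return "intermediate"
    if category == 5:
        return "high"
    raise ValueError(f"unexpected O-RADS category: {category!r}")


def normalize_binary_index(above_cutoff):
    """Map a thresholded index (e.g., ROMA or RMI2) to the four-level scale.

    `above_cutoff` is True, False, or None when a biomarker is missing.
    """
    if above_cutoff is None:
        return "indeterminate"
    return "high" if above_cutoff else "low"
```

Raising an error on unexpected input, rather than silently defaulting to a category, keeps data-entry mistakes visible.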
Step 2: Precedence rules.
A safety-first hierarchy was applied:
Immediate high-risk override – any algorithm indicating high risk, or presence of systemic red flags (ascites, peritoneal nodules, metastasis), classified the patient as high risk.
Intermediate consensus – cases with moderate findings (O-RADS 3–4, inconclusive IOTA with abnormal biomarkers, symptom pathways suggesting risk) were categorized as intermediate.
Low-risk consensus – only patients with concordant low-risk outputs across all available algorithms and no suspicious features were labeled low risk.
Indeterminate guardrail – if essential data were missing or outputs were structurally conflicting, the case defaulted into indeterminate.
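The four precedence rules can be expressed as a short deterministic function. The following Python sketch is an illustration, not the PHP implementation: the names `mocra_classify`, `RED_FLAGS`, `findings`, and `required` are hypothetical, the order in which the guardrail is checked relative to the intermediate rule is an assumption, and the "structurally conflicting" condition is not modeled because its definition is not given.

```python
RED_FLAGS = {"ascites", "peritoneal_nodules", "metastasis"}


def mocra_classify(outputs, findings=frozenset(), required=frozenset()):
    """Combine normalized algorithm outputs into one four-level category.

    outputs:  dict mapping algorithm name to
              'low' | 'intermediate' | 'high' | 'indeterminate'
    findings: set of systemic findings recorded for the patient
    required: names of algorithms whose output is considered essential
    """
    available = {k: v for k, v in outputs.items() if v != "indeterminate"}

    # Rule 1: any high-risk output or systemic red flag overrides everything.
    if "high" in available.values() or (set(findings) & RED_FLAGS):
        return "high"

    # Rule 4 (checked early here, by assumption): essential data missing.
    if not available or (set(required) - set(available)):
        return "indeterminate"

    # Rule 2: any moderate finding yields intermediate risk.
    if "intermediate" in available.values():
        return "intermediate"

    # Rule 3: all available outputs are concordantly low.
    return "low"
```

For example, `mocra_classify({"orads": "low", "roma": "high"})` returns `"high"`, whereas `mocra_classify({"orads": "low", "roma": "low"})` returns `"low"`.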
Step 3: Final classification.
The CDSS returned a single four-level classification for each patient. This structured mapping reflects clinical reasoning: high sensitivity by capturing any malignant signals, while preserving specificity by requiring multi-algorithm concordance for low-risk assignment.
This logic is deliberately conservative, which in our dataset resulted in no false negatives (FN = 0) for MOCRA, at the cost of a small number of additional false positives.
To ensure transparency and reproducibility, we modeled the CDSS design using Unified Modeling Language (UML) diagrams [ 24 ].
Figure 2 illustrates how clinicians interact with the system, including patient data entry, algorithm execution, and review of results.
Fig. 2 UML Use Case Diagram – User operations and patient data entry
Figure 3 represents the modular OOP structure, showing patient attributes as parent classes and algorithm-specific rules as subclasses.
Fig. 3 UML Class Diagram – Object-oriented architecture of the CDSS
These visualizations confirm that the CDSS is not a “black box” but a structured, interpretable, and extensible system.
The CDSS was deployed as a web-based application. Deliverables included:
Home dashboard – central access to patient records and diagnostic functions (Fig. 4).
Fig. 4 Home dashboard of the CDSS
The central landing page provides access to patient records, diagnostic modules, and navigation to data entry and results interfaces. This serves as the main workspace for clinicians.
Patient data entry module – structured forms for demographics, symptoms, biomarkers, and ultrasound features (Fig. 5).
Fig. 5 Patient data entry interface
Structured electronic forms allow standardized capture of demographics, symptoms, serum biomarkers, and ultrasound features, ensuring completeness and consistency across cases.
Diagnosis results interface – side-by-side display of each algorithm’s output with the integrated MOCRA classification (Fig. 6).
Fig. 6 Diagnosis results interface
The results page displays outputs from each individual algorithm (NICE, HSE, IOTA, O-RADS, RMI2, ROMA) alongside the integrated MOCRA classification, enabling clinicians to compare risk assessments and view the consensus outcome in real time.
This design allowed clinicians to visualize both individual algorithm outputs and the consensus MOCRA result, supporting real-world diagnostic decision-making.
The distribution of MOCRA risk categories and their correspondence with final pathology is shown in Table 1. Of the 68 analyzable patients, 7 were diagnosed with malignant disease and 61 were benign by histopathology or validated clinical follow-up. All malignant cases were assigned to the MOCRA high-risk category, and no malignancies occurred in either the intermediate- or low-risk groups, yielding a false-negative rate of zero (FN = 0). This pattern reflects the intentionally conservative design of MOCRA, which prioritizes sensitivity by concentrating suspicious cases within the upper risk tiers.
Table 1 Distribution of MOCRA risk categories and pathology outcomes (N = 68)
MOCRA risk category | Total n (%) | Malignant n (%) | Benign n (%)
High risk | 9 (13.2) | 7 (77.8) | 2 (22.2)
Intermediate risk | 26 (38.2) | 0 (0.0) | 26 (100.0)
Low risk | 33 (48.5) | 0 (0.0) | 33 (100.0)
Indeterminate | 0 (0.0) | 0 (0.0) | 0 (0.0)
Total | 68 (100.0) | 7 (10.3) | 61 (89.7)
This distribution illustrates that MOCRA effectively segregates malignant cases into the highest-risk category while maintaining a clean low-risk group with no missed cancers, supporting its suitability as a sensitive triage tool.
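To make the link between the category distribution and discrimination explicit, the AUC for an ordinal classifier can be computed as the probability that a malignant case receives a higher risk level than a benign case, counting ties as one half. The Python sketch below uses the counts stated in the text and in Table 2 (7 malignant cases, all high risk; 2 benign cases in the high-risk group) and places the remaining 59 benign cases as 26 intermediate and 33 low. Whether the reported AUC was computed this way is an assumption; the function name `ordinal_auc` is hypothetical.

```python
def ordinal_auc(malignant_scores, benign_scores):
    """AUC as P(malignant score > benign score) + 0.5 * P(tie)."""
    wins = ties = 0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1
            elif m == b:
                ties += 1
    return (wins + 0.5 * ties) / (len(malignant_scores) * len(benign_scores))


# Ordinal encoding of the three populated MOCRA categories.
LEVEL = {"low": 0, "intermediate": 1, "high": 2}

malignant = [LEVEL["high"]] * 7
benign = (
    [LEVEL["high"]] * 2
    + [LEVEL["intermediate"]] * 26
    + [LEVEL["low"]] * 33
)

auc = ordinal_auc(malignant, benign)  # 60/61, approximately 0.984
```

Under these assumptions the calculation gives 60/61 ≈ 0.984, in line with the AUC reported for MOCRA. Because every malignant case sits in the top category, the split of benign cases between the intermediate and low groups does not change this value.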
Diagnostic accuracy was assessed in 68 analyzable patients (one excluded due to incomplete records). Overall, 7 patients had malignancy and 61 had benign disease. Performance metrics are summarized in Table 2.
Table 2 Evaluation metrics for MOCRA and comparator algorithms
Algorithm | N | TP | TN | FP | FN | Accuracy | Sensitivity | Specificity | PPV | NPV | F1-score | AUC
MOCRA | 68 | 7 | 59 | 2 | 0 | 97.1% | 100.0% | 96.7% | 77.8% | 100.0% | 87.5% | 0.984
NICE NG12 | 68 | 4 | 58 | 3 | 3 | 91.2% | 57.1% | 95.1% | 57.1% | 95.1% | 57.1% | 0.794
HSE | 68 | 4 | 58 | 3 | 3 | 91.2% | 57.1% | 95.1% | 57.1% | 95.1% | 57.1% | 0.794
IOTA | 68 | 1 | 61 | 0 | 6 | 91.2% | 14.3% | 100.0% | 100.0% | 91.0% | 25.0% | 0.781
O-RADS v2022 | 68 | 5 | 61 | 0 | 2 | 97.1% | 71.4% | 100.0% | 100.0% | 96.8% | 83.3% | 0.939
RMI2 | 68 | 2 | 60 | 1 | 5 | 91.2% | 28.6% | 98.4% | 66.7% | 92.3% | 40.0% | 0.816
ROMA | 68 | 2 | 61 | 0 | 5 | 92.6% | 28.6% | 100.0% | 100.0% | 92.4% | 44.4% | 0.643
MOCRA achieved an accuracy of 97.1%, sensitivity 100.0%, specificity 96.7%, F1-score 87.5%, and AUC 0.984.
O-RADS alone showed strong performance but missed high-risk cases that MOCRA detected [ 13 , 14 ].
IOTA Simple Rules demonstrated high specificity but poor sensitivity in this dataset [ 11 , 12 ].
Biomarker-driven indices (RMI2, ROMA) maintained high specificity but underperformed in sensitivity [ 15 , 16 , 18 , 20 – 22 ].
Symptom-driven pathways (NICE NG12, HSE) showed limited sensitivity [ 9 , 10 ].
These results illustrate that the conservative high-risk override in MOCRA, combined with integration of multi-modal inputs, allows the system to avoid false negatives in this small cohort while keeping the number of false positives relatively low (2/68).
To support interpretation, we generated comparative plots:
Bar chart of algorithmic performance metrics – displays accuracy, sensitivity, specificity, and AUC, highlighting MOCRA’s superior balance (Fig. 7).
Fig. 7 Bar chart of algorithmic performance metrics
ROC curves – overlay of ROC curves across algorithms, with MOCRA demonstrating the best overall discrimination (Fig. 8).
Fig. 8 ROC curves for MOCRA and comparator algorithms
To complement diagnostic accuracy testing, we conducted structured functional testing and usability evaluation of the CDSS with 15 gynecologic oncologists from multiple medical universities. Two complementary approaches were applied:
Functional evaluation – to verify whether the implemented algorithms and system modules performed as expected.
Usability evaluation – to assess user satisfaction and interface quality using the Post-Study System Usability Questionnaire (PSSUQ) [ 25 ].
A. Functional evaluation
Functional accuracy was tested by systematically verifying algorithmic outputs against known test cases and physician-confirmed reference diagnoses (Table 3). Across all test scenarios, system outputs were correct, consistent, and stable. No major discrepancies were observed between the expected results and the CDSS outputs.
Table 3 Functional evaluation outcomes (n = 15 experts)
Item | Mean ± SD | Min | Max
Correctness of algorithm outputs | 4.8 ± 0.4 | 4 | 5
Consistency across cases | 4.7 ± 0.5 | 4 | 5
Stability under repeated use | 4.9 ± 0.3 | 4 | 5
Overall functional reliability | 4.8 ± 0.4 | 4 | 5
Scores were given on a 1–5 Likert scale (1 = poor, 5 = excellent). Overall, functional reliability was rated very highly (mean score 4.8), confirming that the CDSS produced deterministic and reproducible results across patient cases.
B. Usability evaluation.
Usability was assessed using the PSSUQ [ 25 ], a validated questionnaire widely applied to measure user perceptions of effectiveness, efficiency, and satisfaction in health IT systems (Table 4). Fifteen gynecologic oncologists completed the evaluation after using the CDSS for patient case review.
Table 4 PSSUQ usability evaluation (n = 15 experts)
Domain | Mean ± SD | Min | Max
System usefulness | 4.6 ± 0.5 | 4 | 5
Information quality | 4.5 ± 0.6 | 3 | 5
Interface quality | 4.7 ± 0.4 | 4 | 5
Overall satisfaction (PSSUQ) | 4.6 ± 0.5 | 4 | 5
All items were scored on a 1–5 Likert scale. The CDSS received uniformly positive ratings, with high marks for usefulness and interface quality, indicating that the platform was intuitive and aligned with clinical workflows.
Qualitative feedback emphasized the value of side-by-side algorithm outputs, transparency of the MOCRA consensus decision, and ease of navigation. Minor suggestions included improving visualization of ROC metrics and expanding export options for integration into electronic medical records.
Together, the functional testing confirmed that the CDSS operated correctly and consistently, while the usability evaluation demonstrated high clinician satisfaction and perceived utility in real-world diagnostic scenarios. These findings support the system’s potential for clinical adoption and highlight the importance of combining technical accuracy with human-centered design in digital health tools.
Collectively, the results of this study demonstrate that the proposed MOCRA clinical decision support system (CDSS) substantially outperformed existing single-algorithm tools in this pilot cohort. In the diagnostic accuracy evaluation of 68 real patient cases, MOCRA achieved 100% sensitivity, 96.7% specificity, and an overall accuracy of 97.1%, with an AUC of 0.984.
These outcomes are consistent with the system’s deliberately conservative rule set, which prioritizes minimizing false negatives in ovarian cancer triage.
By comparison, symptom-based pathways such as NICE NG12 and HSE were highly specific (95.1%) but had limited sensitivity (57.1%), while biomarker-driven indices (RMI2, ROMA) were prone to histology-related variability and also showed limited sensitivity (28.6% each). Among the morphology-driven systems, O-RADS v2022 performed strongly (sensitivity 71.4%, AUC 0.939), whereas IOTA Simple Rules was highly specific but had low sensitivity in this dataset (14.3%); both had limitations in cases with incomplete imaging data or conflicting features. MOCRA successfully harmonized these heterogeneous tools into a unified framework, capturing high-risk signals from multiple modalities without overburdening clinicians with discordant outputs.
Beyond diagnostic performance, the human factors evaluation provided critical evidence of the system’s practicality for real-world use. Functional testing by 15 gynecologic oncologists across multiple universities confirmed that the algorithms operated deterministically, with no discrepancies between expected and observed results (mean functional reliability score 4.8/5). Usability evaluation using the PSSUQ yielded high ratings across domains of system usefulness, information quality, and interface design (overall satisfaction 4.6/5). Clinicians emphasized the value of the side-by-side display of individual algorithm outputs and the integrated MOCRA consensus classification, which supported rapid interpretation and reduced uncertainty in decision-making.
Together, these complementary strands of evidence confirm that MOCRA is not only diagnostically accurate, but also clinically usable and functionally reliable. Given the small, single-center sample size and low number of malignant cases, these results should be interpreted as preliminary and hypothesis-generating; larger multicenter studies are needed to confirm generalizability and to explore whether threshold adjustments could further optimize the balance between sensitivity and specificity. Nonetheless, MOCRA’s integrated, rule-based architecture and positive usability profile position it as a practical, deployable solution for supporting early ovarian cancer detection and more consistent adnexal mass triage.
Conclusion
This study demonstrates that MOCRA, a deterministic, rule-based clinical decision support system, provides a clinically robust and interpretable solution for early ovarian cancer detection. By integrating six widely validated algorithms—NICE NG12, HSE, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA—into a unified four-level risk stratification framework, MOCRA harmonizes their complementary strengths and mitigates their individual weaknesses. In our evaluation of 68 real-world cases (from 69 enrolled), MOCRA achieved 100% sensitivity and 96.7% specificity, with an AUC of 0.984, thereby minimizing the risk of missed diagnoses while maintaining a low false-positive burden. Beyond accuracy, the system was functionally reliable and well-received by gynecologic oncologists, who rated its usability highly on standardized measures (PSSUQ). Participants emphasized that the four-tier risk output and side-by-side presentation of algorithmic results improved clarity, reduced uncertainty, and aligned with real-world clinical reasoning. This dual validation—technical and human factors—underscores MOCRA’s readiness for clinical deployment.
However, the exceptionally high diagnostic performance observed—particularly the absence of false negatives—should be interpreted with caution given the small number of malignant cases. These results represent promising proof-of-concept findings rather than definitive evidence of superiority.
While these results are encouraging, the study was limited by its modest, single-center sample size. Larger multicenter validations and prospective implementation studies are needed to confirm generalizability, explore histology-specific performance, and evaluate real-world outcomes such as referral patterns and diagnostic timeliness. Future studies should also incorporate longer follow-up to better evaluate the system’s performance in detecting both prevalent and incident malignancy, especially in conservatively managed patients. Integration with electronic health record systems and broader testing across diverse healthcare settings will further establish scalability.
In summary, MOCRA represents a novel and practical advance in ovarian cancer risk stratification. By unifying multiple evidence-based algorithms into a transparent, clinician-friendly interface, it has the potential to improve diagnostic consistency, facilitate earlier oncology referral, and ultimately contribute to better patient outcomes in ovarian cancer care.
Nevertheless, broader validation is essential before recommending its routine adoption, and MOCRA should currently be viewed as a promising, interpretable decision-support tool warranting further prospective study.
Discussion
In this single-center diagnostic accuracy and usability study, we encoded six validated ovarian-cancer triage tools—NICE NG12, HSE pathways, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA—into a deterministic, rule-based clinical decision support system (CDSS) and then integrated their signals through a transparent meta-logic (MOCRA). Using 68 consecutively entered real-world patient records (from 69 enrolled), MOCRA achieved the best overall balance of performance: accuracy 97.1%, sensitivity 100.0%, specificity 96.7%, F1-score 87.5%, and AUC 0.984. Compared with its components, MOCRA reduced false negatives (FN = 0 in our sample) while preserving high specificity. O-RADS alone performed strongly (AUC 0.939) but missed some high-risk cases that MOCRA escalated through concordant evidence from symptoms, morphology, and biomarkers [ 13 , 14 ]. Biomarker-centric indices (RMI2, ROMA) and symptom-driven pathways (NICE, HSE) underperformed on sensitivity in isolation [ 9 , 10 , 15 , 16 , 18 , 20 – 23 ]. Beyond accuracy, 15 gynecologic oncologists completed structured functional tasks and a standardized usability survey; both exercises indicated high perceived usefulness, easy learnability, and consistent task completion, with qualitative feedback emphasizing the value of side-by-side algorithm outputs and the four-level, clinically familiar risk display [ 28 ].
Given the small number of malignant cases, these performance estimates—particularly the 100% sensitivity and absence of false negatives—should be interpreted as encouraging but preliminary signals rather than definitive proof of superiority.
Our results align with and extend several strands of prior evidence:
Symptom-based pathways. NICE NG12 and HSE pathways were developed to reduce diagnostic delays by flagging persistent, non-specific symptoms (e.g., bloating, early satiety, urinary urgency) for expedited testing and referral [ 7 – 10 ]. Consistent with previous studies, we observed good specificity but modest sensitivity when symptoms were used in isolation, highlighting the low signal-to-noise ratio of early ovarian cancer in both primary and secondary care. MOCRA addresses this limitation by not treating symptoms as the sole trigger; instead, risk is elevated only when symptoms are supported by ultrasound morphology or biomarkers.
Ultrasound morphology (IOTA/O-RADS). IOTA Simple Rules and O-RADS have improved standardization and inter-observer agreement in adnexal mass assessment [ 11 – 14 , 19 ]. Reported performance in expert hands is high, but sensitivity can vary with lesion type and operator experience. Our O-RADS results (perfect specificity, strong AUC) mirror this strength, yet MOCRA rescued a small subset of true positives that O-RADS alone classified short of “high risk.” This is consistent with studies noting that some early or atypical malignant phenotypes can be equivocal on morphology alone, and that adjunctive clinical or biomarker data may be helpful [ 12 – 14 , 17 , 19 ].
Biomarker indices (RMI2/ROMA). The literature documents that CA-125 and HE4-based approaches improve preoperative triage but are susceptible to histologic variation (borderline, mucinous) and non-malignant elevations (endometriosis, inflammation) [ 15 – 18 , 20 – 23 ]. Our findings replicate this pattern: ROMA and RMI2 maintained excellent specificity but showed limited sensitivity as stand-alone gatekeepers. Within MOCRA, biomarker elevations act as amplifiers (not arbiters), shifting equivocal cases upward only when aligned with symptoms or morphology, an integration strategy that reduces both false reassurance and over-triage.
Deterministic integration vs. single-model tools. Whereas many clinical tools are single-algorithm or black-box models, we implemented a transparent, rule-driven integration that clinicians can audit at each step. The literature cautions that inconsistent application of disparate tools can yield management variability; structured aggregation can improve reproducibility and trust at the point of care [ 6 , 11 – 14 , 17 – 23 ]. Our usability data support this: oncologists valued side-by-side outputs and the explicit four-tier consensus risk, which maps directly to action (reassure, repeat imaging, specialist imaging, or oncology referral).
Multi-tool and hybrid approaches. Recent studies have also explored combinations of tools, such as integrating IOTA, ADNEX, ROMA and subjective assessment [ 19 ], comparing ROMA, RMI and expert ultrasound in IOTA-inconclusive masses [ 20 ], or combining IOTA and ADNEX scoring [ 21 ], as well as comparing RMI1–4, HE4 and ROMA [ 22 ]. These works support the principle that multi-modal assessment can improve discrimination, but they primarily report statistical performance of model combinations and do not implement a unified, operational CDSS. In contrast, MOCRA encodes multiple algorithms, guideline triggers and imaging descriptors into a single, deterministic decision engine that can be used directly in clinical workflows.
Symptom-based pathways. NICE NG12 and HSE pathways were developed to reduce diagnostic delays by flagging persistent, non-specific symptoms (e.g., bloating, early satiety, urinary urgency) for expedited testing and referral [ 7 – 10 ]. Consistent with previous studies, we observed good specificity but modest sensitivity when symptoms were used in isolation—highlighting the low signal-to-noise ratio of early ovarian cancer in both primary and secondary care. MOCRA addresses this limitation by not treating symptoms as the sole trigger; instead, risk is elevated only when symptoms are supported by ultrasound morphology or biomarkers.
Ultrasound morphology (IOTA/O-RADS). IOTA Simple Rules and O-RADS have improved standardization and inter-observer agreement in adnexal mass assessment [ 11 – 14 , 19 ]. Reported performance in expert hands is high, but sensitivity can vary with lesion type and operator experience. Our O-RADS results (perfect specificity, strong AUC) mirror this strength, yet MOCRA rescued a small subset of true positives that O-RADS alone classified short of “high risk.” This is consistent with studies noting that some early or atypical malignant phenotypes can be equivocal on morphology alone, and that adjunctive clinical or biomarker data may be helpful [ 12 – 14 , 17 , 19 ].
Biomarker indices (RMI2/ROMA). The literature documents that CA-125- and HE4-based approaches improve preoperative triage but are susceptible to histologic variation (borderline, mucinous) and non-malignant elevations (endometriosis, inflammation) [ 15 – 18 , 20 – 23 ]. Our findings replicate this pattern: ROMA and RMI2 maintained excellent specificity but showed limited sensitivity as stand-alone gatekeepers. Within MOCRA, biomarker elevations act as amplifiers (not arbiters), shifting equivocal cases upward only when aligned with symptoms or morphology. This integration strategy reduces both false reassurance and over-triage.
Deterministic integration vs. single-model tools. Whereas many clinical tools are single-algorithm or black-box models, we implemented a transparent, rule-driven integration that clinicians can audit at each step. The literature cautions that inconsistent application of disparate tools can yield management variability; structured aggregation can improve reproducibility and trust at the point of care [ 6 , 11 – 14 , 17 – 23 ]. Our usability data support this: oncologists valued side-by-side outputs and the explicit four-tier consensus risk, which maps directly to action (reassure, repeat imaging, specialist imaging, or oncology referral).
Multi-tool and hybrid approaches. Recent studies have also explored combinations of tools, such as integrating IOTA, ADNEX, ROMA and subjective assessment [ 19 ], comparing ROMA, RMI and expert ultrasound in IOTA-inconclusive masses [ 20 ], combining IOTA and ADNEX scoring [ 21 ], and comparing RMI1–4, HE4 and ROMA [ 22 ]. These works support the principle that multi-modal assessment can improve discrimination, but they primarily report statistical performance of model combinations and do not implement a unified, operational CDSS. In contrast, MOCRA encodes multiple algorithms, guideline triggers and imaging descriptors into a single, deterministic decision engine that can be used directly in clinical workflows.
Taken together, the study corroborates prior strengths of O-RADS and IOTA for morphology; confirms the specificity–sensitivity trade-offs of RMI2/ROMA; and demonstrates that a deterministic integrator like MOCRA can outperform any single input by cross-validating signals and minimizing single-source failure modes.
However, the extent of this advantage still needs to be tested in larger, independent cohorts, particularly given the very small number of cancers in our dataset.
This work offers four distinctive contributions:
Faithful, multi-standard implementation. We encoded six independently validated pathways as they are used in practice and preserved their native outputs before integration. Rather than proposing a new, untested index, MOCRA harmonizes what clinicians already trust into a common four-level risk language.
Deterministic, auditable meta-logic. The MOCRA rules are fully interpretable—each escalation or de-escalation is attributable to specific evidence (e.g., ascites + irregular solid components + raised CA-125). This transparency aligns with the clinical need to explain triage decisions to patients and multidisciplinary teams, and it eases governance and quality assurance.
Clinically aligned risk display. Our CDSS outputs low, intermediate, high, indeterminate—the exact semantic categories clinicians use to choose between reassurance, short-interval follow-up, advanced imaging, or surgical/oncology referral. This reduces “translation friction” that can occur when algorithms speak different risk dialects.
Dual evaluation (accuracy + usability/functional). Beyond diagnostic metrics, we evaluated task performance and perceived usability with practicing gynecologic oncologists. High task completion and favorable satisfaction scores suggest the system is not only accurate but also deployable in clinical workflows—a gap often left unaddressed in purely algorithmic reports.
In terms of novelty, MOCRA differs from prior multi-tool studies [ 19 – 22 ] by functioning as an implementation-ready CDSS rather than as a comparative statistical exercise. To our knowledge, no previous work has integrated symptom-based pathways (NICE NG12, HSE), ultrasound systems (IOTA, O-RADS v2022), and biomarker indices (RMI2, ROMA) into a single, rule-based engine that outputs a unified, clinically actionable risk category.
In settings where ovarian cancer often presents late, missing even a single high-risk patient carries outsized consequences for survival. The absence of false negatives with MOCRA in our sample, achieved without sacrificing specificity, illustrates how integrated, multi-signal triage can support earlier referral to gynecologic oncology, align imaging intensity with risk, and reduce unwarranted escalation. The four-level schema facilitates shared decision-making, enabling clinicians to communicate risk in plain terms and to justify next steps anchored in converging evidence. Because MOCRA is deterministic and data-minimal (symptoms, basic ultrasound descriptors, CA-125/HE4), it is also feasible for resource-variable environments.
At the same time, the conservative rule that any single high-risk signal overrides to a high-risk MOCRA classification inevitably increases false-positive referrals to some degree; the present results suggest that this trade-off was modest in our cohort, but its acceptability will need to be judged in larger populations and different health-system contexts.
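The override behaviour discussed here can be sketched in a few lines. This is a minimal illustration in Python, not the authors' PHP implementation: the names `Risk` and `mocra_consensus` are hypothetical, and the split between intermediate and indeterminate (here, indeterminate whenever any tool returns no usable result) is an assumption, since the paper states only that discordant results fall into one of those two tiers.

```python
from enum import Enum
from typing import Dict, Optional


class Risk(Enum):
    LOW = "low"
    INTERMEDIATE = "intermediate"
    HIGH = "high"
    INDETERMINATE = "indeterminate"


def mocra_consensus(outputs: Dict[str, Optional[Risk]]) -> Risk:
    """Combine per-algorithm risk labels into one consensus tier.

    `outputs` maps an algorithm name to its label, or to None when the
    algorithm could not be evaluated (hypothetical convention).
    """
    labels = list(outputs.values())
    # Rule stated in the paper: any single high-risk signal overrides.
    if Risk.HIGH in labels:
        return Risk.HIGH
    # Assumption: a missing or inconclusive input blocks reassurance.
    if not labels or any(
        lab is None or lab is Risk.INDETERMINATE for lab in labels
    ):
        return Risk.INDETERMINATE
    # Rule stated in the paper: unanimous low risk stays low.
    if all(lab is Risk.LOW for lab in labels):
        return Risk.LOW
    # Remaining case: complete but discordant low/intermediate outputs.
    return Risk.INTERMEDIATE


example = {
    "NICE NG12": Risk.LOW,
    "HSE": Risk.LOW,
    "IOTA": Risk.INTERMEDIATE,
    "O-RADS": Risk.INTERMEDIATE,
    "RMI2": Risk.LOW,
    "ROMA": Risk.HIGH,
}
print(mocra_consensus(example).value)  # high
```

Under this sketch, every false positive of any single input at the high-risk level is also a false positive of the consensus, so the combined specificity cannot exceed that of the least specific input. This is the trade-off described above.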
The usability/functional evaluation with 15 gynecologic oncologists showed high task success across representative workflows (patient entry, algorithm review, risk confirmation, and recommendation export) and favorable PSSUQ ratings indicative of strong perceived usefulness, information quality, and interface quality [ 28 ]. Free-text comments converged on three themes: (i) the side-by-side algorithm panel builds confidence by revealing agreement/disagreement patterns; (ii) the sticky results pane and compact forms reduce navigation burden; and (iii) the four-tier risk label provides immediate actionability. Suggested refinements—clearer inline tooltips for O-RADS lexicon items, optional hover definitions for IOTA “irregular solid component,” and a printable one-page summary—are practical and have been incorporated into our backlog.
These findings suggest that, if externally validated, MOCRA could be integrated into routine workflows with relatively low training burden.
This study has several limitations. First, our dataset was retrospectively analyzed, derived from a single center, and modest in size ( n = 68), which limits the statistical reliability of the diagnostic performance estimates, particularly for sensitivity. The findings should therefore be interpreted with caution, as they may differ in larger and more diverse cohorts. Larger, multicenter prospective studies are necessary to confirm the generalizability of these results across different populations and healthcare settings.
The small number of malignant cases also means that metrics such as 100% sensitivity and absence of false negatives are numerically fragile: a single additional missed cancer in a larger cohort would substantially change these estimates.
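To make this fragility concrete, the exact (Clopper-Pearson) lower confidence bound for sensitivity can be computed for a few cohort sizes. The sketch below is a worked illustration in Python. The function names are hypothetical, and the values of n are illustrative rather than the study's case counts.

```python
from math import comb


def tail_ge(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1)
    )


def cp_lower(x: int, n: int, alpha: float = 0.05) -> float:
    """Exact two-sided Clopper-Pearson lower bound for x successes in n."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; tail_ge is increasing in p
        mid = (lo + hi) / 2
        if tail_ge(x, n, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Columns: n, lower bound with n of n detected, lower bound with one miss.
for n in (5, 8, 10, 20):  # illustrative numbers of malignant cases
    print(n, round(cp_lower(n, n), 3), round(cp_lower(n - 1, n), 3))
```

When all n cancers are detected, the bound has the closed form (alpha/2)^(1/n). With eight malignant cases, for example, an observed sensitivity of 100% remains compatible with a true sensitivity of roughly 63%.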
Second, ultrasound descriptors were extracted from routine clinical documentation, which could introduce operator variability. While the structured use of established lexicons, such as O-RADS and IOTA, aimed to minimize this, some inconsistencies in reporting may still exist. Third, the four-level risk categorization employed in MOCRA complicates direct comparison with binary classification models used in some studies. While this approach better mirrors clinical practice, it presents challenges when attempting to compare across different diagnostic frameworks.
Fourth, our reference standard combined histopathology for surgical cases with at least six months of clinical and imaging follow-up for conservatively managed patients. This strategy is appropriate for confirming prevalent malignancy, but it cannot exclude the possibility that some patients classified as benign will develop ovarian cancer later; therefore, our estimates do not address long-term incidence.
Finally, although the deterministic logic of MOCRA enhances interpretability, future work should explore the potential benefits of integrating machine learning techniques under strict model-governance guidelines to preserve the current level of transparency and clinical auditability.
Any such approaches will also need careful calibration to avoid eroding the deliberately conservative bias against false negatives that characterizes the current rule-based design.
To enhance the robustness and generalizability of MOCRA, future research should focus on multicenter, prospective studies involving larger, more diverse cohorts. These studies will allow for a more comprehensive evaluation of MOCRA’s diagnostic performance across various ovarian cancer subtypes, patient demographics, and healthcare settings. In addition, expanding MOCRA’s applicability to primary care and general gynecology settings is essential to assess its effectiveness in environments where sonographic expertise may vary.
We also plan to incorporate structured reporting prompts to standardize ultrasound inputs further, thereby reducing variability in data entry. Additionally, we aim to integrate calibrated confidence indicators alongside each algorithm’s output to provide clinicians with more nuanced risk assessments. Another goal is to develop a patient-facing summary to facilitate shared decision-making, ensuring that patients are well-informed and involved in their care decisions. Post-implementation evaluations will be extended to include time-to-referral and treatment interval endpoints, which are critical in assessing the real-world impact of MOCRA. Lastly, any exploration of statistical augmentation through machine learning will adhere to a governance framework to maintain interpretability and ensure clinical accountability.
Future comparative work could also evaluate alternative integration strategies, including weighted or probabilistic combinations of tools, to determine whether a less conservative logic might preserve high sensitivity while further reducing false positives.
MOCRA—a deterministic integration of NICE NG12, HSE, IOTA, O-RADS v2022, RMI2, and ROMA—outperformed each individual algorithm on our 68-patient dataset (from 69 enrolled), delivering 97.1% accuracy, 100% sensitivity, 96.7% specificity, and AUC 0.984, with no false negatives. O-RADS alone was strong but missed cases rescued by MOCRA’s cross-validation of symptoms, morphology, and biomarkers [ 13 – 16 , 18 – 23 ]. RMI2/ROMA (biomarker-led) and NICE/HSE (symptom-led) underperformed on sensitivity when used alone [ 9 , 10 , 15 , 16 , 20 – 23 ], mirroring known limitations in the literature. Beyond accuracy, 15 gynecologic oncologists rated the system highly for usefulness and learnability, and completed core tasks reliably, citing the side-by-side algorithm display and four-level risk label as the most actionable features [ 28 ]. By harmonizing established tools into a single, auditable framework, MOCRA offers a practical path to more consistent, earlier triage for suspected ovarian cancer—without sacrificing the transparency clinicians require for accountable care.
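As a reading aid, the reported percentages can be checked for internal consistency by enumerating every 2×2 table on 68 patients that reproduces them after rounding. The sketch below is a back-calculation in Python under that assumption only. The function name is hypothetical, and the counts it returns are inferences, not figures taken from the Results.

```python
from typing import List, Tuple


def consistent_tables(
    n_total: int, acc: float, sens: float, spec: float, tol: float = 0.05
) -> List[Tuple[int, int, int, int]]:
    """Return every (TP, FN, FP, TN) whose percentages match the reported
    values to within rounding (tol percentage points)."""
    matches = []
    for n_pos in range(1, n_total):  # at least one case in each class
        n_neg = n_total - n_pos
        for tp in range(n_pos + 1):
            for tn in range(n_neg + 1):
                fn, fp = n_pos - tp, n_neg - tn
                if (
                    abs(100 * (tp + tn) / n_total - acc) < tol
                    and abs(100 * tp / n_pos - sens) < tol
                    and abs(100 * tn / n_neg - spec) < tol
                ):
                    matches.append((tp, fn, fp, tn))
    return matches


print(consistent_tables(68, acc=97.1, sens=100.0, spec=96.7))
# [(7, 0, 2, 59), (8, 0, 2, 58)]
```

Both candidate tables contain two false positives and only seven or eight malignant cases, which is why the sensitivity estimate carries wide uncertainty. The counts in the paper's Results remain authoritative.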
Nevertheless, these findings should be regarded as proof-of-concept results from a small, single-center pilot rather than definitive evidence. External validation, prospective impact studies, and longer follow-up are required before MOCRA can be recommended for widespread implementation.
Introduction
Ovarian cancer remains one of the deadliest gynecologic malignancies, with over 300,000 new cases and more than 200,000 deaths annually [ 1 ]. Despite advancements in surgery and systemic therapy, most patients are diagnosed at advanced FIGO stages (III–IV), where prognosis is poor [ 2 , 3 ]. Early detection significantly improves survival, with five-year survival rates exceeding 90% for stage I disease [ 4 ]. However, population-level screening remains ineffective, and early symptoms such as bloating, abdominal discomfort, and early satiety are often mistaken for benign conditions [ 5 – 8 ].
Unlike cervical cancer—where cytology and HPV testing have substantially reduced incidence and mortality—ovarian cancer lacks a validated screening method [ 6 ]. Its subtle, nonspecific symptoms frequently overlap with benign gastrointestinal or urological disorders, leading to diagnostic delays that reduce survival [ 7 , 8 ]. National guidelines such as the UK’s NICE NG12 and Ireland’s HSE pathways aim to address this gap by emphasizing persistent, high-risk symptoms and promoting expedited referral [ 9 , 10 ]. While these pathways improve symptom-based triage, their sensitivity remains limited and they do not offer detailed risk stratification once an adnexal mass is identified.
Several structured algorithms have been developed to assist clinicians in triaging adnexal masses:
IOTA Simple Rules, introduced by the International Ovarian Tumor Analysis group, provide reproducible ultrasound descriptors that improve diagnostic consistency [ 11 ]. Their performance is high in expert hands but declines when sonographic expertise is limited [ 12 ].
O-RADS (Ovarian-Adnexal Reporting and Data System), updated between 2020 and 2022, offers a structured lexicon and a five-tier risk stratification framework [ 13 ]. It has demonstrated strong specificity but variable sensitivity depending on lesion type and clinical setting [ 14 ].
RMI2 (Risk of Malignancy Index 2) combines CA-125, menopausal status, and ultrasound features [ 15 ], while ROMA (Risk of Ovarian Malignancy Algorithm) integrates CA-125, HE4, and menopausal status [ 16 ]. Both are widely used, but their sensitivity is inconsistent, particularly for mucinous or borderline tumors, and benign gynecologic or inflammatory conditions can elevate biomarkers, leading to false positives [ 17 , 18 ].
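For orientation, the two biomarker indices can be written out in their commonly published forms. The sketch below is illustrative Python, with hypothetical function and parameter names. The coefficients are the widely cited ones and are treated here as assumptions: they should be checked against the primary sources [ 15 , 16 ] and the assay in use, and they are not taken from the MOCRA implementation.

```python
from math import exp, log


def rmi2(ca125_u_ml: float, postmenopausal: bool, n_us_features: int) -> float:
    """RMI2 = U x M x CA-125 (commonly published form; verify before use).

    n_us_features counts the ultrasound findings present among:
    multilocular cyst, solid areas, bilateral lesions, ascites, metastases.
    """
    u = 4 if n_us_features >= 2 else 1  # zero or one feature scores 1
    m = 4 if postmenopausal else 1
    return u * m * ca125_u_ml


def roma_percent(
    ca125_u_ml: float, he4_pmol_l: float, postmenopausal: bool
) -> float:
    """ROMA predicted probability (%) from the commonly cited predictive index."""
    if postmenopausal:
        pi = -8.09 + 1.04 * log(he4_pmol_l) + 0.732 * log(ca125_u_ml)
    else:
        pi = -12.0 + 2.38 * log(he4_pmol_l) + 0.0626 * log(ca125_u_ml)
    return 100 * exp(pi) / (1 + exp(pi))


print(rmi2(ca125_u_ml=50, postmenopausal=True, n_us_features=2))  # 800
```

Decision thresholds are omitted deliberately: RMI cut-offs such as 200 and the ROMA cut-offs for pre- and postmenopausal women vary by guideline and assay.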
Collectively, these tools represent important progress. Yet, when applied in isolation, they yield fragmented outputs, create variability in interpretation, and do not provide a unified framework for clinicians making time-sensitive decisions.
Recent research has also evaluated combinations or comparisons of these algorithms. Studies integrating IOTA Simple Rules, ADNEX modeling, subjective assessment, CA-125/HE4, and ROMA have demonstrated complementary diagnostic strengths [ 19 ]. Comparative evaluations of ROMA, RMI, and expert ultrasound in IOTA-inconclusive lesions further highlight the value of multi-tool assessment [ 20 ]. Additional investigations combining IOTA with ADNEX scoring [ 21 ] and comparisons involving RMI1–4, HE4, and ROMA [ 22 ] underscore growing interest in multimodal approaches. However, these investigations are analytic rather than integrative, and none translates multiple validated tools into a single clinical decision support framework.
To address such fragmentation, clinical decision support systems (CDSS) have emerged as a way to synthesize heterogeneous clinical data into structured, interpretable outputs [ 23 ]. CDSS tools can reduce variability, support earlier recognition, and promote adherence to guideline-based management. However, most published ovarian cancer diagnostic algorithms remain stand-alone instruments and have not been embedded into a comprehensive CDSS framework.
Our study addresses this gap by designing a CDSS grounded in object-oriented programming (OOP) principles. Using Unified Modeling Language (UML) diagrams, we modeled user interactions, data entry workflows, and class architecture to ensure modularity, scalability, and transparency [ 24 ]. This structured software engineering approach allows the system to be interpretable (rules are explicit), extendable (new algorithms can be added), and clinician-friendly (outputs are presented in clinically meaningful risk categories).
Recognizing these limitations, we developed MOCRA (Multivariate Ovarian Cancer Risk Assessment), a deterministic, rule-based clinical decision support system (CDSS). MOCRA harmonizes six validated algorithms—NICE NG12, HSE, IOTA Simple Rules, O-RADS v2022, RMI2, and ROMA—into a single four-level risk stratification (low, intermediate, high, indeterminate). Instead of replacing existing tools, it integrates their outputs through explicit decision rules designed to maximize sensitivity while maintaining specificity.
This operational integration—synthesizing guideline-based symptom triggers, imaging morphology, and biomarker algorithms into one actionable output—distinguishes MOCRA from previous multi-tool comparison studies and represents its principal conceptual advancement.
For CDSS tools to influence real-world practice, accuracy alone is insufficient. Systems must also be usable, acceptable, and efficient for clinicians facing high patient volumes. However, few ovarian cancer diagnostic studies have incorporated formal usability testing alongside accuracy evaluation. To address this gap, we engaged gynecologic oncologists in structured usability testing using the Post-Study System Usability Questionnaire (PSSUQ) [ 25 ], ensuring that the tool was not only accurate but also practical in a clinical workflow.
This study was conducted in accordance with the STARD 2015 guidelines for diagnostic accuracy studies [ 26 ] to ensure rigorous and transparent reporting. Our objectives were to:
Develop a rule-based CDSS that encodes and integrates six validated ovarian cancer risk algorithms.
Evaluate its diagnostic performance against individual algorithms using real patient data.
Assess its functional reliability and usability among gynecologic oncologists.
By combining diagnostic accuracy assessment with structured usability evaluation, this study aims to demonstrate that MOCRA provides a clinically interpretable, practical, and trustworthy solution for earlier and more consistent ovarian cancer triage.