A Systematic Review of Artificial Intelligence and Machine Learning Methods and Deployment Challenges for Public Health Predictions Using Electronic Health Records in Low- and Middle-Income Countries

doi:10.21203/rs.3.rs-9227225/v1

2026 · doi:10.21203/rs.3.rs-9227225/v1

preprint OA: closed

Research Square · Systematic Review

Joe Phiri, Aaron Zimba, Chiyaba Njovu, Mwansa Lumpa

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9227225/v1
This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1.

Abstract

The growing availability of electronic health records (EHRs) has accelerated the use of artificial intelligence (AI) and machine learning (ML) in public health. Yet, how well these methods work in resource-limited settings, particularly low- and middle-income countries (LMICs), remains poorly understood.
This systematic review synthesizes evidence from 64 peer-reviewed studies (2018–2025) on ML-based predictive analytics using EHRs, with LMICs as the primary focus and high-income country studies as a methodological reference. The studies were identified through PRISMA-guided searches of five major databases. Of these, 12 (18.8%) were conducted exclusively in LMIC settings, 44 (68.8%) in high-income countries, and 8 (12.5%) drew on mixed or multi-setting data. Retrospective designs predominated (81.3%). Disease progression (40.6%), mortality (34.4%), and treatment response (25.0%) were the most common prediction targets. Deep learning architectures were the most frequently applied category overall (39.1%, n = 25), driven by high-income country studies with access to large curated datasets; among LMIC-focused studies, traditional ML and ensemble methods were each applied in 33.3% of studies. Evaluation practices were dominated by discrimination metrics, particularly AUROC; external validation was reported in only 5 studies (7.8%) and calibration in only 4 (6.2%). Explainability assessment was reported in 1 of 12 LMIC studies (8.3%) compared with 16 of 44 high-income studies (36.4%), with governance and ethical considerations inconsistently documented in LMIC settings. This review highlights key methodological and contextual gaps and offers actionable guidance for developing interpretable, reliable, and context-appropriate AI tools for public health decision-making in resource-constrained settings.

Keywords: Artificial Intelligence and Machine Learning; Health Economics & Outcomes Research; Machine Learning; Electronic Health Records; Predictive Analytics; Public Health; Epidemiology; Low- and Middle-Income Countries

1. Introduction

The rapid digitization of healthcare has led to the widespread generation of electronic health records (EHRs), which are digital records of patient information including diagnoses, laboratory results, medications, and clinical outcomes. These data, collected routinely during patient care, have opened new opportunities for using AI and ML to improve public health decision-making, disease surveillance, and health system management [1–3]. Modern ML methods, especially ensemble learning and deep learning, can identify complex patterns in large health datasets, enabling predictions for outcomes such as early disease detection, mortality risk, hospital readmissions, and outbreak forecasting [4–7]. These capabilities are increasingly viewed as essential to data-driven public health and precision medicine [8].

Despite promising advances, translating ML into routine public health practice remains uneven. Real-world performance depends heavily on data quality, digital infrastructure, analytical capacity, and governance: factors that vary dramatically between settings and are particularly challenging in LMICs [9, 10]. Any meaningful synthesis of this literature must therefore account for implementation realities and resource constraints, not just algorithmic metrics [11].

1.1 Machine Learning and EHRs in Public Health

ML-driven analysis of EHR data can enhance disease surveillance, track population-level trends, and support timely policy responses [3, 7]. Predictive models built from routine clinical data have been used to identify high-risk populations, power early warning systems for infectious diseases, and evaluate health interventions at scale [6, 12]. In high-income settings, ML-enabled EHR analytics have demonstrated advantages over traditional statistical methods in specific epidemiological applications, particularly where large volumes of complex, time-series data are available [1, 4, 6].
However, research consistently shows that performance gains over simpler models are often modest, especially when data quality is poor or sample sizes are small [17]. For public health agencies and policymakers, accuracy alone is not enough. Models must also be interpretable, well-calibrated (meaning predicted probabilities match observed outcomes), and practically feasible [18, 19]. These requirements are especially critical in LMICs, where poorly understood or miscalibrated models can erode trust and deepen existing health inequities [20].

1.2 Health Systems and EHR Infrastructure in LMICs

Health systems in LMICs typically span community health posts, primary health centres, district hospitals, and national referral facilities, but frequently operate under severe resource constraints, with high patient-to-provider ratios, donor-dependent programme funding, and fragmented governance [9, 21, 22].

EHR infrastructure in LMICs is heterogeneous. Many primary health care facilities, particularly in sub-Saharan Africa, still use paper or hybrid records [9, 64]. Where electronic systems exist, common platforms include OpenMRS (widely used in East and Southern Africa), DHIS2 (for aggregate national reporting), and SmartCare (Zambia's national HIV EHR platform) [9, 64]. These are designed primarily for programme reporting rather than research-grade longitudinal data, resulting in variable data quality.

Key data quality challenges in LMIC EHRs include: systematic missingness (patients seeking care at multiple facilities with no record linkage); inconsistent coding of diagnoses and laboratory results; limited longitudinal depth due to patient mobility; and incomplete digitisation of historical records [35, 36]. Infrastructure barriers, including unreliable electricity, limited internet, and insufficient computational resources, further restrict the deployment of data-intensive models such as deep learning [21, 23].
By contrast, high-income countries benefit from large, integrated EHR systems (such as Epic and Cerner) with longitudinal, multi-institutional, well-structured datasets [1, 4, 15]. Methods developed in these contexts do not transfer directly to LMIC settings [64, 66].

1.3 The Need for Context-Appropriate ML Approaches

Recent research has explored ML optimisation strategies better suited to resource-constrained settings: lightweight model architectures with reduced computational demands; preprocessing methods designed to handle systematic missingness; and hybrid approaches that balance performance with interpretability [17, 26]. Explainable AI (XAI) tools, such as feature attribution methods, are gaining attention as ways to build clinician and policymaker trust in model outputs [28]. Federated learning, which enables model training across institutions without sharing raw patient data, offers potential for collaborative development while preserving data privacy and national sovereignty [26, 29]. Synthetic data generation and transfer learning have also been proposed to address data scarcity in low-resource settings [30].

Despite these advances, the literature remains fragmented. Many studies emphasise algorithmic performance without adequately addressing deployment feasibility, evaluation rigour, or public health relevance. Technology-focused reviews (Shickel et al. [4]; Rajkomar et al. [1]; Christodoulou et al. [17]) offer limited guidance on adapting methods to resource-limited health systems. LMIC-focused studies (Mutai et al. [13]; Musukwa et al. [14]) demonstrate feasibility but consistently lack external validation, calibration, and explainability evaluation. Policy frameworks from WHO [11] and Vayena et al. [25] provide normative guidance but do not operationalise this within applied ML workflows.

1.4 Scope and Objectives

Zambia and comparable LMICs represent a compelling context for examining AI-driven EHR analytics.
While investments in digital health infrastructure are expanding, challenges related to data quality, interoperability, and analytic capacity persist [9, 22]. This systematic review addresses these gaps by synthesising peer-reviewed evidence on ML-based predictive analytics using EHR data published between 2018 and 2025, with LMIC settings as the primary focus and high-income country studies included as a methodological benchmark. The primary objective is to critically examine how ML techniques are applied, adapted, and evaluated. Three research questions guide the review:

RQ1 Which ML algorithms are most frequently applied in LMIC settings, and how do they compare in predictive accuracy, interpretability, and computational feasibility?
RQ2 Which preprocessing and feature engineering strategies best address data quality challenges in resource-constrained EHR environments?
RQ3 How are ML-based predictions evaluated and validated in low-resource settings, particularly regarding calibration and explainability for public health decision-making?

The review makes three key contributions: (1) a context-aware synthesis grounded in public health and LMIC health system realities; (2) a focus on methodological rigour beyond predictive accuracy, foregrounding calibration, external validation, and explainability; and (3) identification of practical research gaps, including limited external validation, insufficient governance and fairness attention, and underrepresentation of non-communicable disease applications in LMICs.

2. Algorithmic Foundations

This section provides a concise overview of the main ML method categories used in EHR-based public health prediction. A deeper technical understanding is not required to follow the paper's findings; this section is provided for context. ML methods learn patterns from data to make predictions.
In EHR-based health research, the goal is to build a model that maps patient information (demographics, diagnoses, lab results, visit history) to a health outcome (e.g., mortality risk, disease progression). EHR datasets present specific challenges: high dimensionality (many variables per patient), missing data, imbalanced outcomes (e.g., rare events like death), irregular time intervals between visits, and data distributions that change over time [32, 33, 34].

Four model categories dominate this literature. Traditional ML methods, including logistic regression and decision trees, offer transparent outputs, low computational requirements, and reliable calibration even with sparse data, making them well-suited to LMIC environments [37, 38]. Ensemble methods, such as random forests and gradient boosting (including XGBoost), handle noisy or incomplete data robustly and generate feature importance scores that help clinicians understand what drives predictions [41, 43]. Deep learning architectures, including feedforward networks, recurrent neural networks (such as LSTMs), and transformers, can capture complex temporal patterns in longitudinal EHR data but require large, well-curated datasets and substantial computational infrastructure, generally limiting their use to high-resource settings [44–46]. Hybrid and emerging approaches, including federated learning and AutoML, offer potential advantages in privacy preservation and automation [26, 29].

Two issues are especially important for public health settings. First, class imbalance: outcomes such as mortality or treatment failure are uncommon, so models can appear accurate simply by predicting the majority class. Weighted loss functions, which penalise errors on the minority class more heavily, address this [47].
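The class-imbalance point can be made concrete with a small sketch. It is illustrative only and is not drawn from any of the reviewed studies: the cohort (95 survivors, 5 deaths), the inverse-frequency weighting rule, and the function names `class_weights` and `weighted_log_loss` are all assumptions chosen for the example. It shows a model that assigns near-zero risk to everyone reaching high accuracy, while a class-weighted log loss exposes the failure on the rare outcome.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights: n / (n_classes * count_of_class)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalised by the total weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        norm += w
    return total / norm


# Hypothetical cohort: 95 survivors (label 0) and 5 deaths (label 1).
labels = [0] * 95 + [1] * 5
# A "model" that assigns near-zero risk to every patient.
always_low_risk = [0.01] * len(labels)

# Accuracy at a 0.5 threshold looks good because the majority class dominates.
accuracy = sum((p >= 0.5) == bool(y) for y, p in zip(labels, always_low_risk)) / len(labels)

weights = class_weights(labels)
unweighted = weighted_log_loss(labels, always_low_risk, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, always_low_risk, weights)

print(f"accuracy={accuracy:.2f} unweighted_loss={unweighted:.3f} weighted_loss={weighted:.3f}")
```

In this sketch each missed death counts far more than each correctly classified survivor, so the weighted loss is much larger than the unweighted one even though accuracy is unchanged. Other weighting schemes are possible; inverse frequency is simply a common default.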
Second, calibration: a model is well-calibrated when its predicted probabilities match real-world event rates, which is a critical property for triage and resource allocation decisions where the probability estimate itself (not just the ranking of patients) determines action. The Brier score and calibration plots are standard tools for assessing this [48, 61]. Models with fewer parameters, stable optimisation, and reliable calibration are more likely to perform robustly under the constraints typical of LMIC health systems [37, 49].

3. Methodology

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure transparency, reproducibility, and methodological rigour [50].

3.1 Review Design

A systematic literature review design was adopted to identify, screen, and synthesise empirical studies applying AI/ML to EHR or routinely collected health data for public health-oriented predictive analytics. Given substantial heterogeneity in prediction objectives, data sources, and modelling approaches, a thematic narrative synthesis was used rather than quantitative meta-analysis [51]. The review is guided by the three research questions (RQ1–RQ3) mapped in Table 1.

Table 1 Mapping of Research Questions to Review Processes

RQ | Review Process Components
RQ1 | Extraction of ML model types, algorithm categorization, comparison of reported performance metrics, assessment of interpretability and computational feasibility
RQ2 | Extraction of preprocessing methods, feature engineering strategies, handling of missingness, heterogeneity, and temporal data
RQ3 | Extraction of evaluation metrics, validation strategies, calibration reporting, explainability methods, and governance considerations

3.2 Information Sources and Search Strategy

A comprehensive literature search was conducted across five major academic databases: IEEE Xplore, Scopus, Web of Science, ACM Digital Library, and PubMed.
The search combined controlled vocabulary and free-text terms related to AI, ML, EHRs, predictive analytics, public health, epidemiology, and resource-limited settings. The search string was:

("machine learning" OR "artificial intelligence" OR "deep learning") AND ("electronic health record" OR "electronic medical record" OR "EHR") AND ("prediction" OR "risk prediction" OR "predictive analytics") AND ("low-resource" OR "LMIC" OR "developing country") AND ("public health" OR "epidemiology" OR "population health")

The search was restricted to studies published between January 2018 and December 2025, capturing the rapid expansion of deep learning in epidemiology, the emergence of explainable AI frameworks, and innovations following the COVID-19 pandemic [52].

3.3 Eligibility Criteria

Study selection followed a three-stage PRISMA-consistent screening process [50]: title and abstract screening to remove irrelevant or non-AI studies; full-text eligibility assessment using predefined criteria; and final inclusion based on relevance to at least one research question. Criteria are summarised in Table 2. High-income country studies were included as a methodological reference baseline to enable comparative evaluation of methods, evaluation practices, and deployment feasibility.
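For reproducibility, the Boolean search string given in Section 3.2 can be read as a conjunction of five concept groups, each a disjunction of quoted phrases. The sketch below assembles it from those groups. The terms are taken from the search string itself; the names `CONCEPT_GROUPS` and `build_query` are illustrative, and database-specific syntax (field tags, controlled-vocabulary terms, date limits) is not covered and would need to be added per database.

```python
# Concept groups copied from the search string reported in Section 3.2.
CONCEPT_GROUPS = [
    ["machine learning", "artificial intelligence", "deep learning"],
    ["electronic health record", "electronic medical record", "EHR"],
    ["prediction", "risk prediction", "predictive analytics"],
    ["low-resource", "LMIC", "developing country"],
    ["public health", "epidemiology", "population health"],
]


def build_query(groups):
    """Join terms within a group with OR, then join the groups with AND."""
    clauses = ["(" + " OR ".join(f'"{term}"' for term in terms) + ")" for terms in groups]
    return " AND ".join(clauses)


query = build_query(CONCEPT_GROUPS)
print(query)
```

Keeping the groups as data rather than a single literal string makes it easier to document later changes to the strategy, for example adding a synonym to one concept group.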
Table 2 Inclusion and Exclusion Criteria

Category | Inclusion Criteria | Exclusion Criteria
Publication year | 2018–2025 | Pre-2018
Language | English | Non-English
Study type | Peer-reviewed articles, systematic reviews, validated conference papers | Editorials, opinion pieces, blogs
Focus | AI/ML applied to public health or epidemiology | Purely clinical AI without population focus
Methods | Empirical AI models, epidemiological simulations, decision-support systems | Theoretical AI papers without health application
Setting | LMIC settings (primary focus); high-income settings included as comparative baseline | Studies with no geographic or health system context reported

3.4 Study Selection and Data Extraction

All retrieved records were imported into a reference management system and duplicates removed. The selection process is shown in Fig. 1. Studies were excluded primarily due to irrelevance to predictive analytics, absence of EHR or routinely collected health data, or insufficient methodological detail. A structured data extraction spreadsheet captured: study setting and population; data sources and sample size; prediction task and outcome; ML algorithms; preprocessing and feature engineering methods; evaluation metrics and validation strategy; explainability, ethical, and governance considerations; and reported limitations [55].

3.5 Quality and Risk-of-Bias Assessment

A formal quality and risk-of-bias assessment was conducted using an adapted PROBAST framework extended to account for ML-specific considerations [51, 52]. The adapted assessment covered four core domains:

Data and participants: representativeness, completeness, and handling of missingness;
Predictor and outcome definition: clarity, public health relevance, and temporal alignment;
Model development and validation: overfitting control, internal and external validation;
Analysis and reporting: calibration, explainability, and transparency.
Additional ML-specific criteria included class imbalance handling, hyperparameter tuning, and reproducibility practices [53]. Each study was categorised as having low, moderate, or high risk of bias.

3.6 Synthesis Approach

A thematic narrative synthesis was adopted. Studies were grouped and compared by ML model category; public health application domain; data quality mitigation strategies; model evaluation and calibration practices; and deployment feasibility and governance considerations [51, 59].

4. Review of Previous Works

4.1 Overview of Research Focus Areas

Across the reviewed literature, AI applications in public health cluster around four areas: clinical and population risk prediction; disease surveillance and epidemiological forecasting; health system performance and resource optimisation; and ethical, explainability, and governance frameworks. The relative maturity of these domains varies considerably: risk prediction has attracted the most empirical work, while health system optimisation and governance remain comparatively nascent. Understanding which domains are well-served and which are neglected is essential for identifying where a synthesis focused on LMIC contexts adds the most value [60–66].

4.2 Key Studies

Table 3 synthesises 15 key studies most relevant to this review's objectives, explicitly mapping what each study addresses and what it does not. This approach enables direct identification of methodological, contextual, and ethical gaps across the prior literature.

Table 3 Key Studies on AI/ML for EHR-Based Predictive Analytics in Public Health

Authors | Area | Year | Pub. Type | ML Prediction | EHR Data | LMIC | XAI/Ethics | Gap Identified
Rajkomar et al. | ML in clinical prediction | 2019 | Journal | ✓ Deep learning | ✓ Large-scale EHR | ✗ High-income | ✗ Limited | Limited applicability to low-resource settings
Beam & Kohane | Big data & ML in healthcare | 2018 | Journal | ✓ Conceptual ML | ✓ EHR-centric | ✗ No LMIC | ✗ Ethics not explored | No empirical LMIC validation
Shickel et al. | Deep learning for EHRs | 2021 | Journal | ✓ Comprehensive review | ✓ EHR time-series | ✗ Limited LMIC | ✗ Marginal XAI | Focuses on algorithms, not deployment
Christodoulou et al. | ML vs regression | 2019 | Journal | ✓ Comparative prediction | ✓ Clinical datasets | ✗ No LMIC | ✓ Transparency | No public health decision impact
Hasson et al. | EHR infrastructure in Africa | 2020 | Journal | ✗ No ML modelling | ✓ Health IS | ✓ LMIC-focused | ✗ Limited ethics | No predictive analytics perspective
Mutai et al. | HIV outcome prediction | 2020 | Journal | ✓ ML risk prediction | ✓ Routine clinical | ✓ LMIC (Kenya) | ✗ XAI absent | Limited validation & calibration
Musukwa et al. | ART outcomes | 2021 | Journal | ✓ ML prediction | ✓ National EHR | ✓ LMIC (Zambia) | ✗ Ethics not discussed | No external validation
Goldstein et al. | Risk modeling with EHRs | 2018 | Journal | ✓ Predictive modelling | ✓ EHR data | ✗ No LMIC | ✓ Calibration | Limited operational guidance
Collins et al. | Model validation | 2016 | Journal | ✓ Prediction evaluation | ✓ Health datasets | ✗ No LMIC | ✓ Transparency | Not ML/EHR-specific
Van Calster et al. | Calibration | 2019 | Journal | ✓ Risk prediction theory | ✗ Not EHR-specific | ✗ No LMIC | ✓ Decision reliability | Lacks applied case studies
Lundberg & Lee | Explainable AI (SHAP) | 2018 | Conference | ✓ Model explainability | ✗ General ML | ✗ No LMIC | ✓ Strong XAI focus | No health system evaluation
Vayena et al. | Ethics of ML in health | 2018 | Journal | ✗ No prediction | ✗ Conceptual | ✓ Global relevance | ✓ Ethics-centered | No technical implementation
WHO | AI governance for health | 2021 | Policy | ✗ No modelling | ✗ Conceptual | ✓ LMIC-relevant | ✓ Governance | No empirical ML assessment
Rasmy et al. | Transformer models (Med-BERT) | 2023 | Journal | ✓ Advanced deep learning | ✓ Large EHRs | ✗ High-resource | ✗ Limited XAI | High computational demands
Rieke et al. | Federated learning | 2020 | Journal | ✓ Privacy-preserving ML | ✓ Multi-institutional | ✓ Potential LMIC | ✓ Ethical data sharing | Limited LMIC deployment evidence

4.3 Clinical and Population Risk Prediction

The most established body of prior work concerns risk prediction from EHR data, covering mortality, disease progression, and treatment outcomes. Large-scale studies from high-income settings have demonstrated the feasibility of applying deep learning to longitudinal EHRs [67–69], while comparative analyses consistently find that performance gains over simpler models are often modest [17, 71–73]. Christodoulou et al. found no consistent advantage of ML over logistic regression across multiple prediction tasks [17], a finding particularly relevant where data sparsity constrains complex architectures [64, 74]. Studies specifically targeting LMIC settings, such as Mutai et al. [13] in Kenya and Musukwa et al. [14] in Zambia, demonstrate feasibility of ML prediction from routine clinical data but highlight persistent gaps in external validation, calibration assessment, and explainability evaluation that limit confidence in their generalisability.

4.4 Disease Surveillance and Forecasting

A second research stream applies ML to disease surveillance and epidemiological forecasting, often combining EHRs with laboratory reports or syndromic surveillance data [80–82]. Following COVID-19, ML forecasting models have been proposed for infectious disease dynamics and outbreak detection [83–85]. While these show strong short-term predictive performance in reported studies, concerns persist in the prior literature about model stability, interpretability, and transferability across settings [81, 86].
The gap between modelling performance and operational deployment is a recurring theme, with few prior studies linking forecasting outputs to explicit decision thresholds or public health protocols.

4.5 Health System Performance and Resource Optimization

A smaller but growing body of work applies AI to health system performance monitoring, including patient flow prediction, service utilisation forecasting, and workforce planning [89–91]. These applications are directly relevant to public health operations but remain largely exploratory and predominantly confined to high-income settings [90, 92]. LMIC-focused evidence is sparse, and few studies evaluate how predictive outputs actually influence resource allocation decisions [65, 93].

4.6 Data Quality, Preprocessing, and Feature Engineering

Data quality challenges are a pervasive theme in the prior literature, particularly for LMIC contexts. Common issues documented include systematic missingness, inconsistent diagnosis coding, fragmented patient records across facilities, and limited longitudinal depth [64, 94]. Prior reviews and empirical studies note that these challenges are qualitatively different from those in high-income settings, where large integrated EHR systems provide relatively complete and standardised data [1, 4]. The prior literature identifies a recurring methodological weakness: preprocessing strategies (including imputation method choice, temporal aggregation windows, and feature selection criteria) are rarely evaluated through sensitivity analyses, making it difficult to attribute reported model performance to architecture rather than data preparation decisions [95–97].

4.7 Evaluation Practices and Validation Strategies

Evaluation practices in the prior AI/ML health literature have been widely critiqued for over-reliance on discrimination metrics, particularly AUROC, at the expense of calibration and external validation [17, 61, 63].
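The difference between discrimination and calibration can be illustrated with a small worked sketch. It is not taken from the reviewed studies: the eight-patient example, the monotone distortion applied to the risks, and the function names `auroc` and `brier_score` are assumptions made for illustration. Two sets of predictions rank patients identically, so their AUROC is the same, yet one systematically overestimates risk and has a worse Brier score.

```python
def auroc(labels, probs):
    """AUROC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical outcomes (1 = event) and predicted risks for eight patients.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs_a = [0.05, 0.10, 0.15, 0.20, 0.40, 0.45, 0.70, 0.90]
# Same ranking, but every risk is inflated by a monotone transform.
probs_b = [p ** 0.25 for p in probs_a]

observed_rate = sum(labels) / len(labels)
for name, probs in (("A", probs_a), ("B", probs_b)):
    mean_risk = sum(probs) / len(probs)
    print(
        f"model {name}: AUROC={auroc(labels, probs):.3f} "
        f"Brier={brier_score(labels, probs):.3f} "
        f"mean predicted risk={mean_risk:.3f} vs observed rate={observed_rate:.3f}"
    )
```

A study reporting only AUROC could not distinguish these two models, which is why the critique above matters for decisions that act on the probability itself. A full calibration assessment would also include a calibration plot or calibration slope and intercept, which this sketch omits.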
Methodological work by Van Calster et al. [61] and Collins et al. [63] establishes that calibration (the agreement between predicted probabilities and observed outcomes) is essential for triage and resource allocation applications yet is systematically underreported. Prior comparative reviews further highlight that external validation on independent datasets is rare, meaning that reported model performance frequently cannot be assumed to generalise to new settings or time periods [63, 66]. These gaps in evaluation practice are especially consequential in LMIC contexts, where the absence of rigorous validation compounds existing uncertainties about data quality and representativeness.

4.8 Explainability, Ethics, and Governance

The prior literature increasingly recognises that predictive performance alone is insufficient for responsible public health deployment of ML models. Explainability frameworks, including SHAP [71] and LIME [85], have been proposed as mechanisms to enhance clinician trust and support accountability, though prior reviews find that their application in health settings is largely post-hoc and rarely evaluated for practical utility within decision workflows [72, 94]. Ethical and governance frameworks, including WHO guidance [11] and the normative work of Vayena et al. [25], identify algorithmic bias, data sovereignty, and transparency as priorities, particularly for LMIC contexts where regulatory oversight is limited. A consistent finding across this literature is that governance considerations are discussed at a conceptual level but rarely operationalised within applied modelling studies [66, 94].

4.9 Synthesis of Key Gaps

Across the four research domains reviewed above, four persistent gaps in the prior literature motivate this systematic review.
First, evaluation practices emphasise discrimination at the expense of calibration and external validation, limiting the operational credibility of published models [17, 61, 63]. Second, LMIC settings are underrepresented relative to their disease burden, and non-communicable disease applications are especially scarce despite the growing NCD burden in these settings [64, 66]. Third, explainability and governance are treated as conceptual addenda rather than integral evaluation criteria, creating a gap between normative guidance and applied practice [25, 94]. Fourth, preprocessing decisions are rarely subject to systematic sensitivity analysis, making it difficult to distinguish the contribution of algorithm choice from that of data preparation [95–97]. These gaps collectively define the analytical focus of Sections 5 and 6.

5. Results

5.1 Distribution of Included Studies

All 64 included studies were published between 2018 and 2025. Publication volume increased substantially from 2020 onwards, reflecting intensified research activity coinciding with expanded digital health adoption, greater availability of routine electronic health data, and heightened interest in AI-driven public health analytics during and after the COVID-19 pandemic. Figure 2 shows the annual distribution of included studies.

5.2 Study Characteristics

General characteristics of the included studies are summarised in Table 4. Most studies focused on predictive modelling, with mortality prediction, disease progression, and treatment outcomes being the most frequently addressed. By application domain, the included studies addressed HIV/TB and infectious diseases (45.3%), non-communicable diseases (28.1%), maternal and child health (14.1%), and health system utilisation (12.5%); as Table 4 shows, these percentages are computed over all 64 studies, not the LMIC subset alone. These application domains broadly align with the leading causes of premature death in sub-Saharan Africa, supporting the disease relevance of this literature [9, 13, 14].
However, non-communicable diseases, including cardiovascular disease, diabetes, and hypertension, remain underrepresented despite their rapidly growing burden in LMICs, a gap that future research must address.

Table 4 Summary characteristics of the included studies

Characteristic | Category | N Studies | %
Study setting | Low- and middle-income countries (LMICs) | 12 | 18.8
 | High-income countries (comparative baseline) | 44 | 68.8
 | Mixed / multi-setting studies (Both) | 8 | 12.5
Study design | Retrospective observational | 52 | 81.3
 | Prospective / near real-time | 12 | 18.7
Primary data source | Electronic health records (EHR) | 41 | 64.1
 | Health information systems (HIS) | 23 | 35.9
Application domain | HIV / TB / infectious diseases | 29 | 45.3
 | Non-communicable diseases | 18 | 28.1
 | Maternal and child health | 9 | 14.1
 | Health system utilization / outcomes | 8 | 12.5
Prediction target | Mortality | 22 | 34.4
 | Disease progression / outcomes | 26 | 40.6
 | Treatment response / adherence | 16 | 25.0

Note: Health Information Systems (HIS) are standardised institutional systems for the ongoing collection, analysis, and use of health data generated through routine service delivery, such as aggregate reporting platforms (e.g. DHIS2, DATIM).

5.3 Data Sources and Settings

PubMed and Scopus contributed the largest proportion of records (Fig. 3), reflecting strong coverage of biomedical and interdisciplinary AI-health research. IEEE Xplore and the ACM Digital Library provided substantial representation of computer science and engineering-focused studies, while Web of Science contributed broad cross-disciplinary coverage. The combination of these databases ensured balanced representation of both methodological AI research and applied public health studies. LMIC-based studies predominantly relied on national or facility-level EHR systems and routinely collected clinical data, typically characterised by limited longitudinal depth and variable completeness.
Data completeness in LMIC studies frequently fell below 80% for key clinical variables, contrasting with the more complete and standardised datasets common in high-income settings.

5.4 AI/ML Methods Applied

The distribution of ML methods across the 64 included studies is presented in Table 5. Deep learning architectures were the most commonly applied category overall (n = 25, 39.1%), driven predominantly by high-income country studies with access to large-scale curated EHR datasets; this category includes recurrent neural networks, LSTM models, and convolutional neural networks. Traditional ML methods, principally logistic regression and generalised linear models, were applied in 28.1% of studies (n = 18), valued for their interpretability, low computational requirements, and reliable calibration under data constraints. Ensemble methods, including random forests and gradient boosting variants (XGBoost, LightGBM), accounted for 20.3% (n = 13). Transformer-based models such as Med-BERT were represented in 6.2% of studies (n = 4), all conducted in high-income settings. Hybrid and emerging approaches, including federated learning and AutoML, appeared in 6.3% (n = 4). The pattern diverges markedly by setting: among LMIC-focused studies, traditional ML and ensemble methods were jointly the most common approaches (33.3% each), reflecting the practical alignment between model choice and the data quality, interpretability, and infrastructure constraints of resource-limited health systems.

Table 5 Comparison of ML model categories reported across included studies.
| Model Category | Representative Algorithms | Primary Application Areas | N Studies | % |
| --- | --- | --- | --- | --- |
| Traditional ML | Logistic Regression, Decision Trees, k-NN, Naïve Bayes | Risk prediction, classification, baseline comparisons | 18 | 28.1 |
| Ensemble Learning | Random Forest, Gradient Boosting, XGBoost, AdaBoost | Disease prediction, feature importance, performance optimization | 13 | 20.3 |
| Deep Learning | ANN, CNN, RNN, LSTM | Time-series prediction, longitudinal EHR analysis | 25 | 39.1 |
| Transformer-based | BERT, Med-BERT, GPT-based models | Sequential EHR modelling, representation learning | 4 | 6.2 |
| Hybrid / Emerging | RF + XGBoost, CNN + LSTM, AutoML, Federated Learning | Complex data fusion, privacy-preserving ML, model automation | 4 | 6.3 |

5.5 Data Preprocessing and Feature Engineering Practices

Data preprocessing practices varied across studies but followed common patterns. Most studies reported basic imputation techniques, typically mean or median imputation, to address missing data [13, 14, 76]. Multiple imputation was reported in a smaller subset, predominantly in high-income settings. Temporal aggregation of clinical events was frequently used to convert longitudinal records into fixed-length feature vectors [67, 68, 75]. Feature engineering was often based on clinically defined variables rather than automated methods [13, 14, 76]. Automated representation learning appeared in deep learning studies [16, 69] but was rarely seen in LMIC research due to data sparsity and computational constraints [74, 79]. Few studies reported systematic evaluation of alternative preprocessing strategies or sensitivity analyses for missing data handling [95–97]. A summary of preprocessing practices is provided in Table 6.

Table 6 Data preprocessing, explainability, and contextual focus across key studies

| Authors | Preprocessing | Feature Engineering | Explainability | External Validation | LMIC | Identified Limitation |
| --- | --- | --- | --- | --- | --- | --- |
| Rajkomar et al. | Yes (imputation, normalization) | Automated + clinical | No | No | No | Limited applicability to low-resource settings |
| Beam & Kohane | Conceptual discussion | Not applicable | No | No | No | No empirical validation |
| Shickel et al. | Yes (temporal aggregation) | Deep feature learning | Limited | No | No | Focus on algorithms over deployment |
| Christodoulou et al. | Yes | Predefined clinical | Yes (coefficients) | No | No | No public health context |
| Mutai et al. [13] | Yes (missing data handling) | Manual clinical | No | No | Yes | Limited calibration assessment |
| Musukwa et al. [14] | Yes | Routine EHR variables | No | No | Yes | No external validation |
| Goldstein et al. | Yes | Clinical risk factors | Yes | No | No | Limited operational guidance |
| Collins et al. | Not applicable | Not applicable | Yes (methodological) | Yes | No | Not ML-specific |
| Van Calster et al. | Not applicable | Not applicable | Yes (calibration) | Yes | No | Lacks applied case studies |
| Lundberg & Lee | Not applicable | Not applicable | Yes (SHAP) | No | No | No health system evaluation |
| Vayena et al. | Not applicable | Not applicable | Conceptual ethics | No | Yes | No technical implementation |
| WHO | Not applicable | Not applicable | Governance | No | Yes | No empirical ML assessment |
| Rasmy et al. | Yes (tokenization, embeddings) | Automated representation | Limited | No | No | High computational requirements |
| Rieke et al. | Yes | Federated feature spaces | Limited | Yes | Potential | Limited LMIC deployment |
| Hasson et al. | Yes (data quality focus) | Not predictive | No | No | Yes | No ML modelling |

5.6 Model Evaluation and Validation Practices

Evaluation metrics reported across the 64 included studies are presented in Fig. 4. Discrimination-based metrics predominated, particularly AUROC, with sensitivity and specificity also commonly reported. Calibration metrics, which are essential for public health decisions where predicted probabilities determine triage thresholds, were reported in only 4 studies (6.2%) [61, 63].
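The distinction between discrimination and calibration can be illustrated with a small simulation. The sketch below is a hedged illustration in plain Python, not an analysis from any included study: the simulated risks, the square-root distortion, and the helper names (`auroc`, `brier`, `calibration_in_the_large`) are assumptions chosen for clarity. It applies a strictly increasing transform to well-calibrated risks, which leaves the ranking of patients (and hence AUROC) unchanged while inflating every predicted probability.

```python
import random
from typing import Sequence


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate; 0 is ideal."""
    return sum(p) / len(p) - sum(y) / len(y)


# Simulated cohort (assumption): outcomes are drawn from the true risks.
rng = random.Random(0)
true_risk = [rng.uniform(0.02, 0.6) for _ in range(2000)]
y = [1 if rng.random() < r else 0 for r in true_risk]

well_calibrated = true_risk
# Strictly increasing distortion: same patient ranking, inflated probabilities.
overconfident = [r ** 0.5 for r in true_risk]

for name, p in [("well calibrated", well_calibrated), ("overconfident", overconfident)]:
    print(name, round(auroc(y, p), 3), round(brier(y, p), 3),
          round(calibration_in_the_large(y, p), 3))
```

Because AUROC depends only on the ordering of predictions, the two sets of predictions are indistinguishable on that metric, while the Brier score and calibration-in-the-large expose the inflated probabilities. A triage threshold applied to the inflated predictions would flag more people than the underlying risk warrants.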
External validation on an independent dataset was rare across all settings: only 5 of 64 studies (7.8%) reported external validation [19, 26, 30, 42, 70], comprising 3 high-income studies (6.8% of 44), 1 LMIC study (8.3% of 12), and 1 mixed-setting study [61, 63, 76]. The near-universal reliance on internal validation through cross-validation or train-test splits [17, 63], regardless of setting, substantially limits confidence in the generalisability of reported model performance across contexts [63, 66, 88].

5.7 Explainability, Ethics, and Governance

Explainability was reported in 17 of 64 studies (26.6%), most commonly through SHAP-based feature importance rankings [71] or post-hoc attribution techniques [72]. Reporting was heavily concentrated in high-income settings: 16 of 44 HIC studies (36.4%) included an explainability assessment, compared with only 1 of 12 LMIC studies (8.3%) [94]. This disparity is substantively important: trust-building mechanisms are least present in settings where institutional accountability for algorithmic decisions is most critical and where poorly understood model outputs carry the greatest risk of eroding clinician and community trust. Ethical and governance considerations, including data privacy, algorithmic bias, and fairness [25, 77, 78], were discussed in several studies but rarely operationalised within model development or evaluation workflows [66, 94]. Governance-focused frameworks from WHO [11] and Vayena et al. [25] provide high-level normative guidance, but a persistent gap remains between policy intent and applied ML practice.

6. Discussion

6.1 Interpretation of Key Findings

RQ1: AI/ML Algorithms. Deep learning was the most commonly applied category overall (39.1%, n = 25), but this pattern is driven by high-income country studies with large curated datasets; it does not reflect the LMIC experience.
Among LMIC-focused studies, traditional ML and ensemble methods were jointly the most common approaches (33.3% each), representing pragmatic alignment between model choice and the data quality, interpretability, and infrastructure constraints of resource-limited health systems. This finding is consistent with evidence that deep learning offers modest or no performance advantage over logistic regression under realistic data conditions [17, 71], implying that algorithmic sophistication alone does not guarantee superior public health utility where data are sparse or incomplete.

RQ2: Data Preprocessing and Feature Engineering. The prevalence of basic imputation and manual feature construction in LMIC studies reflects rational adaptation to data sparsity and the need for clinically defensible workflows [76, 79]. The critical gap is not the method chosen but the near-universal absence of sensitivity analyses establishing how preprocessing choices propagate to downstream predictions. Without such analyses, reported model performance cannot be clearly attributed to architecture rather than data preparation decisions [95–97].

RQ3: Evaluation, Calibration, and Explainability. The concentration of evaluation practices around discrimination metrics, with calibration reported in only 6.2% of studies and external validation in only 7.8%, represents a substantive deployment risk. A model with high AUROC but poor calibration can systematically misallocate scarce resources by assigning incorrect probabilities to population subgroups [61, 78]. Explainability reporting was low overall but especially so in LMIC settings: only 1 of 12 LMIC studies (8.3%) included any explainability assessment, compared with 16 of 44 HIC studies (36.4%).
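The preprocessing sensitivity analysis discussed under RQ2 can be sketched in a few lines. This is an illustrative example under stated assumptions rather than a method reported by the included studies: the CD4 values, the fixed logistic risk function and its coefficients, and the 0.5 decision threshold are all hypothetical. It imputes missing values with the mean and with the median, then lists the patients whose risk flag changes.

```python
import math
import statistics
from typing import Callable, List, Optional


def impute(values: List[Optional[float]],
           fill: Callable[[List[float]], float]) -> List[float]:
    """Replace None with a summary (e.g. mean or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = fill(observed)
    return [fill_value if v is None else v for v in values]


def risk(cd4: float) -> float:
    """Hypothetical fixed logistic risk model; coefficients are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(2.0 - 0.01 * cd4)))


def flagged(cd4_values: List[float], threshold: float = 0.5) -> List[bool]:
    """True where predicted risk meets or exceeds the assumed decision threshold."""
    return [risk(v) >= threshold for v in cd4_values]


# Hypothetical right-skewed CD4 counts; None marks a missing measurement.
cd4 = [90, 120, 140, None, 160, 180, None, 210, 650, 900]

by_mean = flagged(impute(cd4, statistics.mean))
by_median = flagged(impute(cd4, statistics.median))

# Indices of patients whose flag depends on the imputation choice.
changed = [i for i, (a, b) in enumerate(zip(by_mean, by_median)) if a != b]
print(changed)
```

With these right-skewed toy values the mean (306.25) sits well above the median (170), so the two imputed patients are flagged under median imputation but not under mean imputation. Reporting such flips alongside headline metrics is one simple way to show how much a result depends on data preparation.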
The asymmetry in explainability reporting means that trust-building and accountability mechanisms are least present precisely where they are most needed: in settings where institutional oversight of algorithmic outputs is weakest and where the consequences of unexplained errors fall disproportionately on underserved populations.

6.2 Comparison with Previous Reviews

This review differs explicitly from prior major reviews. Influential works such as Rajkomar et al. [1], Beam and Kohane [2], and Shickel et al. [4] predominantly emphasise algorithmic development using large-scale EHR datasets, with limited insight into deployment feasibility or governance in resource-constrained systems. Comparative analyses such as Christodoulou et al. [17] and Goldstein et al. [34] offer valuable methodological rigour but remain largely detached from public health use cases and LMIC realities. LMIC-focused empirical studies, including Mutai et al. [13], Musukwa et al. [14], and Hasson et al. [9], demonstrate the feasibility of applying ML to routine health data in constrained settings but consistently lack external validation, calibration assessment, and explainability evaluation. Ethical and governance-focused works (Vayena et al. [25] and WHO [11]) provide essential normative frameworks without operationalising these principles within applied ML pipelines. Advanced methodological contributions such as Med-BERT [16] and federated learning [26] illustrate future potential but remain largely inaccessible to most LMIC health systems.
This review advances the literature in three ways: cross-layer synthesis reveals that weaknesses in preprocessing and evaluation frequently outweigh algorithmic gains; public health-specific framing foregrounds calibration, external validation, and interpretability as non-negotiable requirements for population-level decision-making; and context-aware methodological guidance demonstrates that model selection must be driven by data quality, governance capacity, and decision context, not algorithmic novelty.

6.3 Implications for Practice and Policy

Effective integration of AI into public health practice depends less on model complexity and more on data governance, evaluation rigour, and institutional capacity. Interpretable, well-calibrated models that can be validated across settings are more likely to support equitable, actionable public health decisions than highly complex models optimised solely for discrimination [61, 65, 103]. For policymakers, these findings underscore the importance of embedding AI within broader digital health strategies, including investments in data quality, workforce development, and regulatory oversight [22, 24]. Without such alignment, AI-driven predictive analytics risk remaining experimental tools rather than sustainable components of public health systems [105]. The growing NCD burden in LMICs, currently underrepresented in ML research, represents an urgent area for investment in both data infrastructure and applied research.

6.4 Limitations

Several limitations should be acknowledged. The review was restricted to English-language publications from 2018–2025, which may exclude relevant earlier or non-English studies. Quality assessment relied on reported methodological details, which varied in completeness and transparency. While this review identifies methodological gaps, it does not empirically evaluate the causal impact of specific ML practices on public health outcomes.
The inclusion of high-income country studies, while methodologically valuable as a comparative baseline, means the evidence base is not exclusively LMIC-derived. No quantitative meta-analysis was conducted due to heterogeneity in outcomes, data sources, and evaluation metrics.

6.5 Priorities for Future Research

Based on this synthesis, four priorities emerge for future LMIC-focused research. First, model selection should default to robustness over complexity: linear and ensemble models are appropriate unless data volume and quality clearly justify deep learning. Second, preprocessing decisions must be made explicit and evaluated through sensitivity analyses demonstrating how they affect downstream predictions. Third, calibration and external validation must be treated as core requirements, not optional additions, because probability reliability and cross-context generalisability are prerequisites for responsible public health deployment. Fourth, explainability and governance must move from conceptual acknowledgment to operational integration, with XAI methods evaluated for utility to health system stakeholders. In addition, an expanded research agenda addressing cardiovascular disease, diabetes, and hypertension using routine LMIC EHR data is urgently needed.

7. Conclusion

This systematic review synthesised evidence from 64 studies on ML-based predictive analytics using EHRs in public health and epidemiology (2018–2025), with LMIC settings as the primary focus. The central finding is a structural imbalance: methodological investment in algorithmic development has substantially outpaced investment in evaluation rigour, explainability, and governance, and LMICs, which bear the greatest burden of preventable disease, are precisely the settings where this imbalance is most acute. Ensemble methods and logistic regression are appropriately dominant in LMIC contexts, reflecting practical alignment with data quality and infrastructure constraints.
However, their deployment value is limited by pervasive reliance on internal validation, underdeveloped calibration reporting, and minimal explainability integration. Ethical and governance considerations are increasingly acknowledged but rarely operationalised within applied modelling workflows. Addressing these gaps requires a shift in research priorities: from algorithmic novelty toward contextual validity; from discrimination metrics toward calibration and decision impact; and from individual model outputs toward reproducible, externally validated, governance-ready systems. As LMICs expand digital health infrastructure and face both persistent infectious disease burdens and a growing NCD challenge, the opportunity to develop AI tools that are genuinely reliable, equitable, and actionable is significant, but realising it demands the same rigour in evaluation and deployment as in model development.

Declarations

Ethics approval
This study is a systematic review of published literature and did not involve the collection of primary data or direct interaction with human participants. Ethical approval for the broader research programme was granted by the ZCAS University Ethics Review Committee (Approval No. 2025/11/001). No additional ethical approval was required for this review.

Consent to Participate
Not applicable.

Consent to publish
Not applicable.

Competing interests
All authors declare no competing interests.

Funding
This research received no funding.

Author contributions
Joe Phiri conceived and designed the study, led the literature review, data extraction and synthesis, and drafted the manuscript. Aaron Zimba contributed to conceptual development, methodological refinement, and critical review of the analysis and findings. Chiyaba Njovu provided public health and LMIC contextual interpretation and critically reviewed the manuscript. Mwansa Lumpa contributed to methodological guidance, interpretation of findings, and critical review of the manuscript.
All authors reviewed, edited, and approved the final manuscript and take responsibility for its content.

References

1. Rajkomar A, Dean J, Kohane I (2019) Machine learning in medicine. N Engl J Med 380(14):1347–1358
2. Beam AL, Kohane IS (2018) Big data and machine learning in health care. JAMA 319(13):1317–1318
3. Wiens J, Shenoy ES (2018) Machine learning for healthcare: On the verge of a major shift in healthcare epidemiology. Clin Infect Dis 66(1):149–153
4. Shickel B, Tighe PJ, Bihorac A, Rashidi P (2021) Deep EHR: A survey of recent advances in deep learning techniques for electronic health record analysis. J Biomed Inf 122:103887
5. Xie X, Zhang J, Chen M (2022) Predicting in-hospital mortality using machine learning models: A systematic review. BMJ Health Care Inf 29(1):e100552
6. Viboud C, Vespignani A (2019) The future of influenza forecasts. Proc Natl Acad Sci USA 116(8):2802–2804
7. Salathé M, Bengtsson L, Bodnar TJ et al (2012) Digital epidemiology. PLoS Comput Biol 8(7):e1002616. https://doi.org/10.1371/journal.pcbi.1002616
8. Topol EJ (2019) High-performance medicine: The convergence of human and artificial intelligence. Nat Med 25(1):44–56
9. Hasson R, Smith KS, Johnson L (2020) Electronic health record infrastructure in Africa: Current state and future directions. BMJ Glob Health 5(4):e002734
10. Chibanda D et al (2021) Data quality challenges in African health information systems and implications for analytics. BMJ Glob Health 6:e004896
11. World Health Organization (2021) Ethics and Governance of Artificial Intelligence for Health. WHO, Geneva
12. Lazer D, Pentland AS, Watts DJ et al (2020) Computational social science: Obstacles and opportunities. Science 369(6507):1060–1062
13. Mutai BK, Njoroge SW et al (2020) Predicting HIV treatment outcomes using machine learning in Kenya. BMC Med Inf Decis Mak 20:123
14. Musukwa MT et al (2021) Machine learning for antiretroviral therapy outcome prediction in Zambia. PLoS ONE 16(10):e0259873
15. Rajkomar A, Oren E, Chen K et al (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1:18
16. Rasmy L, Wu Y, Wang N et al (2023) Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records. J Am Med Inf Assoc 30(2):199–210
17. Christodoulou E, Ma J, Collins GS et al (2019) A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 110:12–22
18. Shah NH, Milstein A, Bagley SC (2019) Making machine learning models clinically useful. JAMA 322(14):1351–1352
19. Steyerberg EW, Vickers AJ, Cook NR et al (2019) Assessing the performance of prediction models: A framework for traditional and novel measures. Stat Med 38(14):2503–2515
20. Obermeyer Z, Powers B, Vogeli C, Mullainathan S (2019) Dissecting racial bias in health algorithms. Science 366(6464):447–453
21. Adem S, Yusuf A, Mwangi J (2022) Digital health infrastructure in sub-Saharan Africa: Opportunities and constraints. Health Policy Plan 37(3):372–384
22. World Health Organization Regional Office for Africa (2020) Digital Health Strategy for the WHO African Region 2020–2030. WHO AFRO, Brazzaville
23. Chowdhury MEH, Reza T, Islam S (2021) Mobile health and edge AI for low-resource environments. IEEE Rev Biomed Eng 14:30–54
24. World Health Organization (2023) AI for Health: Capacity Building in Africa. WHO, Geneva
25. Vayena E, Blasimme A, Cohen IG (2018) Machine learning in healthcare: Ethical challenges. PLoS Med 15(11):e1002689
26. Rieke N, Hancox J, Li W et al (2020) The future of digital health with federated learning. NPJ Digit Med 3:119
27. Sendak MP, D'Arcy J, Kashyap S et al (2020) A path for translation of machine learning products into healthcare delivery. EMJ Innov 4(1):73–81
28. Tonekaboni S, Joshi A, McCradden MD, Goldenberg A (2020) What clinicians want from explainable artificial intelligence. NPJ Digit Med 3:74
29. Kairouz P, McMahan HB, Avent B et al (2021) Advances and open problems in federated learning. Found Trends Mach Learn 14(1–2):1–210
30. Goncalves A, Ray P, Soper B, Stevens J, Coyle L, Sales AP (2020) Generation and evaluation of synthetic patient data. J Am Med Inf Assoc 27(6):884–893
31. Esteva A, Robicquet A, Ramsundar B et al (2019) A guide to deep learning in healthcare. Nat Med 25(1):24–29. https://doi.org/10.1038/s41591-018-0316-z
32. Vapnik VN (2013) The Nature of Statistical Learning Theory. Springer, New York
33. Hastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York
34. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JPA (2018) Opportunities and challenges in developing risk prediction models with electronic health records data. J Am Med Inf Assoc 24(1):198–208
35. Weiskopf NG, Weng C (2013) Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research. J Am Med Inf Assoc 20(1):144–151
36. Hersh WR, Weiner MG, Embi PJ et al (2013) Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care 51(8 Suppl 3):S30–S37
37. James G, Witten D, Hastie T, Tibshirani R (2021) An Introduction to Statistical Learning, 2nd edn. Springer
38. Steyerberg EW (2019) Clinical Prediction Models, 2nd edn. Springer, Cham
39. Van Smeden M, Moons KGM, de Groot JAH et al (2019) Sample size for binary logistic prediction models: Beyond events per variable criteria. Stat Methods Med Res 28(8):2455–2474
40. Riley RD, Ensor J, Snell KIE et al (2020) Calculating the sample size required for developing a clinical prediction model. BMJ 368:m441
41. Breiman L (2001) Random forests. Mach Learn 45:5–32
42. Vagliano I, Chesnaye NC, Leopold JH et al (2022) Comparative analysis of explainable machine learning prediction models for hospital mortality. BMC Med Inf Decis Mak 22(1):53
43. Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proc KDD. ACM
44. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
45. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
46. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT
47. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
48. Brier GW (1950) Verification of forecasts expressed in terms of probability. Mon Weather Rev 78:1–3
49. Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432
50. Page MJ, McKenzie JE, Bossuyt PM et al (2021) The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372:n71
51. Moons KGM, Wolff RF, Riley RD et al (2019) PROBAST: A tool to assess risk of bias and applicability of prediction model studies. Ann Intern Med 170(1):51–58
52. Wolff RF, Moons KGM, Riley RD et al (2019) Explanation and elaboration. Ann Intern Med 170(1):W1–W33
53. Collins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). BMJ 350:g7594
54. Collins GS, Moons KGM, Dhiman P et al (2024) TRIPOD + AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385:e078378. https://doi.org/10.1136/bmj-2023-078378
55. Wynants L, Van Calster B, Collins GS et al (2020) Prediction models for diagnosis and prognosis of COVID-19: Systematic review and critical appraisal. BMJ 369:m1328
56. Hernandez-Boussard T, Bozkurt S, Ioannidis JPA, Shah NH (2020) MINIMAR (MINimum Information for Medical AI Reporting). NPJ Digit Med 3:105
57. Sounderajah V, Ashrafian H, Rose S et al (2021) QUADAS-AI: A quality assessment tool for AI-centered diagnostic test accuracy studies. Nat Med 27:1663–1665
58. Liu X, Cruz Rivera S, Moher D et al (2020) CONSORT-AI extension: Reporting guidelines for clinical trial reports involving artificial intelligence. Nat Med 26:1364–1374
59. Rivera SC, Liu X, Chan AW et al (2020) SPIRIT-AI extension: Guidance for clinical trial protocols involving artificial intelligence. Nat Med 26:1351–1363
60. Collins GS, Moons KGM (2019) Reporting of artificial intelligence prediction models. Lancet 393(10181):1577–1579
61. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW (2019) Calibration: The Achilles heel of predictive analytics. BMC Med 17:230
62. Riley RD, Snell KIE, Ensor J et al (2019) Minimum sample size for developing a multivariable prediction model: PART II. Stat Med 38(7):1276–1296
63. Collins GS, Ogundimu EO, Altman DG (2016) External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis. BMJ 353:i3140
64. Fraser HSF, Biondich P, Moodley D, Choi S, Mamlin BW, Szolovits P (2017) Implementing electronic health records in resource-limited settings. Int J Med Inf 97:268–276
65. Wiens J, Saria S, Sendak M et al (2019) Do no harm: A roadmap for responsible machine learning for health care. Nat Med 25:1337–1360
66. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17:195
67. Harutyunyan H, Khachatrian H, Kale DC, Ver Steeg G, Galstyan A (2019) Multitask learning and benchmarking with clinical time series data. Sci Data 6:96
68. Miotto R, Li L, Kidd BA, Dudley JT (2016) Deep Patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 6:26094
69. Salathé M (2018) Digital epidemiology: What is it, and where is it going? Life Sci Soc Policy 14:1
70. Viboud C, Sun K, Gaffey R et al (2018) The RAPIDD Ebola forecasting challenge: Synthesis and lessons learnt. Epidemics 22:13–21
71. Lundberg SM, Lee SI (2018) A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems (NeurIPS)
72. Amann J, Blasimme A, Vayena E, Frey D, Madai VI (2020) Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Med Inf Decis Mak 20:310
73. Blaya JA, Fraser HSF, Holt B (2010) E-health technologies show promise in developing countries. Health Aff (Millwood) 29(2):244–251
74. Sendak MP, Gao M, Nichols M et al (2020) Human-centred implementation of machine learning in clinical systems. NPJ Digit Med 3:1–10
75. Petticrew M, Roberts H (2006) Systematic Reviews in the Social Sciences: A Practical Guide. Blackwell, Oxford
76. Raji ID, Smart A, White RN et al (2020) Closing the AI accountability gap. In: Proc ACM FAccT
77. Challen R, Denny J, Pitt M, Gompels L, Edwards T, Tsaneva-Atanasova K (2019) Artificial intelligence, bias and clinical safety. BMJ Qual Saf 28:231–237
78. Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2021) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6):115
79. Salgado TM et al (2020) Machine learning for health system planning: A scoping review. Health Policy 124:1011–1018
80. Bastani H, Bayati M, Khosravi P (2019) Analytics for healthcare operations management: A review. Manuf Serv Oper Manag 21(3):517–534
81. Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1:206–215
82. Choi E, Bahadori MT, Song L, Stewart WF, Sun J (2017) GRAM: Graph-based attention model for healthcare representation learning. In: Proc 23rd ACM SIGKDD, pp 787–795
83. Si Y, Du J, Li Z et al (2021) Deep representation learning of patient data from EHR: A systematic review. J Biomed Inf 115:103671
84. Xiao C, Choi E, Sun J (2018) Opportunities and challenges in developing deep learning models using electronic health records: A systematic review. J Am Med Inf Assoc 25(10):1419–1428
85. Ribeiro MT, Singh S, Guestrin C (2016) 'Why should I trust you?': Explaining the predictions of any classifier. In: Proc 22nd ACM SIGKDD, pp 1135–1144
86. Aiello M, Cavaliere C, D'Albore A et al (2023) The challenges of explainable AI in biomedical research. Front Neurosci 17:1035246
87. Reich NG, McGowan CJ, Yamana TK et al (2019) Accuracy of real-time multi-model ensemble forecasts for seasonal influenza in the U.S. PLOS Comput Biol 15(11):e1007486
88. Sendak MP, Balu S, Schulman KA (2022) Barriers to achieving scalable implementation of machine learning in health care. NPJ Digit Med 5:98
89. Arueyingho OV, Al-Taie A, McCallum C (2024) Scoping review: Machine learning interventions in the management of healthcare systems. Digit Health 10:20552076221144095
90. Chen M, Hao Y, Hwang K, Wang L, Wang L (2017) Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5:8869–8879
91. Char DS, Shah NH, Magnus D (2018) Implementing machine learning in health care: addressing ethical challenges. N Engl J Med 378(11):981–983
92. Gianfrancesco MA, Tamang S, Yazdany J, Schmajuk G (2018) Potential biases in machine learning algorithms using electronic health record data. JAMA Intern Med 178(11):1544–1547
93. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M (2021) Ethical machine learning in healthcare. Annu Rev Biomed Eng 23:123–150
94. Panch T, Mattie H, Atun R (2019) Artificial intelligence and algorithmic bias: Implications for health systems. J Glob Health 9(2):010318
95. Sendak M, Ratliff W, Sarro D et al (2020) Real-world integration of a sepsis deep learning technology into routine clinical care. JMIR Med Inf 8(7):e15182
96. Xie F, Yuan H, Ning Y et al (2022) Deep learning for temporal data representation in electronic health records: A systematic review. J Biomed Inf 126:103980
97. Gao J, Xiao C, Glass LM, Sun J (2023) Deep learning prediction models based on EHR trajectories: A systematic review. J Biomed Inf 144:104428
98. Nazir S, Dickson DM, Akram MU (2024) A survey of explainable artificial intelligence in healthcare. Healthc Anal 6:100344
99. Nilsson M, Sandin F, Gustafsson J et al (2024) Implementation of machine learning applications in health care organizations: Systematic review of empirical studies. JMIR Med Inform 12:e55897
100. Landi I, Glicksberg BS, Lee HC et al (2020) Deep representation learning of electronic health records to unlock patient stratification at scale. NPJ Digit Med 3:96
101. Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J (2023) Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 30(2):367–381. https://doi.org/10.1093/jamia/ocac216
102. Tomašev N, Harris N, Baur S, Mottram A, Glorot X, Rae JW et al (2021) Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records. Nat Protoc 16:2765–2787
Author affiliations

Joe Phiri (corresponding author; ORCID: https://orcid.org/0009-0002-0336-6852): 1. School of Computing, Technology and Applied Sciences Artificial Intelligence and Predictive Analytics, ZCAS University, P.O Box 35243, Lusaka, Zambia; 2. Centre for Infectious Disease Research in Zambia (CIDRZ), Lusaka, Zambia
Aaron Zimba (corresponding author): School of Computing, Technology and Applied Sciences Artificial Intelligence and Predictive Analytics, ZCAS University, P.O Box 35243, Lusaka, Zambia
Chiyaba Njovu: School of Computing, Technology and Applied Sciences Artificial Intelligence and Predictive Analytics, ZCAS University, P.O Box 35243, Lusaka, Zambia
Mwansa Lumpa (ORCID: https://orcid.org/0009-0000-2306-6318): Centre for Infectious Disease Research in Zambia (CIDRZ)
21:52:02","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9227225/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9227225/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105529386,"identity":"574db929-048f-448d-8899-8c9277fb62be","added_by":"auto","created_at":"2026-03-27 05:33:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":152814,"visible":true,"origin":"","legend":"\u003cp\u003ePRISMA 2020 Study Selection Flow Diagram.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-9227225/v1/785139299a985199e60d7d4a.png"},{"id":105529388,"identity":"02caf3c9-806a-46bd-8e4b-52720eceffc0","added_by":"auto","created_at":"2026-03-27 05:33:14","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":27077,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of included studies published between 2018 and 2025\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-9227225/v1/e97c8987548daee515937f7c.png"},{"id":105529387,"identity":"f1e7a7c6-cf51-413c-a1d2-d7b4a1ff5671","added_by":"auto","created_at":"2026-03-27 05:33:14","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":67591,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of included studies by bibliographic 
database\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-9227225/v1/fad3331a74564fdab1c7a078.png"},{"id":105567906,"identity":"fd22c291-ab25-4f22-a2ab-76d854fe5b14","added_by":"auto","created_at":"2026-03-27 13:05:53","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":61978,"visible":true,"origin":"","legend":"\u003cp\u003eEvaluation and validation metrics reported across included studies.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-9227225/v1/2dec43b49fac859e399f7802.png"},{"id":106414738,"identity":"a0e71221-4f4d-44ff-a9a7-5876598ee94e","added_by":"auto","created_at":"2026-04-08 10:22:56","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1833582,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9227225/v1/6ab86712-c1c2-4048-b9fb-6acce7d2fdb1.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eA Systematic Review of Artificial Intelligence and Machine Learning Methods and Deployment Challenges for Public Health Predictions Using Electronic Health Records in Low- and Middle-Income Countries\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eThe rapid digitization of healthcare has led to the widespread generation of electronic health records (EHRs), which are digital records of patient information including diagnoses, laboratory results, medications, and clinical outcomes. 
This data, collected routinely during patient care, has opened new opportunities for using AI and ML to improve public health decision-making, disease surveillance, and health system management [\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e–\u003cspan class=\"CitationRef\"\u003e3\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eModern ML methods, especially ensemble learning and deep learning, can identify complex patterns in large health datasets, enabling predictions for outcomes such as early disease detection, mortality risk, hospital readmissions, and outbreak forecasting [\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e–\u003cspan class=\"CitationRef\"\u003e7\u003c/span\u003e]. These capabilities are increasingly viewed as essential to data-driven public health and precision medicine [\u003cspan class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDespite promising advances, translating ML into routine public health practice remains uneven. Real-world performance depends heavily on data quality, digital infrastructure, analytical capacity, and governance: factors that vary dramatically between settings, and are particularly challenging in LMICs [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e10\u003c/span\u003e]. Any meaningful synthesis of this literature must therefore account for implementation realities and resource constraints, not just algorithmic metrics [\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Machine Learning and EHRs in Public Health\u003c/h2\u003e \u003cp\u003eML-driven analysis of EHR data can enhance disease surveillance, track population-level trends, and support timely policy responses [\u003cspan class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e7\u003c/span\u003e]. 
Predictive models built from routine clinical data have been used to identify high-risk populations, power early warning systems for infectious diseases, and evaluate health interventions at scale [\u003cspan class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn high-income settings, ML-enabled EHR analytics have demonstrated advantages over traditional statistical methods in specific epidemiological applications, particularly where large volumes of complex, time-series data are available [\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e6\u003c/span\u003e]. However, research consistently shows that performance gains over simpler models are often modest, especially when data quality is poor or sample sizes are small [\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFor public health agencies and policymakers, accuracy alone is not enough. Models must also be interpretable, well-calibrated (meaning predicted probabilities match observed outcomes), and practically feasible [\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e19\u003c/span\u003e]. 
These requirements are especially critical in LMICs, where poorly understood or mis-calibrated models can erode trust and deepen existing health inequities [\u003cspan class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e1.2 Health Systems and EHR Infrastructure in LMICs\u003c/h2\u003e \u003cp\u003eHealth systems in LMICs typically span community health posts, primary health centres, district hospitals, and national referral facilities, but frequently operate under severe resource constraints, with high patient-to-provider ratios, donor-dependent programme funding, and fragmented governance [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eEHR infrastructure in LMICs is heterogeneous. Many primary health care facilities, particularly in sub-Saharan Africa, still use paper or hybrid records [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e]. Where electronic systems exist, common platforms include OpenMRS (widely used in East and Southern Africa), DHIS2 (for aggregate national reporting), and SmartCare (Zambia's national HIV EHR platform) [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e]. 
These are designed primarily for programme reporting rather than research-grade longitudinal data, resulting in variable data quality.\u003c/p\u003e \u003cp\u003eKey data quality challenges in LMIC EHRs include: systematic missingness (patients seeking care at multiple facilities with no record linkage); inconsistent coding of diagnoses and laboratory results; limited longitudinal depth due to patient mobility; and incomplete digitisation of historical records [\u003cspan class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e36\u003c/span\u003e]. Infrastructure barriers, including unreliable electricity, limited internet, and insufficient computational resources, further restrict the deployment of data-intensive models such as deep learning [\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eBy contrast, high-income countries benefit from large, integrated EHR systems (such as Epic and Cerner) with longitudinal, multi-institutional, well-structured datasets [\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
Methods developed in these contexts do not transfer directly to LMIC settings [\u003cspan class=\"CitationRef\"\u003e64\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e66\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e1.3 The Need for Context-Appropriate ML Approaches\u003c/h2\u003e \u003cp\u003eRecent research has explored ML optimisation strategies better suited to resource-constrained settings: lightweight model architectures with reduced computational demands; preprocessing methods designed to handle systematic missingness; and hybrid approaches that balance performance with interpretability [\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e]. Explainable AI (XAI) tools, such as feature attribution methods, are gaining attention as ways to build clinician and policymaker trust in model outputs [\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFederated learning, which enables model training across institutions without sharing raw patient data, offers potential for collaborative development while preserving data privacy and national sovereignty [\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e]. Synthetic data generation and transfer learning have also been proposed to address data scarcity in low-resource settings [\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDespite these advances, the literature remains fragmented. Many studies emphasise algorithmic performance without adequately addressing deployment feasibility, evaluation rigour, or public health relevance. Technology-focused reviews (Shickel et al. [\u003cspan class=\"CitationRef\"\u003e4\u003c/span\u003e]; Rajkomar et al. [\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e]; Christodoulou et al. 
[\u003cspan class=\"CitationRef\"\u003e17\u003c/span\u003e]) offer limited guidance on adapting methods to resource-limited health systems. LMIC-focused studies (Mutai et al. [\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e]; Musukwa et al. [\u003cspan class=\"CitationRef\"\u003e14\u003c/span\u003e]) demonstrate feasibility but consistently lack external validation, calibration, and explainability evaluation. Policy frameworks from WHO [\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e] and Vayena et al. [\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e] provide normative guidance but do not operationalise this within applied ML workflows.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e1.4 Scope and Objectives\u003c/h2\u003e \u003cp\u003eZambia and comparable LMICs represent a compelling context for examining AI-driven EHR analytics. While investments in digital health infrastructure are expanding, challenges related to data quality, interoperability, and analytic capacity persist [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e]. This systematic review addresses these gaps by synthesising peer-reviewed evidence on ML-based predictive analytics using EHR data published between 2018 and 2025, with LMIC settings as the primary focus and high-income country studies included as a methodological benchmark. 
The primary objective is to critically examine how ML techniques are applied, adapted, and evaluated.\u003c/p\u003e \u003cp\u003eThree research questions guide the review:\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cul\u003e \u003cli\u003e \u003cp\u003eRQ1 Which ML algorithms are most frequently applied in LMIC settings, and how do they compare in predictive accuracy, interpretability, and computational feasibility?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRQ2 Which preprocessing and feature engineering strategies best address data quality challenges in resource-constrained EHR environments?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eRQ3 How are ML-based predictions evaluated and validated in low-resource settings, particularly regarding calibration and explainability for public health decision-making?\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe review makes three key contributions:\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eA context-aware synthesis grounded in public health and LMIC health system realities;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eA focus on methodological rigour beyond predictive accuracy, foregrounding calibration, external validation, and explainability;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eIdentification of practical research gaps, including limited external validation, insufficient governance and fairness attention, and underrepresentation of non-communicable disease applications in LMICs.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"2. 
Algorithmic Foundations","content":"\u003cp\u003e\u003c/p\u003e\u003cp\u003eThis section provides a concise overview of the main ML method categories used in EHR-based public health prediction. A deeper technical understanding is not required to follow the paper's findings; this section is provided for context.\u003c/p\u003e\u003cp\u003eML methods learn patterns from data to make predictions. In EHR-based health research, the goal is to build a model that maps patient information (demographics, diagnoses, lab results, visit history) to a health outcome (e.g., mortality risk, disease progression). EHR datasets present specific challenges — high dimensionality (many variables per patient), missing data, imbalanced outcomes (e.g., rare events like death), irregular time intervals between visits, and data distributions that change over time [\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eFour model categories dominate this literature. Traditional ML methods, including logistic regression and decision trees, offer transparent outputs, low computational requirements, and reliable calibration even with sparse data, making them well-suited to LMIC environments [\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e38\u003c/span\u003e]. Ensemble methods, such as random forests, gradient boosting and XGBoost, handle noisy or incomplete data robustly and generate feature importance scores that help clinicians understand what drives predictions [\u003cspan class=\"CitationRef\"\u003e41\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e43\u003c/span\u003e]. 
Deep learning architectures, including feedforward networks, recurrent neural networks (LSTMs), and transformers, can capture complex temporal patterns in longitudinal EHR data but require large, well-curated datasets and substantial computational infrastructure, generally limiting their use to high-resource settings [\u003cspan class=\"CitationRef\"\u003e44\u003c/span\u003e–\u003cspan class=\"CitationRef\"\u003e46\u003c/span\u003e]. Hybrid and emerging approaches, including federated learning and AutoML, offer potential advantages in privacy preservation and automation [\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eTwo issues are especially important for public health settings. First, class imbalance — outcomes such as mortality or treatment failure are uncommon, so models can appear accurate simply by predicting the majority class. Weighted loss functions address this [\u003cspan class=\"CitationRef\"\u003e47\u003c/span\u003e]. Second, calibration - a model is well-calibrated when its predicted probabilities match real-world event rates, which is a critical property for triage and resource allocation decisions where the probability estimate itself (not just the ranking of patients) determines action. The Brier score and calibration plots are standard tools for assessing this [\u003cspan class=\"CitationRef\"\u003e48\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e61\u003c/span\u003e]. Models with fewer parameters, stable optimisation, and reliable calibration are more likely to perform robustly under the constraints typical of LMIC health systems [\u003cspan class=\"CitationRef\"\u003e37\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e"},{"header":"3. 
Methodology","content":"\u003cp\u003eThis systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure transparency, reproducibility, and methodological rigour [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Review Design\u003c/h2\u003e \u003cp\u003eA systematic literature review design was adopted to identify, screen, and synthesise empirical studies applying AI/ML to EHR or routinely collected health data for public health-oriented predictive analytics. Given substantial heterogeneity in prediction objectives, data sources, and modelling approaches, a thematic narrative synthesis was used rather than quantitative meta-analysis [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. The review is guided by the three research questions (RQ1\u0026ndash;RQ3) mapped in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMapping of Research Questions to Review Processes\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRQ\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReview Process Components\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRQ1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExtraction of ML model types, algorithm categorization, comparison of reported performance metrics, assessment of interpretability and computational feasibility\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRQ2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExtraction of preprocessing methods, feature engineering strategies, handling of missingness, heterogeneity, and temporal data\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRQ3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExtraction of evaluation metrics, validation strategies, calibration reporting, explainability methods, and governance considerations\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Information Sources and Search Strategy\u003c/h2\u003e \u003cp\u003eA comprehensive literature search was conducted across five major academic databases: IEEE Xplore, Scopus, Web of Science, ACM Digital Library, and PubMed. The search combined controlled vocabulary and free-text terms related to AI, ML, EHRs, predictive analytics, public health, epidemiology, and resource-limited settings. 
The search string was:\u003c/p\u003e \u003cp\u003e \u003cspan fontcategory=\"NonProportional\" class=\"\" name=\"Emphasis\"\u003e(\"machine learning\" OR \"artificial intelligence\" OR \"deep learning\") AND (\"electronic health record\" OR \"electronic medical record\" OR \"EHR\") AND (\"prediction\" OR \"risk prediction\" OR \"predictive analytics\") AND (\"low-resource\" OR \"LMIC\" OR \"developing country\") AND (\"public health\" OR \"epidemiology\" OR \"population health\")\u003c/span\u003e\u003c/p\u003e \u003cp\u003eThe search was restricted to studies published between January 2018 and December 2025, capturing the rapid expansion of deep learning in epidemiology, the emergence of explainable AI frameworks, and innovations following the COVID-19 pandemic [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Eligibility Criteria\u003c/h2\u003e \u003cp\u003eStudy selection followed a three-stage PRISMA-consistent screening process [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTitle and abstract screening to remove irrelevant or non-AI studies;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eFull-text eligibility assessment using predefined criteria;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAnd final inclusion based on relevance to at least one research question.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eCriteria are summarised in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
High-income country studies were included as a methodological reference baseline to enable comparative evaluation of methods, evaluation practices, and deployment feasibility.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eInclusion and Exclusion Criteria\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInclusion Criteria\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExclusion Criteria\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePublication year\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2018\u0026ndash;2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePre-2018\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLanguage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEnglish\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNon-English\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStudy 
type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePeer-reviewed articles, systematic reviews, validated conference papers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEditorials, opinion pieces, blogs\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFocus\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAI/ML applied to public health or epidemiology\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePurely clinical AI without population focus\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMethods\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEmpirical AI models, epidemiological simulations, decision-support systems\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTheoretical AI papers without health application\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSetting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLMIC settings (primary focus); high-income settings included as comparative baseline\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStudies with no geographic or health system context reported\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Study Selection and Data Extraction\u003c/h2\u003e \u003cp\u003eAll retrieved records were imported into a reference management system and duplicates removed. 
The selection process is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Studies were excluded primarily due to irrelevance to predictive analytics, absence of EHR or routinely collected health data, or insufficient methodological detail. A structured data extraction spreadsheet captured \u0026mdash; study setting and population; data sources and sample size; prediction task and outcome; ML algorithms; preprocessing and feature engineering methods; evaluation metrics and validation strategy; explainability, ethical, and governance considerations; and reported limitations [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Quality and Risk-of-Bias Assessment\u003c/h2\u003e \u003cp\u003eA formal quality and risk-of-bias assessment was conducted using an adapted PROBAST framework extended to account for ML-specific considerations [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. The adapted assessment covered four core domains:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eData and participants: representativeness, completeness, and handling of missingness;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePredictor and outcome definition: clarity, public health relevance, and temporal alignment;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eModel development and validation: overfitting control, internal and external validation;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAnalysis and reporting: calibration, explainability, and transparency. 
Additional ML-specific criteria included class imbalance handling, hyperparameter tuning, and reproducibility practices [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. Each study was categorised as having low, moderate, or high risk of bias.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Synthesis Approach\u003c/h2\u003e \u003cp\u003eA thematic narrative synthesis was adopted. Studies were grouped and compared by \u0026mdash; ML model category; public health application domain; data quality mitigation strategies; model evaluation and calibration practices; and deployment feasibility and governance considerations [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e, \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Review of Previous Works","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Overview of Research Focus Areas\u003c/h2\u003e \u003cp\u003eAcross the reviewed literature, AI applications in public health cluster around four areas:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eClinical and population risk prediction;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eDisease surveillance and epidemiological forecasting;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eHealth system performance and resource optimisation;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAnd ethical, explainability, and governance frameworks.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eThe relative maturity of these domains varies considerably \u0026mdash; risk prediction has attracted the 
most empirical work, while health system optimisation and governance remain comparatively nascent. Understanding which domains are well served and which are neglected is essential for identifying where a synthesis focused on LMIC contexts adds the most value [\u003cspan additionalcitationids=\"CR61 CR62 CR63 CR64 CR65\" citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Key Studies\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e synthesises 15 key studies most relevant to this review's objectives, explicitly mapping what each study addresses and what it does not. This approach enables direct identification of methodological, contextual, and ethical gaps across the prior literature.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eKey Studies on AI/ML for EHR-Based Predictive Analytics in Public Health\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAuthors\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eArea\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYear\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePub. Type\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eML Prediction\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eEHR Data\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLMIC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eXAI/Ethics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eGap Identified\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRajkomar et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eML in clinical prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Deep learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Large-scale EHR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ High-income\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e✗ Limited\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLimited applicability to low-resource settings\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBeam \u0026amp; Kohane\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBig data \u0026amp; ML in healthcare\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Conceptual ML\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ EHR-centric\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✗ Ethics not explored\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo empirical LMIC validation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShickel et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDeep learning for EHRs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Comprehensive review\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ EHR time-series\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ Limited LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e✗ Marginal XAI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eFocuses on algorithms, not deployment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChristodoulou et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eML vs regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Comparative prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Clinical datasets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Transparency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo public health decision impact\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHasson et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEHR infrastructure in Africa\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✗ No ML modelling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Health IS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ LMIC-focused\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e✗ Limited ethics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo predictive analytics perspective\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMutai et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHIV outcome prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ ML risk prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Routine clinical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ LMIC (Kenya)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✗ XAI absent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLimited validation \u0026amp; calibration\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMusukwa et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eART outcomes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ ML prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ National EHR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ LMIC (Zambia)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✗ Ethics not 
discussed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo external validation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGoldstein et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRisk modeling with EHRs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Predictive modelling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ EHR data\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Calibration\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLimited operational guidance\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCollins et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eModel validation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2016\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Prediction evaluation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Health datasets\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Transparency\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNot ML/EHR-specific\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVan Calster et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCalibration\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2019\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Risk prediction theory\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✗ Not EHR-specific\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Decision reliability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLacks applied case studies\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLundberg \u0026amp; Lee\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExplainable AI (SHAP)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eConference\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Model explainability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✗ General ML\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ No LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Strong XAI focus\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e 
\u003cp\u003eNo health system evaluation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVayena et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEthics of ML in health\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2018\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✗ No prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✗ Conceptual\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ Global relevance\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Ethics-centered\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo technical implementation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWHO\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAI governance for health\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePolicy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✗ No modelling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✗ Conceptual\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ LMIC-relevant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Governance\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNo empirical ML assessment\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRasmy et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTransformer models (Med-BERT)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Advanced deep learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Large EHRs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✗ High-resource\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✗ Limited XAI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eHigh computational demands\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRieke et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFederated learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eJournal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e✓ Privacy-preserving ML\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e✓ Multi-institutional\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e✓ Potential LMIC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e✓ Ethical data sharing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLimited LMIC deployment evidence\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Clinical and Population Risk Prediction\u003c/h2\u003e \u003cp\u003eThe most established body of prior work concerns risk prediction from EHR data, covering mortality, disease progression, and treatment outcomes. Large-scale studies from high-income settings have demonstrated the feasibility of applying deep learning to longitudinal EHRs [\u003cspan additionalcitationids=\"CR68\" citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e], while comparative analyses consistently find that performance gains over simpler models are often modest [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan additionalcitationids=\"CR72\" citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e]. Christodoulou et al. found no consistent advantage of ML over logistic regression across multiple prediction tasks [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], a finding particularly relevant where data sparsity constrains complex architectures [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e, \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e]. Studies specifically targeting LMIC settings, such as Mutai et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] in Kenya and Musukwa et al. 
[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] in Zambia, demonstrate feasibility of ML prediction from routine clinical data but highlight persistent gaps in external validation, calibration assessment, and explainability evaluation that limit confidence in their generalisability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Disease Surveillance and Forecasting\u003c/h2\u003e \u003cp\u003eA second research stream applies ML to disease surveillance and epidemiological forecasting, often combining EHRs with laboratory reports or syndromic surveillance data [\u003cspan additionalcitationids=\"CR81\" citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e82\u003c/span\u003e]. Following COVID-19, ML forecasting models have been proposed for infectious disease dynamics and outbreak detection [\u003cspan additionalcitationids=\"CR84\" citationid=\"CR83\" class=\"CitationRef\"\u003e83\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e]. While these show strong short-term predictive performance in reported studies, concerns persist in the prior literature about model stability, interpretability, and transferability across settings [\u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e81\u003c/span\u003e, \u003cspan citationid=\"CR86\" class=\"CitationRef\"\u003e86\u003c/span\u003e]. 
The gap between modelling performance and operational deployment is a recurring theme, with few prior studies linking forecasting outputs to explicit decision thresholds or public health protocols.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.5 Health System Performance and Resource Optimisation\u003c/h2\u003e \u003cp\u003eA smaller but growing body of work applies AI to health system performance monitoring, including patient flow prediction, service utilisation forecasting, and workforce planning [\u003cspan additionalcitationids=\"CR90\" citationid=\"CR89\" class=\"CitationRef\"\u003e89\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e91\u003c/span\u003e]. These applications are directly relevant to public health operations but remain largely exploratory and predominantly confined to high-income settings [\u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e90\u003c/span\u003e, \u003cspan citationid=\"CR92\" class=\"CitationRef\"\u003e92\u003c/span\u003e]. LMIC-focused evidence is sparse, and few studies evaluate how predictive outputs actually influence resource allocation decisions [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e, \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e93\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e4.6 Data Quality, Preprocessing, and Feature Engineering\u003c/h2\u003e \u003cp\u003eData quality challenges are a pervasive theme in the prior literature, particularly for LMIC contexts. Common issues documented include systematic missingness, inconsistent diagnosis coding, fragmented patient records across facilities, and limited longitudinal depth [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e, \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. 
Prior reviews and empirical studies note that these challenges are qualitatively different from those in high-income settings, where large integrated EHR systems provide relatively complete and standardised data [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The prior literature identifies a recurring methodological weakness: preprocessing strategies \u0026mdash; including imputation method choice, temporal aggregation windows, and feature selection criteria \u0026mdash; are rarely evaluated through sensitivity analyses, making it difficult to attribute reported model performance to architecture rather than data preparation decisions [\u003cspan additionalcitationids=\"CR96\" citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e4.7 Evaluation Practices and Validation Strategies\u003c/h2\u003e \u003cp\u003eEvaluation practices in the prior AI/ML health literature have been widely critiqued for over-reliance on discrimination metrics, particularly AUROC, at the expense of calibration and external validation [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. Methodological work by Van Calster et al. [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e] and Collins et al. 
[\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e] establishes that calibration \u0026mdash; the agreement between predicted probabilities and observed outcomes \u0026mdash; is essential for triage and resource allocation applications yet is systematically underreported. Prior comparative reviews further highlight that external validation on independent datasets is rare, meaning that reported model performance frequently cannot be assumed to generalise to new settings or time periods [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. These gaps in evaluation practice are especially consequential in LMIC contexts, where the absence of rigorous validation compounds existing uncertainties about data quality and representativeness.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e4.8 Explainability, Ethics, and Governance\u003c/h2\u003e \u003cp\u003eThe prior literature increasingly recognises that predictive performance alone is insufficient for responsible public health deployment of ML models. Explainability frameworks, including SHAP [\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e] and LIME [\u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e85\u003c/span\u003e], have been proposed as mechanisms to enhance clinician trust and support accountability, though prior reviews find that their application in health settings is largely post-hoc and rarely evaluated for practical utility within decision workflows [\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e, \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. Ethical and governance frameworks, including WHO guidance [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] and the normative work of Vayena et al. 
[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], identify algorithmic bias, data sovereignty, and transparency as priorities \u0026mdash; particularly for LMIC contexts where regulatory oversight is limited. A consistent finding across this literature is that governance considerations are discussed at a conceptual level but rarely operationalised within applied modelling studies [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e, \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e4.9 Synthesis of Key Gaps\u003c/h2\u003e \u003cp\u003eAcross the four research domains reviewed above, four persistent gaps in the prior literature motivate this systematic review. First, evaluation practices emphasise discrimination at the expense of calibration and external validation, limiting the operational credibility of published models [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. Second, LMIC settings are underrepresented relative to their disease burden, and non-communicable disease applications are especially scarce despite the growing NCD burden in these settings [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e, \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. Third, explainability and governance are treated as conceptual addenda rather than integral evaluation criteria, creating a gap between normative guidance and applied practice [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. 
Fourth, preprocessing decisions are rarely subject to systematic sensitivity analysis, making it difficult to distinguish the contribution of algorithm choice from that of data preparation [\u003cspan additionalcitationids=\"CR96\" citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e]. These gaps collectively define the analytical focus of Sections \u003cspan refid=\"Sec23\" class=\"InternalRef\"\u003e5\u003c/span\u003e and \u003cspan refid=\"Sec31\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Results","content":"\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Distribution of Included Studies\u003c/h2\u003e \u003cp\u003eAll 64 included studies were published between 2018 and 2025. Publication volume increased substantially from 2020 onwards, reflecting intensified research activity coinciding with expanded digital health adoption, greater availability of routine electronic health data, and heightened interest in AI-driven public health analytics during and after the COVID-19 pandemic. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the annual distribution of included studies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Study Characteristics\u003c/h2\u003e \u003cp\u003eGeneral characteristics of the included studies are summarised in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. Most studies focused on predictive modelling, with mortality, disease progression, and treatment outcomes being the most frequently addressed prediction targets. By application domain, the 64 included studies addressed HIV/TB and infectious diseases (45.3%), non-communicable diseases (28.1%), maternal and child health (14.1%), and health system utilisation (12.5%). 
These prediction targets broadly align with the leading causes of premature death in sub-Saharan Africa, confirming the disease relevance of this literature [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. However, non-communicable diseases, including cardiovascular disease, diabetes, and hypertension, remain underrepresented despite their rapidly growing burden in LMICs, a gap that future research must address.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary characteristics of the included studies\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eN Studies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e%\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eStudy 
setting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLow- and middle-income countries (LMICs)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18.8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHigh-income countries (comparative baseline)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e68.8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMixed / multi-setting studies (Both)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eStudy design\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRetrospective observational\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e81.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProspective / near real-time\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePrimary data source\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eElectronic health records (EHR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e64.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHealth information systems (HIS)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e35.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eApplication domain\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHIV / TB / infectious diseases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e45.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-communicable diseases\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e28.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMaternal and child health\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e14.1\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHealth system utilization / outcomes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003ePrediction target\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMortality\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e34.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDisease progression / outcomes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e40.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTreatment response / adherence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eNote: Health Information Systems (HIS): standardised institutional systems for ongoing collection, analysis, and use of health data generated through routine service delivery such as aggregate reporting platforms (e.g. 
DHIS2, DATIM).\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Data Sources and Settings\u003c/h2\u003e \u003cp\u003ePubMed and Scopus contributed the largest proportion of records (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), reflecting strong coverage of biomedical and interdisciplinary AI-health research. IEEE Xplore and the ACM Digital Library provided substantial representation of computer science and engineering-focused studies, while Web of Science contributed broad cross-disciplinary coverage. The combination of these databases ensured balanced representation of both methodological AI research and applied public health studies.\u003c/p\u003e \u003cp\u003eLMIC-based studies predominantly relied on national or facility-level EHR systems and routinely collected clinical data, typically characterised by limited longitudinal depth and variable completeness. Data completeness in LMIC studies frequently fell below 80% for key clinical variables, contrasting with the more complete and standardised datasets common in high-income settings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e5.4 AI/ML Methods Applied\u003c/h2\u003e \u003cp\u003eThe distribution of ML methods across the 64 included studies is presented in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. Deep learning architectures were the most commonly applied category overall (n\u0026thinsp;=\u0026thinsp;25, 39.1%), driven predominantly by high-income country studies with access to large-scale curated EHR datasets; this category includes recurrent neural networks, LSTM models, and convolutional neural networks. 
Traditional ML methods, principally logistic regression and generalised linear models, were applied in 28.1% of studies (n\u0026thinsp;=\u0026thinsp;18) and were valued for their interpretability, low computational requirements, and reliable calibration under data constraints. Ensemble methods, including random forests and gradient boosting variants (XGBoost, LightGBM), accounted for 20.3% (n\u0026thinsp;=\u0026thinsp;13). Transformer-based models such as Med-BERT were represented in 6.2% of studies (n\u0026thinsp;=\u0026thinsp;4), all conducted in high-income settings. Hybrid and emerging approaches, including federated learning and AutoML, appeared in 6.3% (n\u0026thinsp;=\u0026thinsp;4). The pattern diverges markedly by setting: among LMIC-focused studies, traditional ML and ensemble methods were jointly the most common approaches (33.3% each), reflecting the practical alignment between model choice and the data quality, interpretability, and infrastructure constraints of resource-limited health systems.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of ML model categories reported across included studies.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Category\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRepresentative Algorithms\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrimary Application Areas\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eN Studies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e%\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTraditional ML\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLogistic Regression, Decision Trees, k-NN, Na\u0026iuml;ve Bayes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRisk prediction, classification, baseline comparisons\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e28.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnsemble Learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRandom Forest, Gradient Boosting, XGBoost, AdaBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDisease prediction, feature importance, performance optimization\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e20.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDeep Learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eANN, CNN, RNN, LSTM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTime-series prediction, longitudinal EHR analysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e39.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTransformer-based\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBERT, Med-BERT, GPT-based models\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSequential EHR modelling, representation learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHybrid/ Emerging\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRF\u0026thinsp;+\u0026thinsp;XGBoost, CNN\u0026thinsp;+\u0026thinsp;LSTM, AutoML, Federated Learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eComplex data fusion, privacy-preserving ML, model automation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003e5.5 Data Preprocessing and Feature Engineering Practices\u003c/h2\u003e \u003cp\u003eData preprocessing practices varied across 
studies but followed common patterns. Most studies reported basic imputation techniques, typically mean or median imputation, to address missing data [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e]. Multiple imputation was reported in a smaller subset, predominantly in high-income settings. Temporal aggregation of clinical events was frequently used to convert longitudinal records into fixed-length feature vectors [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e, \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e, \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e75\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFeature engineering was often based on clinically defined variables rather than automated methods [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e]. Automated representation learning appeared in deep learning studies [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e] but was rarely seen in LMIC research due to data sparsity and computational constraints [\u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e74\u003c/span\u003e, \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e]. 
Few studies reported systematic evaluation of alternative preprocessing strategies or sensitivity analyses for missing data handling [\u003cspan additionalcitationids=\"CR96\" citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e]. A summary of preprocessing practices is provided in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eData preprocessing, explainability, and contextual focus across key studies\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAuthors\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePreprocessing\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFeature Engineering\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eExplainability\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eExternal Validation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eLMIC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eIdentified Limitation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRajkomar et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes (imputation, normalization)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAutomated\u0026thinsp;+\u0026thinsp;clinical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLimited applicability to low-resource settings\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBeam \u0026amp; Kohane\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eConceptual discussion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo empirical validation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShickel et 
al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes (temporal aggregation)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDeep feature learning\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLimited\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eFocus on algorithms over deployment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChristodoulou et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePredefined clinical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes (coefficients)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo public health context\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMutai et al. 
[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes (missing data handling)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eManual clinical\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLimited calibration assessment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMusukwa et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRoutine EHR variables\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo external validation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGoldstein et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClinical risk factors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLimited operational guidance\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCollins et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes (methodological)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNot ML-specific\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVan Calster et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes (calibration)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLacks applied case studies\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLundberg \u0026amp; Lee\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes (SHAP)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo health system evaluation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVayena et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eConceptual ethics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo technical implementation\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWHO\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot applicable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGovernance\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo empirical ML assessment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRasmy et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes (tokenization, embeddings)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAutomated representation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLimited\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eHigh computational requirements\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRieke et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFederated feature spaces\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLimited\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003ePotential\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLimited LMIC deployment\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHasson et al.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes (data quality focus)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNot predictive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNo ML modelling\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e5.6 Model Evaluation and Validation Practices\u003c/h2\u003e \u003cp\u003eEvaluation metrics reported across the 64 included studies are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. Discrimination-based metrics predominated, particularly AUROC, with sensitivity and specificity also commonly reported. Calibration metrics \u0026mdash; essential for public health decisions where predicted probabilities determine triage thresholds \u0026mdash; were reported in only 4 studies (6.2%) [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. External validation on an independent dataset was rare across all settings \u0026mdash; only 5 of 64 studies (7.8%) reported external validation [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e, \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e], comprising 3 high-income studies (6.8% of 44), 1 LMIC study (8.3% of 12), and 1 mixed-setting study [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e]. 
The near-universal reliance on internal validation through cross-validation or train-test splits [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e], regardless of setting, substantially limits confidence in the generalisability of reported model performance across contexts [\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e, \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e, \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e88\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec30\" class=\"Section2\"\u003e \u003ch2\u003e5.7 Explainability, Ethics, and Governance\u003c/h2\u003e \u003cp\u003eExplainability was reported in 17 of 64 studies (26.6%), most commonly through SHAP-based feature importance rankings [\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e] or post-hoc attribution techniques [\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e]. Reporting was heavily concentrated in high-income settings: 16 of 44 HIC studies (36.4%) included an explainability assessment, compared with only 1 of 12 LMIC studies (8.3%) [\u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. This disparity is substantively important \u0026mdash; trust-building mechanisms are least present in settings where institutional accountability for algorithmic decisions is most critical and where poorly understood model outputs carry the greatest risk of eroding clinician and community trust. 
Ethical and governance considerations, including data privacy, algorithmic bias, and fairness [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e, \u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e], were discussed in several studies but rarely operationalised within model development or evaluation workflows [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e, \u003cspan citationid=\"CR94\" class=\"CitationRef\"\u003e94\u003c/span\u003e]. Governance-focused frameworks from WHO [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] and Vayena et al. [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] provide high-level normative guidance, but a persistent gap remains between policy intent and applied ML practice.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Discussion","content":"\u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Interpretation of Key Findings\u003c/h2\u003e \u003cp\u003eRQ1: AI/ML Algorithms. Deep learning was the most commonly applied category overall (39.1%, n\u0026thinsp;=\u0026thinsp;25), but this pattern is driven by high-income country studies with large curated datasets; it does not reflect the LMIC experience. Among LMIC-focused studies, traditional ML and ensemble methods were jointly the most common approaches (33.3% each), reflecting a pragmatic alignment between model choice and the data quality, interpretability, and infrastructure constraints of resource-limited health systems. 
This finding is consistent with evidence that deep learning offers modest or no performance advantage over logistic regression under realistic data conditions [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e], implying that algorithmic sophistication alone does not guarantee superior public health utility where data are sparse or incomplete.\u003c/p\u003e \u003cp\u003eRQ2: Data Preprocessing and Feature Engineering. The prevalence of basic imputation and manual feature construction in LMIC studies reflects rational adaptation to data sparsity and the need for clinically defensible workflows [\u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e76\u003c/span\u003e, \u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e]. The critical gap is not the method chosen but the near-universal absence of sensitivity analyses establishing how preprocessing choices propagate to downstream predictions. Without such analyses, reported model performance cannot be clearly attributed to architecture rather than data preparation decisions [\u003cspan additionalcitationids=\"CR96\" citationid=\"CR95\" class=\"CitationRef\"\u003e95\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR97\" class=\"CitationRef\"\u003e97\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eRQ3: Evaluation, Calibration, and Explainability. The concentration of evaluation practices around discrimination metrics, with calibration reported in only 6.2% of studies and external validation in only 7.8%, represents a substantive deployment risk. A model with high AUROC but poor calibration can systematically misallocate scarce resources by assigning incorrect probabilities to population subgroups [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e]. 
Explainability reporting was low overall but especially so in LMIC settings: only 1 of 12 LMIC studies (8.3%) included any explainability assessment, compared with 16 of 44 HIC studies (36.4%). This asymmetry means that trust-building and accountability mechanisms are least present precisely where they are most needed, namely in settings where institutional oversight of algorithmic outputs is weakest and where the consequences of unexplained errors fall disproportionately on underserved populations.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec33\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Comparison with Previous Reviews\u003c/h2\u003e \u003cp\u003eThis review advances the literature by explicitly differentiating itself from prior major systematic reviews. Influential works such as Rajkomar et al. [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], Beam and Kohane [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], and Shickel et al. [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] predominantly emphasise algorithmic development using large-scale EHR datasets, with limited insight into deployment feasibility or governance in resource-constrained systems. Comparative analyses such as Christodoulou et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] and Goldstein et al. [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] offer valuable methodological rigour but remain largely detached from public health use cases and LMIC realities.\u003c/p\u003e \u003cp\u003eLMIC-focused empirical studies, including Mutai et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], Musukwa et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], and Hasson et al. 
[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], demonstrate the feasibility of applying ML to routine health data in constrained settings but consistently lack external validation, calibration assessment, and explainability evaluation. Ethical and governance-focused works (Vayena et al. [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] and WHO [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]) provide essential normative frameworks without operationalising these principles within applied ML pipelines. Advanced methodological contributions such as Med-BERT [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] and federated learning [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] illustrate future potential but remain largely inaccessible to most LMIC health systems.\u003c/p\u003e \u003cp\u003eThis review advances the literature in three ways:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eCross-layer synthesis reveals that weaknesses in preprocessing and evaluation frequently outweigh algorithmic gains;\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePublic health-specific framing foregrounds calibration, external validation, and interpretability as non-negotiable requirements for population-level decision-making; and\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eContext-aware methodological guidance demonstrates that model selection must be driven by data quality, governance capacity, and decision context, not algorithmic novelty.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section2\"\u003e \u003ch2\u003e6.3 Implications for Practice and Policy\u003c/h2\u003e \u003cp\u003eEffective integration of AI into public 
health practice depends less on model complexity and more on data governance, evaluation rigour, and institutional capacity. Interpretable, well-calibrated models that can be validated across settings are more likely to support equitable, actionable public health decisions than highly complex models optimised solely for discrimination [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e, \u003cspan citationid=\"CR103\" class=\"CitationRef\"\u003e103\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFor policymakers, these findings underscore the importance of embedding AI within broader digital health strategies, including investments in data quality, workforce development, and regulatory oversight [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Without such alignment, AI-driven predictive analytics risk remaining experimental tools rather than sustainable components of public health systems [105]. The growing NCD burden in LMICs, currently underrepresented in ML research, represents an urgent area for investment in both data infrastructure and applied research.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec35\" class=\"Section2\"\u003e \u003ch2\u003e6.4 Limitations\u003c/h2\u003e \u003cp\u003eSeveral limitations should be acknowledged. The review was restricted to English-language publications from 2018\u0026ndash;2025, which may exclude relevant earlier or non-English studies. Quality assessment relied on reported methodological details, which varied in completeness and transparency. While this review identifies methodological gaps, it does not empirically evaluate the causal impact of specific ML practices on public health outcomes. 
The inclusion of high-income country studies, while methodologically valuable as a comparative baseline, means the evidence base is not exclusively LMIC-derived. No quantitative meta-analysis was conducted due to heterogeneity in outcomes, data sources, and evaluation metrics.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec36\" class=\"Section2\"\u003e \u003ch2\u003e6.5 Priorities for Future Research\u003c/h2\u003e \u003cp\u003eBased on this synthesis, four priorities emerge for future LMIC-focused research. First, model selection should default to robustness over complexity: linear and ensemble models are appropriate unless data volume and quality clearly justify deep learning. Second, preprocessing decisions must be made explicit and evaluated through sensitivity analyses demonstrating how they affect downstream predictions. Third, calibration and external validation must be treated as core requirements, not optional additions, because probability reliability and cross-context generalisability are prerequisites for responsible public health deployment. Fourth, explainability and governance must move from conceptual acknowledgment to operational integration, with XAI methods evaluated for utility to health system stakeholders. An expanded research agenda addressing cardiovascular disease, diabetes, and hypertension using routine LMIC EHR data is urgently needed.\u003c/p\u003e \u003c/div\u003e"},{"header":"7. Conclusion","content":"\u003cp\u003eThis systematic review synthesised evidence from 64 studies on ML-based predictive analytics using EHRs in public health and epidemiology (2018\u0026ndash;2025), with LMIC settings as the primary focus. 
The central finding is a structural imbalance \u0026mdash; methodological investment in algorithmic development has substantially outpaced investment in evaluation rigour, explainability, and governance, and LMICs, which bear the greatest burden of preventable disease, are precisely the settings where this imbalance is most acute.\u003c/p\u003e \u003cp\u003eEnsemble methods and logistic regression are appropriately dominant in LMIC contexts, reflecting practical alignment with data quality and infrastructure constraints. However, their deployment value is limited by pervasive reliance on internal validation, underdeveloped calibration reporting, and minimal explainability integration. Ethical and governance considerations are increasingly acknowledged but rarely operationalised within applied modelling workflows.\u003c/p\u003e \u003cp\u003eAddressing these gaps requires a shift in research priorities \u0026mdash; from algorithmic novelty toward contextual validity; from discrimination metrics toward calibration and decision impact; and from individual model outputs toward reproducible, externally validated, governance-ready systems. As LMICs expand digital health infrastructure and face both persistent infectious disease burdens and a growing NCD challenge, the opportunity to develop AI tools that are genuinely reliable, equitable, and actionable is significant, but realising it demands the same rigour in evaluation and deployment as in model development.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003cstrong\u003eEthics approval\u003c/strong\u003e \u003cp\u003eThis study is a systematic review of published literature and did not involve the collection of primary data or direct interaction with human participants. Ethical approval for the broader research programme was granted by the ZCAS University Ethics Review Committee (Approval No. 2025/11/001). 
No additional ethical approval was required for this review.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003cstrong\u003eConsent to Participate\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003cstrong\u003eConsent to publish\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003cstrong\u003eCompeting interests\u003c/strong\u003e \u003cp\u003eAll authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis research received no funding.\u003c/p\u003e\u003ch2\u003eAuthor contributions\u003c/h2\u003e\u003cp\u003eJoe Phiri conceived and designed the study, led the literature review, data extraction and synthesis, and drafted the manuscript. Aaron Zimba contributed to conceptual development, methodological refinement, and critical review of the analysis and findings. Chiyaba Njovu provided public health and LMIC contextual interpretation and critically reviewed the manuscript. Mwansa Lumpa contributed to methodological guidance, interpretation of findings, and critical review of the manuscript. All authors reviewed, edited, and approved the final manuscript and take responsibility for its content.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eRajkomar A, Dean J, Kohane I (2019) Machine learning in medicine. N Engl J Med 380(14):1347\u0026ndash;1358\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBeam AL, Kohane IS (2018) Big data and machine learning in health care. JAMA 319(13):1317\u0026ndash;1318\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWiens J, Shenoy ES (2018) Machine learning for healthcare: On the verge of a major shift in healthcare epidemiology. 
Clin Infect Dis 66(1):149\u0026ndash;153\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShickel B, Tighe PJ, Bihorac A, Rashidi P (2021) Deep EHR: A survey of recent advances in deep learning techniques for electronic health record analysis. J Biomed Inf 122:103887\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXie X, Zhang J, Chen M (2022) Predicting in-hospital mortality using machine learning models: A systematic review. BMJ Health Care Inf 29(1):e100552\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eViboud C, Vespignani A (2019) The future of influenza forecasts. Proc Natl Acad Sci USA 116(8):2802\u0026ndash;2804\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSalath\u0026eacute; M, Bengtsson L, Bodnar TJ et al (2012) Digital epidemiology. PLoS Comput Biol 8(7):e1002616. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pcbi.1002616\u003c/span\u003e\u003cspan address=\"10.1371/journal.pcbi.1002616\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTopol EJ (2019) High-performance medicine: The convergence of human and artificial intelligence. Nat Med 25(1):44\u0026ndash;56\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHasson R, Smith KS, Johnson L (2020) Electronic health record infrastructure in Africa: Current state and future directions. BMJ Glob Health 5(4):e002734\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChibanda D et al (2021) Data quality challenges in African health information systems and implications for analytics. BMJ Glob Health 6:e004896\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWorld Health Organization (2021) Ethics and Governance of Artificial Intelligence for Health. 
WHO, Geneva\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLazer D, Pentland AS, Watts DJ et al (2020) Computational social science: Obstacles and opportunities. Science 369(6507):1060\u0026ndash;1062\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMutai BK, Njoroge SW et al (2020) Predicting HIV treatment outcomes using machine learning in Kenya. BMC Med Inf Decis Mak 20:123\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMusukwa MT et al (2021) Machine learning for antiretroviral therapy outcome prediction in Zambia. PLoS ONE 16(10):e0259873\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRajkomar A, Oren E, Chen K et al (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1:18\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRasmy L, Wu Y, Wang N et al (2023) Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records. J Am Med Inf Assoc 30(2):199\u0026ndash;210\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChristodoulou E, Ma J, Collins GS et al (2019) A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 110:12\u0026ndash;22\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShah NH, Milstein A, Bagley SC (2019) Making machine learning models clinically useful. JAMA 322(14):1351\u0026ndash;1352\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSteyerberg EW, Vickers AJ, Cook NR et al (2019) Assessing the performance of prediction models: A framework for traditional and novel measures. Stat Med 38(14):2503\u0026ndash;2515\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eObermeyer Z, Powers B, Vogeli C, Mullainathan S (2019) Dissecting racial bias in health algorithms. 
Science 366(6464):447\u0026ndash;453\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAdem S, Yusuf A, Mwangi J (2022) Digital health infrastructure in sub-Saharan Africa: Opportunities and constraints. Health Policy Plan 37(3):372\u0026ndash;384\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWorld Health Organization Regional Office for Africa (2020) Digital Health Strategy for the WHO African Region 2020\u0026ndash;2030. WHO AFRO, Brazzaville\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChowdhury MEH, Reza T, Islam S (2021) Mobile health and edge AI for low-resource environments. IEEE Rev Biomed Eng 14:30\u0026ndash;54\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWorld Health Organization (2023) AI for Health: Capacity Building in Africa. WHO, Geneva\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVayena E, Blasimme A, Cohen IG (2018) Machine learning in healthcare: Ethical challenges. PLoS Med 15(11):e1002689\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRieke N, Hancox J, Li W et al (2020) The future of digital health with federated learning. NPJ Digit Med 3:119\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSendak MP, D'Arcy J, Kashyap S et al (2020) A path for translation of machine learning products into healthcare delivery. EMJ Innov 4(1):73\u0026ndash;81\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTonekaboni S, Joshi A, McCradden MD, Goldenberg A (2020) What clinicians want from explainable artificial intelligence. NPJ Digit Med 3:74\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKairouz P, McMahan HB, Avent B et al (2021) Advances and open problems in federated learning. Found Trends Mach Learn 14(1\u0026ndash;2):1\u0026ndash;210\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoncalves A, Ray P, Soper B, Stevens J, Coyle L, Sales AP (2020) Generation and evaluation of synthetic patient data. 
J Am Med Inf Assoc 27(6):884\u0026ndash;893\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEsteva A, Robicquet A, Ramsundar B et al (2019) A guide to deep learning in healthcare. Nat Med 25(1):24\u0026ndash;29. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41591-018-0316-z\u003c/span\u003e\u003cspan address=\"10.1038/s41591-018-0316-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVapnik VN (2013) The Nature of Statistical Learning Theory. Springer, New York\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHastie T, Tibshirani R, Friedman J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoldstein BA, Navar AM, Pencina MJ, Ioannidis JPA (2018) Opportunities and challenges in developing risk prediction models with electronic health records data. J Am Med Inf Assoc 24(1):198\u0026ndash;208\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWeiskopf NG, Weng C (2013) Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research. J Am Med Inf Assoc 20(1):144\u0026ndash;151\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHersh WR, Weiner MG, Embi PJ et al (2013) Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care 51(8 Suppl 3):S30\u0026ndash;S37\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJames G, Witten D, Hastie T, Tibshirani R (2021) An Introduction to Statistical Learning, 2nd edn. Springer\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSteyerberg EW (2019) Clinical Prediction Models, 2nd edn. 
Springer, Cham\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVan Smeden M, Moons KGM, de Groot JAH et al (2019) Sample size for binary logistic prediction models: Beyond events per variable criteria. Stat Methods Med Res 28(8):2455\u0026ndash;2474\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRiley RD, Ensor J, Snell KIE et al (2020) Calculating the sample size required for developing a clinical prediction model. BMJ 368:m441\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBreiman L (2001) Random forests. Mach Learn 45:5\u0026ndash;32\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVagliano I, Chesnaye NC, Leopold JH et al (2022) Comparative analysis of explainable machine learning prediction models for hospital mortality. BMC Med Inf Decis Mak 22(1):53\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proc KDD. ACM\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436\u0026ndash;444\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u0026ndash;1780\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDevlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHe H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263\u0026ndash;1284\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrier GW (1950) Verification of forecasts expressed in terms of probability. 
Mon Weather Rev 78:1\u0026ndash;3\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSaito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePage MJ, McKenzie JE, Bossuyt PM et al (2021) The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372:n71\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoons KGM, Wolff RF, Riley RD et al (2019) PROBAST: A tool to assess risk of bias and applicability of prediction model studies. Ann Intern Med 170(1):51\u0026ndash;58\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWolff RF, Moons KGM, Riley RD et al (2019) Explanation and elaboration. Ann Intern Med 170(1):W1\u0026ndash;W33\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCollins GS, Reitsma JB, Altman DG, Moons KGM (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). BMJ 350:g7594\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCollins GS, Moons KGM, Dhiman P et al (2024) TRIPOD\u0026thinsp;+\u0026thinsp;AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 385:e078378. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmj-2023-078378\u003c/span\u003e\u003cspan address=\"10.1136/bmj-2023-078378\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWynants L, Van Calster B, Collins GS et al (2020) Prediction models for diagnosis and prognosis of COVID-19: Systematic review and critical appraisal. 
BMJ 369:m1328\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHernandez-Boussard T, Bozkurt S, Ioannidis JPA, Shah NH (2020) MINIMAR (MINimum Information for Medical AI Reporting). NPJ Digit Med 3:105\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSounderajah V, Ashrafian H, Rose S et al (2021) QUADAS-AI: A quality assessment tool for AI-centered diagnostic test accuracy studies. Nat Med 27:1663\u0026ndash;1665\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu X, Cruz Rivera S, Moher D et al (2020) CONSORT-AI extension: Reporting guidelines for clinical trial reports involving artificial intelligence. Nat Med 26:1364\u0026ndash;1374\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRivera SC, Liu X, Chan AW et al (2020) SPIRIT-AI extension: Guidance for clinical trial protocols involving artificial intelligence. Nat Med 26:1351\u0026ndash;1363\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCollins GS, Moons KGM (2019) Reporting of artificial intelligence prediction models. Lancet 393(10181):1577\u0026ndash;1579\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVan Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW (2019) Calibration: The Achilles heel of predictive analytics. BMC Med 17:230\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRiley RD, Snell KIE, Ensor J et al (2019) Minimum sample size for developing a multivariable prediction model: PART II. Stat Med 38(7):1276\u0026ndash;1296\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCollins GS, Ogundimu EO, Altman DG (2016) External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis. BMJ 353:i3140\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFraser HSF, Biondich P, Moodley D, Choi S, Mamlin BW, Szolovits P (2017) Implementing electronic health records in resource-limited settings. 
Int J Med Inf 97:268\u0026ndash;276\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWiens J, Saria S, Sendak M et al (2019) Do no harm: A roadmap for responsible machine learning for health care. Nat Med 25:1337\u0026ndash;1360\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17:195\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHarutyunyan H, Khachatrian H, Kale DC, Ver Steeg G, Galstyan A (2019) Multitask learning and benchmarking with clinical time series data. Sci Data 6:96\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMiotto R, Li L, Kidd BA, Dudley JT (2016) Deep Patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 6:26094\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSalath\u0026eacute; M (2018) Digital epidemiology: What is it, and where is it going? Life Sci Soc Policy 14:1\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eViboud C, Sun K, Gaffey R et al (2018) The RAPIDD Ebola forecasting challenge: Synthesis and lessons learnt. Epidemics 22:13\u0026ndash;21\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLundberg SM, Lee SI (2018) A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems (NeurIPS)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmann J, Blasimme A, Vayena E, Frey D, Madai VI (2020) Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Med Inf Decis Mak 20:310\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBlaya JA, Fraser HSF, Holt B (2010) E-health technologies show promise in developing countries. 

