Evaluating Machine Learning models for predicting HIV treatment interruption: a systematic review of accuracy, validity, and applicability

Research Square

Systematic Review

Williams Kwarah, Frances Baaba da-Costa Vroom, Duah Dwomoh, Samuel Bosomprah

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5810875/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 9

Abstract

Interruption in HIV treatment (IIT) remains a significant barrier to achieving global HIV/AIDS control goals. Machine learning (ML) models offer potential for predicting IIT by leveraging large clinical data. Understanding how these models were developed, validated, and applied remains essential for advancing research.
We searched PubMed, BMC, Cochrane Library, Scopus, ScienceDirect, The Lancet, and Google Scholar for studies published in English from 1990 to September 2024. Search terms covered HIV, machine learning, treatment interruption, and loss to follow-up. Articles were screened and reviewed independently, and data were extracted using the CHARMS checklist. Risk of bias was assessed with PROBAST. The PRISMA guidelines were followed throughout. Out of 116,672 records, nine studies met the inclusion criteria and reported 12 ML models. Random Forest, XGBoost, and AdaBoost were the predominant models (91.7%). Internal validation was performed in all models, but only two models included external validation. Performance varied, with a mean AUC-ROC of 0.668 (SD = 0.066), indicating moderate discrimination. About 75% of models showed a high risk of bias due to inadequate handling of missing data, lack of calibration, and absence of decision curve analysis (DCA). ML models show promise for predicting IIT, particularly in resource-limited settings. Future research should prioritize external validation, robust missing data handling, and decision curve analysis, and should include sociocultural predictors to improve model robustness.

Keywords: HIV treatment interruption, machine learning, predictive modeling

Introduction

Human immunodeficiency virus (HIV) treatment interruption poses a significant challenge to global efforts in the HIV/AIDS epidemic response. In 2022, an estimated 39 million people were living with HIV (PLHIV) globally, with an estimated 1.3 million new infections and 630,000 deaths reported [1]. The burden of HIV infection is disproportionately high in sub-Saharan Africa, Asia, and the Pacific, which together account for about 88% of all cases [2].
Despite the availability of antiretroviral therapy (ART), which has dramatically reduced the progression of HIV to AIDS and decreased AIDS-related mortality, many individuals living with HIV struggle to maintain consistent adherence to their treatment regimen [3, 4]. It has been estimated that only 46–85% of patients continue to stay on ART two years after initiation [5, 6]. This lack of adherence is particularly concerning given that, when left untreated, HIV weakens the immune system and can lead to life-threatening complications [4]. People who stay in treatment are economically viable and productive to their families and the community [7]. Interrupting HIV treatment may result in viral rebound, deterioration of the immune system, heightened transmission risk, and the development of drug resistance, thereby compromising both individual health and community prevention initiatives. The situation places significant pressure on healthcare systems and compromises public health initiatives [8–11].

Improving ART adherence is critical to achieving global HIV/AIDS control goals. While current strategies to address treatment interruption primarily focus on re-engaging patients after missed doses [12, 13], these reactive measures often fall short of preventing the associated health risks and potential for increased transmission. The ability to predict treatment interruptions before they occur could revolutionize HIV care by enabling healthcare providers to implement targeted and proactive interventions that keep patients on therapy, thus enhancing their chances of achieving and sustaining viral suppression. Artificial Intelligence (AI) and ML offer powerful tools for developing such predictive models due to their capacity to dynamically analyze large, complex datasets and uncover patterns that traditional methods might miss [14–18].
Despite the promise of these technologies, there remains a significant evidence gap in their application to HIV treatment adherence, particularly in low-resource settings where the burden of the disease is greatest. Addressing this gap through systematic evaluation of existing predictive models is crucial for advancing the use of AI and ML in HIV care. This can lead to more effective and personalized treatment strategies that can help meet the ambitious UNAIDS 95-95-95 targets by 2030 [2].

This systematic review aimed to evaluate the effectiveness of machine learning-based predictive models in forecasting HIV treatment interruptions. Specifically, the review (1) identified the types of predictive models previously developed, (2) assessed their accuracy and applicability in various settings, and (3) determined which models have been validated and how they performed in different populations. The review provides insights that can guide the integration of advanced predictive technologies into HIV care programs, potentially improving patient retention, optimizing treatment outcomes, and supporting global efforts to eliminate HIV as a public health threat by 2030.

Methods

Search strategy and eligibility criteria

We searched multiple electronic databases, including Scopus, PubMed, The Lancet, BioMed Central (BMC) Public Health, ScienceDirect, Google Scholar, and Cochrane Library. Our search covered publications from January 1990 to September 2024. We searched using a combination of Medical Subject Headings (MeSH) and free-text terms. The key terms included “HIV”, “Human Immunodeficiency Virus”, “AIDS”, and “Acquired Immunodeficiency Syndrome” for HIV-related concepts; “Machine Learning”, “ML”, “Artificial Intelligence”, “AI”, “Neural Networks”, and “Predictive Modeling” for machine learning concepts; and “Treatment Interruption”, “Loss to Follow-Up”, “Non-adherence”, “Default”, and “Treatment Discontinuation” for treatment adherence concepts.
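As an illustration of how three concept groups like these are commonly combined, the sketch below joins terms with OR inside each group and with AND across groups. The term lists are copied from the text above; the helper name `build_query` and the exact quoting are assumptions for illustration only, and the database-specific strings actually used are those in the authors' Additional File 1.

```python
# Term groups copied from the search-term description in the text.
HIV_TERMS = ["HIV", "Human Immunodeficiency Virus", "AIDS",
             "Acquired Immunodeficiency Syndrome"]
ML_TERMS = ["Machine Learning", "ML", "Artificial Intelligence", "AI",
            "Neural Networks", "Predictive Modeling"]
OUTCOME_TERMS = ["Treatment Interruption", "Loss to Follow-Up",
                 "Non-adherence", "Default", "Treatment Discontinuation"]


def build_query(*groups):
    """Hypothetical helper: OR the terms within a group, AND the groups."""
    blocks = []
    for terms in groups:
        # Quote each term so multi-word phrases are searched as phrases.
        quoted = " OR ".join(f'"{term}"' for term in terms)
        blocks.append(f"({quoted})")
    return " AND ".join(blocks)


query = build_query(HIV_TERMS, ML_TERMS, OUTCOME_TERMS)
print(query)
```

Real databases differ in field tags and phrase syntax, so a string like this would normally be adapted per database rather than reused verbatim.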
These terms were combined using Boolean operators (AND, OR) to ensure a broad and inclusive search. Details of the search strategy for each database are shown in Additional File 1.

We applied specific eligibility criteria to select studies for inclusion. Eligible studies focused on developing or validating prediction models for HIV treatment interruption at the individual level using machine learning methods. We only included studies published in English. We included studies that focused on HIV treatment interruption, defined as missing a scheduled clinic or pharmacy appointment by at least 28 days. We excluded studies that identified predictors without focusing on prediction models, and studies lacking full-text availability. Reviews, commentaries, conference abstracts, letters, reports, and opinions were excluded.

In addition to database searches, we manually reviewed the reference lists of the included studies to identify additional relevant articles. To capture recent and unpublished research, we searched preprint servers such as bioRxiv, medRxiv, and arXiv. The corresponding authors of the included articles were emailed to seek further information and clarity. The search strategy was carefully documented (Additional File), and articles were managed using Zotero 6.0.37 reference management software [19]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [20] and guidance on the conduct of systematic reviews [21] guided the review. A protocol for this review was registered on PROSPERO (CRD42024578109).

Selection process

Article selection was conducted in multiple stages to ensure that only studies meeting the predefined inclusion criteria were included. Initially, two independent reviewers (WK and GJP) screened the titles and abstracts of all records retrieved from the database searches to identify potentially relevant studies.
We resolved any disagreements between reviewers during the article selection process through discussion, and a third reviewer (ZN) was available to adjudicate unresolved disputes. To enhance the rigor of the selection process, the systematic review software Distiller SR 2.35 [22] was used to assist in the identification and removal of duplicate records before the screening began.

Data extraction

Two independent reviewers (WK and GJP) extracted data from the selected studies to ensure accuracy and consistency. Each reviewer independently extracted data using the standardized CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) tool [23,24]. CHARMS was developed for systematic reviews of prognostic or diagnostic prediction model studies, covering model development with or without external validation, and external validation with or without model updating. The data collected included the data sources, study characteristics, details of the predictive models, outcomes, and performance metrics [24]. We resolved any disagreements between reviewers during the data extraction process through discussion, and a third reviewer (ZN) was available to adjudicate unresolved disputes. The reviewers manually extracted all data and then cross-verified it to maintain the integrity of the data collection process. A consolidated, completed CHARMS tool was compiled for this review.

Quality and risk of bias assessment

We used the Prediction Model Risk of Bias Assessment Tool (PROBAST) [25] to assess the risk of bias (ROB) in the included studies. PROBAST was designed to evaluate the risk of bias and applicability in prediction model studies. The tool evaluates four key domains: participants, predictors, outcomes, and analysis. There were two questions on participants, three questions on predictors, six questions about outcomes, and nine questions linked to the statistical analysis.
Responses to these questions were "yes," "probably yes," "probably no," "no," or "no information." The ROB was classified as low, high, or unclear based on the responses within these domains. A domain was classified as high risk if at least one question was answered "no" or "probably no"; low risk if all questions were answered "yes" or "probably yes"; and unclear if the responses provided no information. If all domains were assessed as having a low risk, then the overall risk of bias was classified as low. However, if at least one domain was determined to have a high risk, then the overall risk of bias was classified as high. If there was a recognized concern for bias in at least one domain and the level of concern was low for all other domains, the study was classified as having a moderate level of concern for bias. Two reviewers (WK and GJP) independently evaluated the risk of bias in each included study. When the reviewers disagreed on the risk of bias judgment, the discrepancies were discussed to reach a consensus. If the disagreement persisted, a third reviewer (ZN) was consulted to decide. We conducted all evaluations manually and documented the results of the risk of bias assessments in detail, with summary judgments presented in the form of charts to facilitate a clear understanding of the quality and reliability of the included studies.

Synthesis and Analysis

We tabulated the results of individual studies to provide a clear and organized presentation of the key findings. This included details such as study characteristics, model performance metrics (e.g., area under the receiver-operating characteristic curve, calibration statistics), and risk of bias assessments. We used visual displays, including charts, to enhance the clarity of the results and to facilitate the comparison of study outcomes.
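The domain and overall rating rules described under "Quality and risk of bias assessment" can be sketched as two small functions. This is an illustrative reading of the rules as stated in the text, not the authors' procedure, which was carried out manually. The function names and the example answers are hypothetical, and returning "unclear" for mixed cases with no high-risk domain is an assumption chosen to match the three categories (low, high, unclear) reported in the results.

```python
HIGH_ANSWERS = {"no", "probably no"}
LOW_ANSWERS = {"yes", "probably yes"}


def rate_domain(answers):
    """Rate one PROBAST domain from its signalling-question answers."""
    normalized = [answer.strip().lower() for answer in answers]
    if any(answer in HIGH_ANSWERS for answer in normalized):
        return "high"      # at least one "no" or "probably no"
    if all(answer in LOW_ANSWERS for answer in normalized):
        return "low"       # every answer is "yes" or "probably yes"
    return "unclear"       # remaining case: some "no information"


def rate_overall(domain_ratings):
    """Combine domain ratings into an overall risk-of-bias judgment."""
    ratings = list(domain_ratings.values())
    if "high" in ratings:
        return "high"
    if all(rating == "low" for rating in ratings):
        return "low"
    return "unclear"       # assumption: mixed low/unclear domains


# Hypothetical answers for one model (2, 3, 6, and 9 questions per domain).
example_answers = {
    "participants": ["yes", "probably yes"],
    "predictors": ["yes", "yes", "probably yes"],
    "outcome": ["yes"] * 6,
    "analysis": ["yes", "no information", "probably no"] + ["yes"] * 6,
}
domain_ratings = {name: rate_domain(ans) for name, ans in example_answers.items()}
overall = rate_overall(domain_ratings)
print(domain_ratings, overall)
```

In this made-up example a single "probably no" in the analysis domain is enough to make the overall rating high, which is consistent with the analysis domain driving most of the high-risk ratings reported in the review.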
For the synthesis of results, we used a narrative synthesis approach due to the anticipated heterogeneity of the included studies, particularly in terms of model types, outcome measures, and study populations. This approach allowed us to systematically describe and compare the predictive models, highlighting common themes and differences among the studies. We did not perform a meta-analysis because there were insufficient external validation studies of the same index model to justify a quantitative synthesis [21]. We explored possible causes of heterogeneity by conducting subgroup analyses, where applicable. These analyses considered factors such as the type of machine learning model used, population characteristics, and study setting. The synthesis followed guidelines from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [26], the CHARMS checklist [24], and PROBAST [25].

Results

Characteristics of included studies

Our search identified 116,672 records, of which 9 studies met the inclusion criteria (Figure 1). Seven of these studies focused on developing predictive models [27–33], while two included both model development and validation [34,35]. Six studies were conducted in Africa [27–30,33,34], of which three were in South Africa, one in Tanzania, and one combining data from Nigeria and Mozambique. The remaining three studies were in the United States of America [31,32,35] (Table 1). These studies were published between 2018 and 2024, with the majority published in 2022 and 2023. Seven studies were conducted in public healthcare facilities, while two were conducted in university clinics. Seven studies relied on retrospective cohort data, while two used existing registries (Table 1).

Model performance metrics

Among the nine studies selected, a total of 12 machine learning models were reported, with nine focused on model development and three on model validation (Table 2).
The median sample size across studies was 136,415 (interquartile range: 178–450,000), though one model was developed using a sample of fewer than 1,000 participants. On average, 15 predictors (SD = 4.0) were included in the final models. Ensemble learning techniques were the most frequently used algorithms, accounting for 92% of the total models. These included Random Forest (3 models), Adaptive Boosting (AdaBoost, 3 models), Extreme Gradient Boosting (XGBoost, 2 models), Decision Trees (2 models), and Categorical Boosting (CatBoost, 1 model) (Table 2). Logistic regression was used in only one model. All 12 models reported internal validation. The methods used were random sample split (6), cross-validation (4), and a combination of random sample split and cross-validation (2) (Table 2). Model performance was primarily assessed using the c-statistic or area under the receiver operating characteristic curve (AUC), with an average AUC of 0.668 (SD: 0.07). Some models also reported additional metrics, including accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) (Table 2). Notably, two models reported only PPV, while another two reported the Matthews Correlation Coefficient. Model calibration methods were used in just three models, which reported an average F1 score of 0.292 (SD: 0.01) alongside the AUC. None of the studies used Decision Curve Analysis (DCA) to assess clinical value and implications, a significant limitation in evaluating the practical utility of the models. However, one study addressed model utility by gathering feedback from healthcare workers. Additional information is shown in Additional File 1.

Risk of bias assessment

We reported the risk of bias assessment for the twelve models using the PROBAST tool (Figure 2). Of these, nine models (75.0%) were rated as having a high risk of bias, two models (16.7%) were rated low risk, and one model (8.3%) had an unclear risk of bias.
A majority of models (58.3%) were rated as high risk in the statistical analysis domain. For example, nearly half of the models failed to report how missing data were handled, and ten models (83.3%) did not disclose the extent of missing data. Furthermore, only three models (25.0%) provided details on calibration measures, which are important for ensuring the reliability of predictions. None of the studies reported DCA or other methods to assess clinical utility, highlighting a critical gap in evaluating the practical application of these models. Additional details on the risk of bias analysis are provided in the supplementary material (Additional File 1).

Applicability assessment

We evaluated the applicability of the models for use in the intended population and primary healthcare settings. Overall, 83% of the models were rated as low concern, indicating their suitability for primary healthcare use. However, 17% were rated as high concern, reflecting limitations in certain aspects of model development (Figure 3). Predictors were rated as low concern, suggesting that the included predictors were relevant to the target population and routinely collected in clinical settings. Similarly, the outcome domain was rated as low concern in 92% of the models, while 8% were marked as unclear due to insufficient reporting of key details. Three models were externally validated, but only two reported calibration measures, with an average F1 score of 0.2935, alongside c-statistic (AUC) values. These validations were done using datasets received from registries of people living with HIV and scheduled for clinical appointments. While sensitivity, specificity, PPV, and NPV were included, one model lacked critical details on eligibility criteria and missing data handling. None of the externally validated models assessed clinical utility. Further details are provided in the supplementary material (Additional File 1).
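For readers less familiar with the threshold-based metrics named in the results (sensitivity, specificity, PPV, NPV, F1 score, and the Matthews Correlation Coefficient), the sketch below computes them from a single confusion matrix using their standard definitions. The counts are hypothetical and chosen only to mimic an imbalanced outcome; they are not taken from any included study, and the function name is an assumption.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard threshold-based metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no empty row or column).
    """
    sensitivity = tp / (tp + fn)            # recall among true interrupters
    specificity = tn / (tn + fp)            # recall among non-interrupters
    ppv = tp / (tp + fp)                    # precision of positive flags
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical cohort of 1,000 patients with 10% interrupting treatment.
example = classification_metrics(tp=60, fp=240, tn=660, fn=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the F1 score is 0.30 even though sensitivity is 0.60, because a low PPV under class imbalance pulls F1 down. Note that all of these are classification measures at one threshold; they do not by themselves describe calibration, which concerns agreement between predicted and observed risks.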
Discussion

This review examined 12 machine learning models developed to predict interruptions in HIV treatment, with most relying on advanced ensemble techniques like Random Forest, AdaBoost, and XGBoost. These models were built using data from large retrospective cohorts, with a median sample size of 120,000 participants, and were validated internally through methods like cross-validation and random sample splitting. The models demonstrated moderate predictive performance, with an average AUC-ROC of 0.668, and utilized data commonly collected in clinical settings, making them practical for real-world use. For prognostic predictive models, an AUC of 0.5–0.7 suggests poor discrimination, 0.7–0.8 is considered acceptable, 0.8–0.9 excellent, and > 0.9 outstanding [36, 37]. Although only two models were externally validated, most models showed strong potential for application in primary healthcare, highlighting their promise in improving adherence and supporting HIV care strategies.

Electronic Medical Records (EMRs) are increasingly prevalent worldwide, including in Africa [38], facilitating the ongoing accumulation of extensive healthcare data and enabling big data analytics [39–44], as well as the application of machine learning and artificial intelligence [42, 45, 46]. Numerous prognostic studies have employed EMR data to create models for predicting individual diagnoses of HIV, healthcare attendance, and viral load suppression [47–49]. The growing utilization of these analytic tools is likely due to the interest in employing predictive models as decision support instruments at the point of care. Moreover, executing focused, high-impact treatments with limited resources in underprivileged healthcare environments is essential [50, 51].

Two-thirds of the research was conducted in Africa, predominantly in South Africa, an area characterized by a high incidence of HIV [52].
This emphasis is praiseworthy, yet it limits understanding of how predictive models apply in areas with low prevalence. Utilizing data from high-prevalence regions, such as South Africa, offers essential insights into models that help tackle adherence difficulties in comparable settings. This emphasis requires careful consideration when extrapolating results to areas with different healthcare systems and adherence challenges. The research conducted in the United States of America, although limited in number, offered a different perspective, highlighting the need for regionally appropriate models.

The machine learning techniques in our analysis have shown significant potential in forecasting IIT by utilizing routinely gathered clinical data. Ensemble learning methods, specifically Random Forest, AdaBoost, and XGBoost, were predominant, collectively representing 91.7% of the models created. Previous studies have demonstrated that ensemble approaches effectively address the complex, nonlinear interactions prevalent in healthcare datasets [53, 54]. These algorithms have been reported to achieve above 90% accuracy across many datasets [55]. Ensemble algorithms are beneficial because of their resilience to overfitting and their capacity to handle extensive feature sets. The outcomes of our review correspond with these results.

Most models in our study reported the c-statistic (AUC), which evaluates the discriminatory capability of predictive models. The average AUC of 0.668 in our analysis aligns with the findings of Chilamkurthy et al. (2018), who stated that although ML models excel at distinguishing different outcomes, their clinical performance measures, such as accuracy, sensitivity, and specificity, often fall short because of the unbalanced datasets or inadequate predictor selection common in healthcare data.
Other studies have emphasized the need to assess ML model performance using the AUC, in conjunction with calibration and decision curve analysis, as a more informative measure than accuracy [57]. We found in our review that several studies failed to report calibration and clinical utility. Although there are many possible problems in the creation and validation of prediction models, it is essential to report calibration measures, which are vital components of statistical performance [58, 59]. Calibration measures are essential because they indicate whether a model's predicted probabilities correspond with observed probabilities, which underpins model dependability. Only 25% of the models included in our review assessed calibration. In the absence of calibration, predictive models may provide probabilities that inaccurately reflect actual risks, compromising their clinical relevance [60].

We noted significant problems with the ROB in the developed prediction models. Seventy-five percent of the reviewed models were classified as exhibiting a high risk of bias, mostly due to inadequacies in the statistical analysis and data management. Approximately 83.3% of models did not disclose the magnitude of missing data or the methods employed to address it, underscoring its significance as a key concern. This conclusion aligns with prior research demonstrating that most predictive model studies do not report their methods for addressing missing data [61]. Missing data is a widespread problem in retrospective healthcare datasets and, if not properly managed, can compromise model performance and integrity [61–63]. Several studies have used imputation approaches, which estimate missing values so that they reflect reality as closely as possible and increase the probability of obtaining high-quality and reusable data [64].
However, if missing data are not handled appropriately, the result can be systematic bias and diminished validity and integrity of models, particularly in datasets used in healthcare research [65, 66].

Furthermore, our review observed the lack of decision curve analysis (DCA) in all the studies included. DCA is essential for assessing a model's clinical relevance by weighing the benefits and risks at different decision thresholds, rendering its exclusion a significant constraint [67]. DCA complements calibration and discrimination measures in machine learning models [68] and helps incorporate the clinical consequences of using a model. Besides conducting DCA, net benefit analysis is an alternative measure to assess the applicability of models in real-life situations.

The reviewed models show potential for improving IIT predictions; nevertheless, their reliability and applicability in clinical environments are constrained, as shown in the risk of bias and applicability results. Overall, 83% of the reviewed models were rated as low concern for applicability, suggesting their broad appropriateness for the target groups and settings. This result reflects the use of routinely collected predictors in clinical contexts, including demographic information, adherence records, and clinical indicators, which improves the practicality of applying these models in actual healthcare settings [69]. The outcome domain was rated as low concern in 92% of models; nevertheless, the absence of external validation and decision curve analysis seriously constrains their practical use in guiding clinical decisions [60]. For optimal real-world applicability, models must address these deficiencies by integrating external validation across diverse contexts and evaluating clinical significance using methods such as DCA, net benefit analysis, or net reclassification improvement assessments.
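To make the idea behind DCA concrete, the sketch below computes net benefit at a few decision thresholds using the standard definition, NB = TP/n − (FP/n) × pt/(1 − pt), and compares a model with the "intervene for everyone" strategy ("intervene for no one" has a net benefit of zero). The outcomes and predicted risks are toy values invented for illustration, and the function names are assumptions; none of the reviewed studies reported such an analysis.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth.
    weight = threshold / (1 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening for every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = treatment interruption occurred; risks are made up.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

for pt in (0.25, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.2f} model={model_nb:.3f} treat_all={all_nb:.3f}")
```

A decision curve is this calculation repeated over a range of thresholds and plotted; a model is clinically useful at a threshold only if its net benefit exceeds both default strategies there.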
Aligning with clinical processes is crucial for maximizing the efficacy of machine learning in enhancing adherence and minimizing interruptions in treatment in HIV care. Enhancing future research through stringent reporting standards and robust statistical methods, such as those outlined in the TRIPOD recommendations, is essential to mitigate biases and improve the reliability of predictive modeling in HIV care [70].

Limitations

The results of this review should be interpreted with certain limitations in mind. First, the review included only journal articles published in English with full-text availability, and the search was conducted across a limited number of databases, which may introduce language and publication bias. Excluding studies published in languages other than English presented a potential selection bias. This potentially limits the generalizability of the findings to English-speaking settings. To address potential selection and publication bias stemming from the restricted database search, we supplemented our efforts by conducting backward and forward citation searches in Google Scholar and reviewing article references. Most of these studies were conducted in resource-poor settings, which made it difficult for validation studies to be carried out. It is recommended that, in such circumstances, validation studies be conducted on different datasets or settings.

Recommendations for future research

Future studies should prioritize implementing robust external validation across diverse populations and geographic regions, which is essential to evaluate model performance under varying demographic, clinical, and systemic conditions, ensuring reliability in real-world applications. The inclusion of socio-cultural and structural factors in model development should be considered in future research. Also, addressing missing data is critical for enhancing model accuracy and reliability.
Future studies should adopt systematic strategies such as multiple imputation or sensitivity analyses and adhere to standardized reporting guidelines like TRIPOD. Finally, incorporating decision curve analysis (DCA) into model assessment is recommended to bridge the gap between statistical performance and practical, real-world impact.

Conclusion

This study provides key insights into the current state of predictive modeling for HIV treatment interruptions. Machine learning, particularly ensemble learning techniques, is widely used with retrospective cohort data to address adherence issues in HIV programs, demonstrating moderate accuracy and applicability in primary healthcare settings. However, critical shortcomings, including insufficient calibration reporting, lack of decision curve analysis (DCA), and limited external validation, restrict the models’ clinical utility and generalizability. Predictive modeling holds significant promise in supporting countries to achieve the UNAIDS 95-95-95 targets by advancing equitable access to medications, high treatment retention rates, and widespread viral load suppression.
Abbreviations

Abbreviation: Meaning
HIV: Human Immunodeficiency Virus
AIDS: Acquired Immunodeficiency Syndrome
PLHIV: People Living with HIV
ART: Antiretroviral Therapy
ML: Machine Learning
AI: Artificial Intelligence
UNAIDS: Joint United Nations Programme on HIV/AIDS
PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses
PROSPERO: International Prospective Register of Systematic Reviews
BMC: BioMed Central
MeSH: Medical Subject Headings
CHARMS: CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies
PROBAST: Prediction Model Risk of Bias Assessment Tool
ROB: Risk of Bias
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
SD: Standard Deviation
XGBoost: Extreme Gradient Boosting
AdaBoost: Adaptive Boosting
CatBoost: Categorical Boosting
AUC-ROC: Area Under the Receiver Operating Characteristic Curve
AUC-PR: Area Under the Precision-Recall Curve
NPV: Negative Predictive Value
PPV: Positive Predictive Value
DCA: Decision Curve Analysis
EMR: Electronic Medical Records

Declarations

Ethics approval and consent to participate

Consent to participate is not applicable. However, since this study is nested within another study on HIV treatment interruptions, ethical approval was received from the Ghana Health Service Ethics Review Committee with approval number GHS-ERC:003/08/24. All ethical principles were followed in this review.

Consent for publication

Not applicable.

Availability of data and materials

All data generated or analyzed during this study are included in the supplementary information (Additional File 2).

Competing interests

All authors declare that they have no competing interests.

Funding

No funding was secured for this study.

Authors' contributions

WK conceived the research topic, led data review and extraction, analyzed and interpreted the extracted data, and wrote the first draft of the manuscript.
FBV, DD, and SB contributed to the methods, analysis, and reporting and reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements
We sincerely thank Jasmin Kwarah for generously providing the stationery used in this systematic review, and Ekua Houphoet for her support in reviewing the manuscript.

References
1. UNAIDS_FactSheet_en.pdf, (n.d.). https://www.unaids.org/sites/default/files/media_asset/UNAIDS_FactSheet_en.pdf (accessed December 17, 2024).
2. L. Frescura, P. Godfrey-Faussett, A. Feizzadeh, W. El-Sadr, O. Syarif, P.D. Ghys, Achieving the 95 95 95 targets for all: A pathway to ending AIDS, PLoS ONE 17 (2022) e0272405. https://doi.org/10.1371/journal.pone.0272405.
3. F. Altice, O. Evuarherhe, S. Shina, G. Carter, A.C. Beaubrun, Adherence to HIV treatment regimens: systematic literature review and meta-analysis, Patient Prefer. Adherence 13 (2019) 475–490. https://doi.org/10.2147/PPA.S192735.
4. G. Dubrocq, N. Rakhmanina, Antiretroviral therapy interruptions: impact on HIV treatment and transmission, HIV/AIDS - Res. Palliat. Care 10 (2018) 91–101. https://doi.org/10.2147/HIV.S141965.
5. U. Akpan, K. Kakanfo, O.D. Ekele, K. Ukpong, O. Toyo, P. Nwaokoro, E. James, S. Pandey, K. Olatubosun, M. Bateganya, Predictors of treatment interruption among patients on antiretroviral therapy in Akwa Ibom, Nigeria: outcomes after 12 months, AIDS Care 35 (2023) 114–122. https://doi.org/10.1080/09540121.2022.2093826.
6. S. Rosen, M.P. Fox, C.J. Gill, Patient Retention in Antiretroviral Therapy Programs in Sub-Saharan Africa: A Systematic Review, PLoS Med. 4 (2007) e298. https://doi.org/10.1371/journal.pmed.0040298.
7. H. Thirumurthy, O. Galárraga, B. Larson, S. Rosen, HIV Treatment Produces Economic Returns Through Increased Work And Education, And Warrants Continued US Support, Health Aff. Proj. Hope 31 (2012) 1470–1477. https://doi.org/10.1377/hlthaff.2012.0217.
8. B. Jewell, J. Smith, T. Hallett, The Potential Impact of Interruptions to HIV Services: A Modelling Case Study for South Africa, (2020) 2020.04.22.20075861. https://doi.org/10.1101/2020.04.22.20075861.
9. E.J. Mills, A. Funk, S. Kanters, E. Kawuma, C. Cooper, B. Mukasa, M. Odit, Y. Karamagi, D. Mwehire, J. Nachega, S. Yaya, A. Featherstone, N. Ford, Long-Term Health Care Interruptions Among HIV-Positive Patients in Uganda, JAIDS J. Acquir. Immune Defic. Syndr. 63 (2013) e23. https://doi.org/10.1097/QAI.0b013e31828a3fb8.
10. C. Thomadakis, C.T. Yiannoutsos, N. Pantazis, L. Diero, A. Mwangi, B.S. Musick, K. Wools-Kaloustian, G. Touloumi, The Effect of HIV Treatment Interruption on Subsequent Immunological Response, Am. J. Epidemiol. 192 (2023) 1181–1191. https://doi.org/10.1093/aje/kwad076.
11. A. Trickey, L. Zhang, C.T. Rentsch, N. Pantazis, R. Izquierdo, A. Antinori, G. Leierer, G. Burkholder, M. Cavassini, J. Palacio-Vieira, M.J. Gill, R. Teira, C. Stephan, N. Obel, J.-J. Vehreschild, T.R. Sterling, M. Van Der Valk, F. Bonnet, H.M. Crane, M.J. Silverberg, S.M. Ingle, J.A.C. Sterne, the Antiretroviral Therapy Cohort Collaboration (ART-CC), Care interruptions and mortality among adults in Europe and North America, AIDS 38 (2024) 1533. https://doi.org/10.1097/QAD.0000000000003924.
12. S. Chamberlin, M. Mphande, K. Phiri, P. Kalande, K. Dovel, How HIV Clients Find Their Way Back to the ART Clinic: A Qualitative Study of Disengagement and Re-engagement with HIV Care in Malawi, AIDS Behav. 26 (2022) 674–685. https://doi.org/10.1007/s10461-021-03427-1.
13. J. Palacio-Vieira, J.M. Reyes-Urueña, A. Imaz, A. Bruguera, L. Force, A.O. Llaveria, J.M. Llibre, I. Vilaró, F.H. Borràs, V. Falcó, M. Riera, P. Domingo, E. de Lazzari, J.M. Miró, J. Casabona, Strategies to reengage patients lost to follow up in HIV care in high income countries, a scoping review, BMC Public Health 21 (2021) 1596. https://doi.org/10.1186/s12889-021-11613-y.
14. M. Bektaş, J.B. Tuynman, J. Costa Pereira, G.L. Burchell, D.L. van der Peet, Machine Learning Algorithms for Predicting Surgical Outcomes after Colorectal Surgery: A Systematic Review, World J. Surg. 46 (2022) 1. https://doi.org/10.1007/s00268-022-06728-1.
15. Y. Huang, J. Li, M. Li, R.R. Aparasu, Application of machine learning in predicting survival outcomes involving real-world data: a scoping review, BMC Med. Res. Methodol. 23 (2023) 268. https://doi.org/10.1186/s12874-023-02078-1.
16. J.T. Senders, P.C. Staples, A.V. Karhade, M.M. Zaki, W.B. Gormley, M.L.D. Broekman, T.R. Smith, O. Arnaout, Machine Learning and Neurosurgical Outcome Prediction: A Systematic Review, World Neurosurg. 109 (2018) 476-486.e1. https://doi.org/10.1016/j.wneu.2017.09.149.
17. E.W. Steyerberg, Applications of Prediction Models, in: E.W. Steyerberg (Ed.), Clin. Predict. Models Pract. Approach Dev. Valid. Updat., Springer International Publishing, Cham, 2019: pp. 15–36. https://doi.org/10.1007/978-3-030-16399-0_2.
18. W. Zu, X. Huang, T. Xu, L. Du, Y. Wang, L. Wang, W. Nie, Machine learning in predicting outcomes for stroke patients following rehabilitation treatment: A systematic review, PLOS ONE 18 (2023) e0287308. https://doi.org/10.1371/journal.pone.0287308.
19. J. Puckett, Zotero: A Guide for Librarians, Researchers, and Educators, Association of College & Research Libraries, 2011.
20. M.J. Page, D. Moher, P.M. Bossuyt, I. Boutron, T.C. Hoffmann, C.D. Mulrow, L. Shamseer, J.M. Tetzlaff, E.A. Akl, S.E. Brennan, R. Chou, J. Glanville, J.M. Grimshaw, A. Hróbjartsson, M.M. Lalu, T. Li, E.W. Loder, E. Mayo-Wilson, S. McDonald, L.A. McGuinness, L.A. Stewart, J. Thomas, A.C. Tricco, V.A. Welch, P. Whiting, J.E. McKenzie, PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews, BMJ 372 (2021) n160. https://doi.org/10.1136/bmj.n160.
21. J.A.A. Damen, K.G.M. Moons, M. van Smeden, L. Hooft, How to conduct a systematic review and meta-analysis of prognostic model studies, Clin. Microbiol. Infect. 29 (2023) 434–440. https://doi.org/10.1016/j.cmi.2022.07.019.
22. Systematic Review and Literature Review Software by DistillerSR, DistillerSR (n.d.). https://www.distillersr.com/ (accessed December 17, 2024).
23. A. Liberati, D.G. Altman, J. Tetzlaff, C. Mulrow, P.C. Gøtzsche, J.P.A. Ioannidis, M. Clarke, P.J. Devereaux, J. Kleijnen, D. Moher, The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration, PLOS Med. 6 (2009) e1000100. https://doi.org/10.1371/journal.pmed.1000100.
24. K.G.M. Moons, J.A.H. de Groot, W. Bouwmeester, Y. Vergouwe, S. Mallett, D.G. Altman, J.B. Reitsma, G.S. Collins, Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist, PLOS Med. 11 (2014) e1001744. https://doi.org/10.1371/journal.pmed.1001744.
25. R.F. Wolff, K.G.M. Moons, R.D. Riley, P.F. Whiting, M. Westwood, G.S. Collins, J.B. Reitsma, J. Kleijnen, S. Mallett, PROBAST Group, PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies, Ann. Intern. Med. 170 (2019) 51–58. https://doi.org/10.7326/M18-1376.
26. K.G.M. Moons, D.G. Altman, J.B. Reitsma, J.P.A. Ioannidis, P. Macaskill, E.W. Steyerberg, A.J. Vickers, D.F. Ransohoff, G.S. Collins, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration, Ann. Intern. Med. 162 (2015) W1–W73. https://doi.org/10.7326/M14-0698.
27. C.A. Fahey, L. Wei, P.F. Njau, S. Shabani, S. Kwilasa, W. Maokola, L. Packel, Z. Zheng, J. Wang, S.I. McCoy, Machine learning with routine electronic medical record data to identify people at high risk of disengagement from HIV care in Tanzania, PLOS Glob. Public Health 2 (2022) e0000720. https://doi.org/10.1371/journal.pgph.0000720.
28. M. Maskew, K. Sharpey-Schafer, L. De Voux, T. Crompton, J. Bor, M. Rennick, A. Chirowodza, J. Miot, S. Molefi, C. Onaga, P. Majuba, I. Sanne, P. Pisa, Applying machine learning and predictive modeling to retention and viral suppression in South African HIV treatment cohorts, Sci. Rep. 12 (2022) 12715. https://doi.org/10.1038/s41598-022-16062-0.
29. M. Maskew, S. Smith, L.D. Voux, K. Sharpey-Schafer, T. Crompton, A. Govender, P. Pisa, S. Rosen, Triaging clients at risk of disengagement from HIV care: Application of a predictive model to clinical trial data in South Africa, (2024) 2024.08.05.24311488. https://doi.org/10.1101/2024.08.05.24311488.
30. M.-D. Ogbechie, C.F. Walker, M.-T. Lee, A.A. Gana, A. Oduola, A. Idemudia, M. Edor, E.L. Harris, J. Stephens, X. Gao, P.-L. Chen, N.E. Persaud, Predicting Treatment Interruption Among People Living With HIV in Nigeria: Machine Learning Approach, JMIR AI 2 (2023) e44432. https://doi.org/10.2196/44432.
31. B.W. Pence, A.M. Bengtson, S. Boswell, K.A. Christopoulos, H.M. Crane, E. Geng, J.C. Keruly, W.C. Mathews, M.J. Mugavero, Who will show? Predicting missed visits among patients in routine HIV primary care in the United States, AIDS Behav. 23 (2019) 418–426. https://doi.org/10.1007/s10461-018-2215-1.
32. A. Ramachandran, A. Kumar, H. Koenig, A. De Unanue, C. Sung, J. Walsh, J. Schneider, R. Ghani, J.P. Ridgway, Predictive Analytics for Retention in Care in an Urban HIV Clinic, Sci. Rep. 10 (2020) 6421. https://doi.org/10.1038/s41598-020-62729-x.
33. J. Stockman, J. Friedman, J. Sundberg, E. Harris, Predictive analytics using machine learning to identify ART clients at health system level at greatest risk of treatment interruption in Mozambique and Nigeria, JAIDS J. Acquir. Immune Defic. Syndr. (2022) 10.1097/QAI.0000000000002947. https://doi.org/10.1097/QAI.0000000000002947.
34. R. Esra, J. Carstens, S. Le Roux, T. Mabuto, M. Eisenstein, O. Keiser, E. Orel, A. Merzouki, L. De Voux, M. Maskew, K. Sharpey-Schafer, Validation and Improvement of a Machine Learning Model to Predict Interruptions in Antiretroviral Treatment in South Africa, JAIDS J. Acquir. Immune Defic. Syndr. 92 (2023) 42. https://doi.org/10.1097/QAI.0000000000003108.
35. J.A. Mason, E.E. Friedman, J.C. Rojas, J.P. Ridgway, No-show Prediction Model Performance Among People With HIV: External Validation Study, J. Med. Internet Res. 25 (2023) e43277. https://doi.org/10.2196/43277.
36. A.M. Carrington, D.G. Manuel, P.W. Fieguth, T. Ramsay, V. Osmani, B. Wernly, C. Bennett, S. Hawken, O. Magwood, Y. Sheikh, M. McInnes, A. Holzinger, Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation, IEEE Trans. Pattern Anal. Mach. Intell. 45 (2023) 329–341. https://doi.org/10.1109/TPAMI.2022.3145392.
37. N. White, R. Parsons, G. Collins, A. Barnett, Evidence of questionable research practices in clinical prediction models, BMC Med. 21 (2023) 339. https://doi.org/10.1186/s12916-023-03048-6.
38. M.O. Akanbi, A.N. Ocheke, P.A. Agaba, C.A. Daniyam, E.I. Agaba, E.N. Okeke, C.O. Ukoli, Use of Electronic Health Records in sub-Saharan Africa: Progress and challenges, J. Med. Trop. 14 (2012) 1.
39. F. Colombo, J. Oderkirk, L. Slawomirski, Health Information Systems, Electronic Medical Records, and Big Data in Global Healthcare: Progress and Challenges in OECD Countries, in: R. Haring, I. Kickbusch, D. Ganten, M. Moeti (Eds.), Handb. Glob. Health, Springer International Publishing, Cham, 2020: pp. 1–31. https://doi.org/10.1007/978-3-030-05325-3_71-1.
40. B. Cyganek, M. Graña, B. Krawczyk, A. Kasprzak, P. Porwik, K. Walkowiak, M. Woźniak, A Survey of Big Data Issues in Electronic Health Record Analysis, Appl. Artif. Intell. 30 (2016) 497–520. https://doi.org/10.1080/08839514.2016.1193714.
41. Z.F. Khan, S.R. Alotaibi, Applications of Artificial Intelligence and Big Data Analytics in m-Health: A Healthcare System Perspective, J. Healthc. Eng. 2020 (2020) 8894694. https://doi.org/10.1155/2020/8894694.
42. J.T. Schwartz, M. Gao, E.A. Geng, K.S. Mody, C.M. Mikhail, S.K. Cho, Applications of Machine Learning Using Electronic Medical Records in Spine Surgery, Neurospine 16 (2019) 643–653. https://doi.org/10.14245/ns.1938386.193.
43. A. Shinozaki, Electronic Medical Records and Machine Learning in Approaches to Drug Development, in: Artif. Intell. Oncol. Drug Discov. Dev., IntechOpen, 2020. https://doi.org/10.5772/intechopen.92613.
44. F.M. Syed, F.K.E. S, AI in Securing Electronic Health Records (EHR) Systems, Int. J. Adv. Eng. Technol. Innov. 1 (2024) 593–620.
45. K. Kawamoto, J. Finkelstein, G.D. Fiol, Implementing Machine Learning in the Electronic Health Record: Checklist of Essential Considerations, Mayo Clin. Proc. 98 (2023) 366–369. https://doi.org/10.1016/j.mayocp.2023.01.013.
46. S.F. Weng, J. Reps, J. Kai, J.M. Garibaldi, N. Qureshi, Can machine-learning improve cardiovascular risk prediction using routine clinical data?, PLoS ONE 12 (2017) e0174944. https://doi.org/10.1371/journal.pone.0174944.
47. B. Critelli, A. Hassan, I. Lahooti, L. Noh, J.S. Park, K. Tong, A. Lahooti, N. Matzko, J.N. Adams, L. Liss, J. Quion, D. Restrepo, M. Nikahd, S. Culp, A. Lacy-Hulbert, C. Speake, J. Buxbaum, J. Bischof, C. Yazici, A.E. Phillips, S. Terp, A. Weissman, D. Conwell, P. Hart, M. Ramsey, S. Krishna, S. Han, E. Park, R. Shah, V. Akshintala, J.A. Windsor, N.K. Mull, G.I. Papachristou, L.A. Celi, P.J. Lee, A Systematic Review of Machine Learning-based Prognostic Models for Acute Pancreatitis: Towards Improving Methods and Reporting Quality, (2024) 2024.06.26.24309389. https://doi.org/10.1101/2024.06.26.24309389.
48. T. Endebu, G. Taye, A. Addissie, A. Deksisa, W. Deressa, Electronic medical record-based prediction models developed and deployed in the HIV care continuum: a systematic review, Discov. Health Syst. 3 (2024) 25. https://doi.org/10.1007/s44250-024-00092-8.
49. J.P. Ridgway, A. Lee, S. Devlin, J. Kerman, A. Mayampurath, Machine Learning and Clinical Informatics for Improving HIV Care Continuum Outcomes, Curr. HIV/AIDS Rep. 18 (2021) 229–236. https://doi.org/10.1007/s11904-021-00552-3.
50. R.J. Chin, D. Sangmanee, L. Piergallini, PEPFAR Funding and Reduction in HIV Infection Rates in 12 Focus Sub-Saharan African Countries: A Quantitative Analysis, Int. J. MCH AIDS 3 (2015) 150.
51. M. Pal, S. Parija, G. Panda, K. Dhama, R.K. Mohapatra, Risk prediction of cardiovascular disease using machine learning classifiers, Open Med. 17 (2022) 1100–1113. https://doi.org/10.1515/med-2022-0508.
52. South Africa, (n.d.). https://www.unaids.org/en/regionscountries/countries/southafrica (accessed December 17, 2024).
53. T.G. Dietterich, Ensemble Methods in Machine Learning, in: Mult. Classif. Syst., Springer, Berlin, Heidelberg, 2000: pp. 1–15. https://doi.org/10.1007/3-540-45014-9_1.
54. N. Rane, S.P. Choudhary, J. Rane, Ensemble deep learning and machine learning: applications, opportunities, challenges, and future directions, Stud. Med. Health Sci. 1 (2024) 18–41. https://doi.org/10.48185/smhs.v1i2.1225.
55. L.R. Namamula, D. Chaytor, Effective ensemble learning approach for large-scale medical data analytics, Int. J. Syst. Assur. Eng. Manag. 15 (2024) 13–20. https://doi.org/10.1007/s13198-021-01552-7.
56. S. Chilamkurthy, R. Ghosh, S. Tanamala, M. Biviji, N.G. Campeau, V.K. Venugopal, V. Mahajan, P. Rao, P. Warier, Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans, (2018). https://doi.org/10.48550/arXiv.1803.05854.
57. J. Huang, C.X. Ling, Using AUC and accuracy in evaluating learning algorithms, IEEE Trans. Knowl. Data Eng. 17 (2005) 299–310. https://doi.org/10.1109/TKDE.2005.50.
58. A.C. Alba, T. Agoritsas, M. Walsh, S. Hanna, A. Iorio, P.J. Devereaux, T. McGinn, G. Guyatt, Discrimination and Calibration of Clinical Prediction Models: Users’ Guides to the Medical Literature, JAMA 318 (2017) 1377–1384. https://doi.org/10.1001/jama.2017.12126.
59. M.A.E. Binuya, E.G. Engelhardt, W. Schats, M.K. Schmidt, E.W. Steyerberg, Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review, BMC Med. Res. Methodol. 22 (2022) 316. https://doi.org/10.1186/s12874-022-01801-8.
60. B. Van Calster, D. Nieboer, Y. Vergouwe, B. De Cock, M.J. Pencina, E.W. Steyerberg, A calibration hierarchy for risk models was defined: from utopia to empirical data, J. Clin. Epidemiol. 74 (2016) 167–176. https://doi.org/10.1016/j.jclinepi.2015.12.005.
61. S.W.J. Nijman, A.M. Leeuwenberg, I. Beekers, I. Verkouter, J.J.L. Jacobs, M.L. Bots, F.W. Asselbergs, K.G.M. Moons, T.P.A. Debray, Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review, J. Clin. Epidemiol. 142 (2022) 218–229. https://doi.org/10.1016/j.jclinepi.2021.11.023.
62. D.P. Misra, A.S. Yadav, Impact of Preprocessing Methods on Healthcare Predictions, (2019). https://doi.org/10.2139/ssrn.3349586.
63. D.A. Newman, Missing Data: Five Practical Guidelines, Organ. Res. Methods 17 (2014) 372–411. https://doi.org/10.1177/1094428114548590.
64. M. Afkanpour, E. Hosseinzadeh, H. Tabesh, Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review, BMC Med. Res. Methodol. 24 (2024) 188. https://doi.org/10.1186/s12874-024-02310-6.
65. S. van Buuren, Flexible Imputation of Missing Data, CRC Press, 2012.
66. R. Rios, R.J. Miller, N. Manral, T. Sharir, A.J. Einstein, M.B. Fish, T.D. Ruddy, P.A. Kaufmann, A.J. Sinusas, E.J. Miller, T.M. Bateman, S. Dorbala, M.D. Carli, S.D.V. Kriekinge, P.B. Kavanagh, T. Parekh, J.X. Liang, D. Dey, D.S. Berman, P.J. Slomka, Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: insights from REFINE SPECT registry, Comput. Biol. Med. 145 (2022) 105449. https://doi.org/10.1016/j.compbiomed.2022.105449.
67. A.J. Vickers, B. van Calster, E.W. Steyerberg, A simple, step-by-step guide to interpreting decision curve analysis, Diagn. Progn. Res. 3 (2019) 18. https://doi.org/10.1186/s41512-019-0064-7.
68. Y. Wu, L. Xu, P. Yang, N. Lin, X. Huang, W. Pan, H. Li, P. Lin, B. Li, V. Bunpetch, C. Luo, Y. Jiang, D. Yang, M. Huang, T. Niu, Z. Ye, Survival Prediction in High-grade Osteosarcoma Using Radiomics of Diagnostic Computed Tomography, eBioMedicine 34 (2018) 27–34. https://doi.org/10.1016/j.ebiom.2018.07.006.
69. A.J. Vickers, B. Van Calster, L. Wynants, E.W. Steyerberg, Decision curve analysis: confidence intervals and hypothesis testing for net benefit, Diagn. Progn. Res. 7 (2023) 11. https://doi.org/10.1186/s41512-023-00148-y.
70. G.S. Collins, J.B. Reitsma, D.G. Altman, K.G.M. Moons, Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement, BMJ 350 (2015) g7594. https://doi.org/10.1136/bmj.g7594.

Tables
Tables 1 and 2 are available in the Supplementary Files section.

Additional Declarations
No competing interests reported.

Supplementary Files
AdditionalFile1.docx
AdditionalFile2.xlsx
Table1.png: Table 1: Characteristics of the included studies.
Table2.png: Table 2: Summary of model performance metrics using the CHARMS checklist

Status: Under Review
Version 1 posted
Editorial decision: Revision requested (24 Jun, 2025)
Editor assigned by journal (19 Jun, 2025)
Reviews received at journal (02 Jun, 2025)
Reviewers agreed at journal (12 May, 2025)
Reviews received at journal (21 Apr, 2025)
Reviewers agreed at journal (21 Apr, 2025)
Reviewers invited by journal (21 Apr, 2025)
Submission checks completed at journal (31 Mar, 2025)
First submitted to journal (29 Mar, 2025)
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5810875","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Systematic Review","associatedPublications":[],"authors":[{"id":445700714,"identity":"92b834e5-02d8-486b-8771-06daf9495e5f","order_by":0,"name":"Williams Kwarah","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAwElEQVRIiWNgGAWjYBACgwNAgrHBBkQ2HiBKi2EDWEsamCROizEDWPFhMIc4LWYSyc8+/Nxx3m5t+2GgLTU20QS12EikGc/sPXM7eduZRKCWY2m5DQS1SCcYM/C23U42OwDUAnQhYS1m0umfGf+2nUs2O/+QSC3G0jnGzLxtB+zMbhBri+H8N8XMsm3JCWY3gLYkEOMXgzPHNzO+bbOzNzuf/vDBhxobwlpgIBGsMoFY5SBgT4riUTAKRsEoGGEAAMGTSGyFq5e/AAAAAElFTkSuQmCC","orcid":"","institution":"University of Ghana","correspondingAuthor":true,"prefix":"","firstName":"Williams","middleName":"","lastName":"Kwarah","suffix":""},{"id":445700715,"identity":"558dc0e2-2bad-4939-af56-e877a861a8e8","order_by":1,"name":"Frances Baaba da-Costa Vroom","email":"","orcid":"","institution":"University of Ghana","correspondingAuthor":false,"prefix":"","firstName":"Frances","middleName":"Baaba da-Costa","lastName":"Vroom","suffix":""},{"id":445700716,"identity":"3e89b3e4-9045-4129-8f72-dd1a2c6aa478","order_by":2,"name":"Duah Dwomoh","email":"","orcid":"","institution":"University of 
Ghana","correspondingAuthor":false,"prefix":"","firstName":"Duah","middleName":"","lastName":"Dwomoh","suffix":""},{"id":445700717,"identity":"c7b53804-189d-4c3f-b560-32a4b477c7bf","order_by":3,"name":"Samuel Bosomprah","email":"","orcid":"","institution":"University of Ghana","correspondingAuthor":false,"prefix":"","firstName":"Samuel","middleName":"","lastName":"Bosomprah","suffix":""}],"badges":[],"createdAt":"2025-01-11 18:53:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5810875/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5810875/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":81119855,"identity":"e99adb37-2c54-493c-a3e6-2a7938ce5568","added_by":"auto","created_at":"2025-04-22 12:38:07","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":737948,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePRISMA flow of article selection\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/42438d921bd93ec53e1fb121.jpeg"},{"id":81119850,"identity":"40647464-9930-42ee-9e70-3f6e64d8c365","added_by":"auto","created_at":"2025-04-22 12:38:07","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":281199,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSummary of risk of bias assessment\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/ba0fe502c181f24571b00831.jpeg"},{"id":81121388,"identity":"bc89e742-37a2-4381-bb89-ff8683ca0095","added_by":"auto","created_at":"2025-04-22 12:54:07","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":224291,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSummary of 
applicability assessment\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/7300b0dc4600ccba635e6375.jpeg"},{"id":81121741,"identity":"43379eb0-5835-49b2-9e3e-2f61e3586af2","added_by":"auto","created_at":"2025-04-22 13:02:08","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1935389,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/9f3d9768-1fea-4ce6-a5ad-3808b982fd0b.pdf"},{"id":81119853,"identity":"916e8ecb-5e2d-46e7-8c58-4152497ff1e3","added_by":"auto","created_at":"2025-04-22 12:38:07","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":191213,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/7f3ce25fd6b4d271545061ef.docx"},{"id":81120528,"identity":"0cfb8c39-0486-4df0-ba2b-a40d96f8c218","added_by":"auto","created_at":"2025-04-22 12:46:07","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":275053,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/0d662346f47e503878b0a2a2.xlsx"},{"id":81120527,"identity":"4f31ad75-9238-4820-9cc8-c501613639d6","added_by":"auto","created_at":"2025-04-22 12:46:07","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":85790,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 1: Characteristics of the included 
studies.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Table1.png","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/c5e24aea6cd5baf5c75c3ff8.png"},{"id":81120530,"identity":"a65e491b-a68d-4de0-ae58-a2a8afc4fa2b","added_by":"auto","created_at":"2025-04-22 12:46:08","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":755612,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 2: Summary of model performance metrics using the CHARMS checklist\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Table2.png","url":"https://assets-eu.researchsquare.com/files/rs-5810875/v1/90f4cd1d137beffc85abc88b.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Evaluating Machine Learning models for predicting HIV treatment interruption: a systematic review of accuracy, validity, and applicability","fulltext":[{"header":"Introduction","content":"\u003cp\u003eHuman immunodeficiency virus (HIV) treatment interruption poses a significant challenge to global efforts in the HIV/AIDS epidemic response. In 2022, an estimated 39\u0026nbsp;million people were living with HIV (PLHIV) globally, with an estimated 1.3\u0026nbsp;million new infections and 630,000 deaths reported [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. The burden of HIV infection is disproportionately high in sub-Saharan Africa, Asia, and the Pacific, which together account for about 88% of all cases [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
Despite the availability of antiretroviral therapy (ART), which has dramatically reduced the progression of HIV to AIDS and decreased AIDS-related mortality, many individuals living with HIV struggle to maintain consistent adherence to their treatment regimen [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. It has been estimated that only 46\u0026ndash;85% of patients continue to stay on ART two years after initiation [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. This lack of adherence is particularly concerning given that, when left untreated, HIV weakens the immune system and can lead to life-threatening complications [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. People who stay in treatment are economically viable and productive to their families and the community [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Interrupting HIV treatment may result in viral rebound, deterioration of the immune system, heightened transmission risk, and the development of drug resistance, thereby compromising both individual health and community prevention initiatives. The situation places significant pressure on healthcare systems and compromises public health initiatives [\u003cspan additionalcitationids=\"CR9 CR10\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eImproving ART adherence is critical to achieving global HIV/AIDS control goals. 
While current strategies to address treatment interruption primarily focus on re-engaging patients after missed doses [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], these reactive measures often fall short of preventing the associated health risks and potential for increased transmission. The ability to predict treatment interruptions before they occur could revolutionize HIV care by enabling healthcare providers to implement targeted and proactive interventions that keep patients on therapy, thus enhancing their chances of achieving and sustaining viral suppression. Artificial Intelligence (AI) and ML offer powerful tools for developing such predictive models due to their capacity to dynamically analyze large, complex datasets and uncover patterns that traditional methods might miss [\u003cspan additionalcitationids=\"CR15 CR16 CR17\" citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Despite the promise of these technologies, there remains a significant evidence gap in their application to HIV treatment adherence, particularly in low-resource settings where the burden of the disease is greatest. Addressing this gap through systematic evaluation of existing predictive models is crucial for advancing the use of AI and ML in HIV care. This can lead to more effective and personalized treatment strategies that can help meet the ambitious UNAIDS 95-95-95 targets by 2030 [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis systematic review aimed to evaluate the effectiveness of machine learning-based predictive models in forecasting HIV treatment interruptions. 
Specifically, the review (1) identified the types of predictive models previously developed, (2) assessed their accuracy and applicability in various settings, and (3) determined which models have been validated and how they performed in different populations. The impact of this review will provide insights that can guide the integration of advanced predictive technologies into HIV care programs, potentially improving patient retention, optimizing treatment outcomes, and supporting global efforts to eliminate HIV as a public health threat by 2030.\u003c/p\u003e"},{"header":"Methods","content":"\u003ch3\u003e\u003cstrong\u003eSearch strategy and eligibility criteria\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eWe searched multiple electronic databases, including Scopus, PubMed, The Lancet, BioMed Central (BMC) Public Health, ScienceDirect, Google Scholar, and Cochrane Library. Our search covered publications from January 1990 to September 2024. We searched using a combination of Medical Subject Headings (MeSH) and free-text terms. The key terms included \u0026ldquo;HIV\u0026rdquo;, \u0026ldquo;Human Immunodeficiency Virus\u0026rdquo;, \u0026ldquo;AIDS\u0026rdquo;, and \u0026ldquo;Acquired Immunodeficiency Syndrome\u0026rdquo; for HIV-related concepts, \u0026ldquo;Machine Learning\u0026rdquo;, \u0026ldquo;ML\u0026rdquo;, \u0026ldquo;Artificial Intelligence\u0026rdquo;, \u0026ldquo;AI\u0026rdquo;, \u0026ldquo;Neural Networks\u0026rdquo;, and \u0026ldquo;Predictive Modeling\u0026rdquo; for machine learning concepts, and \u0026ldquo;Treatment Interruption\u0026rdquo;, \u0026ldquo;Loss to Follow-Up\u0026rdquo;, \u0026ldquo;Non-adherence\u0026rdquo;, \u0026ldquo;Default\u0026rdquo;, and \u0026ldquo;Treatment Discontinuation\u0026rdquo; for treatment adherence concepts. These terms were combined using Boolean operators (AND, OR) to ensure a broad and inclusive search. 
Details of the search strategy for each database is shown in the Additional File 1.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe applied specific eligibility criteria to select studies for inclusion. Eligible studies focused on developing or validating prediction models for HIV treatment interruption at the individual level using machine learning methods. We only included studies published in English. We included studies that focused on HIV treatment interruption defined as missing a scheduled clinic or pharmacy appointment by at least 28 days. We excluded studies that identified predictors without focusing on prediction models, and studies lacking full-text availability. Reviews, commentaries, conference abstracts, letters, reports, and opinions were excluded. In addition to database searches, we manually reviewed the reference lists of the included studies to identify additional relevant articles. To capture recent and unpublished research, we searched preprint servers such as bioRxiv, medRxiv, and arXiv. The corresponding authors of the included articles were emailed to seek further information and clarity. The search strategy was carefully documented (Additional File), and articles were managed using Zotero 6.0.37 reference management software [19]. The Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) statements [20] and the conduct of systematic reviews [21] guided the review. A protocol for this review was registered on PROSPERO CRD42024578109. \u0026nbsp;\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eSelection process\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eArticle selection was conducted in multiple stages to ensure that only studies meeting the predefined inclusion criteria were included. Initially, two independent reviewers (WK and GJP) screened the titles and abstracts of all records retrieved from the database searches to identify potentially relevant studies. 
We resolved any disagreements between reviewers during the article selection process through discussion, and a third reviewer (ZN) was available to adjudicate unresolved disputes. To enhance the rigor of the selection process, the systematic review software Distiller SR 2.35 [22] was used to identify and remove duplicate records before screening began.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eData extraction\u0026nbsp;\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eTwo reviewers (WK and GJP) independently extracted data from the selected studies using the standardized CHecklist for critical\u0026nbsp;Appraisal and data extraction for systematic\u0026nbsp;Reviews of prediction\u0026nbsp;Modelling\u0026nbsp;Studies (CHARMS) tool [23,24]. CHARMS was developed for systematic reviews of prognostic or diagnostic prediction model studies, covering model development with or without external validation and external validation with or without model updating. The data collected included the data sources, study characteristics, details of the predictive models, outcomes, and performance metrics\u0026nbsp;[24]. We resolved any disagreements between reviewers during the data extraction process through discussion, and a third reviewer (ZN) was available to adjudicate unresolved disputes. The reviewers manually extracted all data and then cross-verified it to maintain the integrity of the data collection process. A final consolidated CHARMS form was compiled for this review.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eQuality and risk of bias assessment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used the Prediction Model Risk of Bias Assessment Tool (PROBAST) [25] to assess the risk of bias (ROB) in the included studies. PROBAST was designed to evaluate the risk of bias and applicability in prediction model studies. 
The tool evaluates four key domains: participants, predictors, outcomes, and analysis. There were two questions on participants, three questions on predictors, six questions on outcomes, and nine questions on the statistical analysis. Responses to these questions were \u0026quot;yes,\u0026quot; \u0026quot;probably yes,\u0026quot; \u0026quot;probably no,\u0026quot; \u0026quot;no,\u0026quot; or \u0026quot;no information.\u0026quot; The ROB was classified as low, high, or unclear based on the responses within these domains. A domain was classified as high risk if at least one question was answered \u0026quot;no\u0026quot; or \u0026quot;probably no\u0026quot;; low risk if all questions were answered \u0026quot;yes\u0026quot; or \u0026quot;probably yes\u0026quot;; and unclear if the responses provided no information. If all domains were assessed as having a low risk, then the overall risk of bias was classified as low. However, if at least one domain was determined to have a high risk, then the overall risk of bias was classified as high. If the risk of bias was unclear in at least one domain and low in all other domains, the overall risk of bias was classified as unclear. Two reviewers\u0026nbsp;(WK and GJP)\u0026nbsp;independently evaluated the risk of bias in each included study. When the reviewers disagreed on a risk of bias judgment, the discrepancies were discussed to reach a consensus. If the disagreement persisted, a third reviewer (ZN) was consulted to decide. 
We conducted all evaluations manually and documented the results of the risk of bias assessments in detail, with summary judgments presented as charts to facilitate a clear understanding of the quality and reliability of the included studies.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003eSynthesis and Analysis\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eWe tabulated the results of individual studies to provide a clear and organized presentation of the key findings. This included details such as study characteristics, model performance metrics (e.g., area under the receiver operating characteristic curve, calibration statistics), and risk of bias assessments. We used visual displays, including charts, to enhance the clarity of the results and to facilitate the comparison of study outcomes. For the synthesis of results, we used a narrative synthesis approach due to the anticipated heterogeneity of the included studies, particularly in terms of model types, outcome measures, and study populations. This approach allowed us to systematically describe and compare the predictive models, highlighting common themes and differences among the studies. We did not perform a meta-analysis because there were insufficient external validation studies of the same index model to justify a quantitative synthesis [21]. We explored possible causes of heterogeneity by conducting subgroup analyses, where applicable. These analyses considered factors such as the type of machine learning model used, population characteristics, and study setting. 
The synthesis followed guidelines from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [26], the CHARMS checklist [24], and PROBAST [25].\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eCharacteristics of included studies\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOur search identified 116,672 records, of which nine studies met the inclusion criteria (Figure 1). Seven of these studies focused on developing predictive models [27\u0026ndash;33], while two included both model development and validation [34,35].\u0026nbsp;Six studies were conducted in Africa [27\u0026ndash;30,33,34], of which three were in South Africa, one in Tanzania, and one combining data from Nigeria and Mozambique. The remaining three studies were conducted in the United States of America [31,32,35] (Table 1). These studies were published between 2018 and 2024, with the majority published in 2022 and 2023. Seven studies were conducted in public healthcare facilities, while two were conducted in university clinics. Seven studies relied on retrospective cohort data, while two used existing registries (Table 1).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eModel performance metrics\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAmong the nine studies selected, a total of 12 machine learning models were reported, with nine focused on model development and three on model validation (Table 2). The median sample size across studies was 136,415 (interquartile range: 178\u0026ndash;450,000), though one model was developed using a sample of fewer than 1,000 participants. On average, 15 predictors (SD=4.0) were included in the final models. Ensemble learning techniques were the most frequently used algorithms, accounting for 92% of the total models. 
These included Random Forest (3 models), Adaptive Boosting (AdaBoost, 3 models), Extreme Gradient Boosting (XGBoost, 2 models), Decision Trees (2 models), and Categorical Boosting (CatBoost, 1 model) (Table 2). Logistic regression was used in only one model.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll 12 models reported internal validation. The methods used were random sample split (6), cross-validation (4), and a combination of random sample split and cross-validation (2) (Table 2). Model performance was primarily assessed using the c-statistic or area under the receiver operating characteristic curve (AUC), with an average AUC of 0.668 (SD: 0.07). Some models also reported additional metrics, including accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) (Table 2). Notably, two models reported only PPV, while another two reported the Matthews correlation coefficient. Model calibration methods were used in just three models, which reported an average F1 score of 0.292 (SD: 0.01) alongside the AUC. None of the studies used Decision Curve Analysis (DCA) to assess clinical value and implications, a significant limitation in evaluating the practical utility of the models. However, one study addressed model utility by gathering feedback from healthcare workers. Additional information is shown in Additional File 1.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRisk of bias assessment\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe assessed the risk of bias for the twelve models using the PROBAST tool (Figure 2). Of these, nine models (75.0%) were rated as having a high risk of bias, two models (16.7%) were rated low risk, and one model (8.3%) had an unclear risk of bias. Most models (58.3%) were rated as high risk in the statistical analysis domain. For example, nearly half of the models failed to report how missing data were handled, and ten models (83.3%) did not disclose the extent of missing data. 
Furthermore, only three models (25.0%) provided details on calibration measures, which are important for ensuring the reliability of predictions. None of the studies reported DCA or other methods to assess clinical utility, highlighting a critical gap in evaluating the practical application of these models. Additional details on the risk of bias analysis are provided in the supplementary material (Additional File 1).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eApplicability assessment\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe evaluated the applicability of the models for use in the intended population and primary healthcare settings. Overall, 83% of the models were rated as low concern, indicating their suitability for primary healthcare use. However, 17% were rated as high concern, reflecting limitations in certain aspects of model development (Figure 3). The predictors domain was rated as low concern, suggesting that the included predictors were relevant to the target population and routinely collected in clinical settings. Similarly, the outcome domain was rated as low concern in 92% of the models, while 8% were marked as unclear due to insufficient reporting of key details. Three models were externally validated, but only two reported calibration measures, with an average F1 score of 0.2935, alongside c-statistic (AUC) values. These validations used datasets from registries of people living with HIV who were scheduled for clinical appointments. While sensitivity, specificity, PPV, and NPV were reported, one model lacked critical details on eligibility criteria and missing data handling. None of the externally validated models assessed clinical utility. 
Further details are provided in the supplementary material (Additional File 1).\u003c/p\u003e"},{"header":"Discussions","content":"\u003cp\u003eThis review examined 12 machine learning models developed to predict interruptions in HIV treatment, with most relying on ensemble techniques such as Random Forest, AdaBoost, and XGBoost. These models were built using data from large retrospective cohorts, with a median sample size of 136,415 participants, and were validated internally through methods such as cross-validation and random sample splitting. The models demonstrated moderate predictive performance, with an average AUC-ROC of 0.668, and used data commonly collected in clinical settings, making them practical for real-world use. For prognostic prediction models, an AUC of 0.5\u0026ndash;0.7 suggests poor discrimination, 0.7\u0026ndash;0.8 is considered acceptable, 0.8\u0026ndash;0.9 excellent, and \u0026gt;\u0026thinsp;0.9 outstanding [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. On this scale, the average AUC falls just below the acceptable range. 
Although only two models were externally validated, most models showed strong potential for application in primary healthcare, highlighting their promise in improving adherence and supporting HIV care strategies.\u003c/p\u003e \u003cp\u003eElectronic Medical Records (EMRs) are increasingly prevalent worldwide, including in Africa [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], facilitating the ongoing accumulation of extensive healthcare data and enabling big data analytics [\u003cspan additionalcitationids=\"CR40 CR41 CR42 CR43\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], as well as the application of machine learning and artificial intelligence [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. Numerous prognostic studies have employed EMR data to create models for predicting individual diagnoses of HIV, healthcare attendance, and viral load suppression [\u003cspan additionalcitationids=\"CR48\" citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. The growing utilization of these analytic tools is likely due to the interest in employing predictive models as decision support instruments at the point of care. 
Moreover, delivering focused, high-impact interventions with limited resources in under-resourced healthcare settings is essential [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTwo-thirds of the studies were conducted in Africa, predominantly in South Africa, a country characterized by a high incidence of HIV [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. This emphasis is commendable, yet it limits understanding of how predictive models apply in low-prevalence areas. Data from high-prevalence regions, such as South Africa, offer essential insights into models that help tackle adherence difficulties in comparable settings. Even so, results should be extrapolated with care to areas with different healthcare systems and adherence challenges. The studies conducted in the United States of America, although limited in number, offered a contrasting perspective, highlighting the need for regionally appropriate models.\u003c/p\u003e \u003cp\u003eThe machine learning techniques in our analysis have shown potential for forecasting IIT using routinely collected clinical data. Ensemble learning methods, specifically Random Forest, AdaBoost, and XGBoost, predominated, collectively representing 91.7% of the models created. Previous studies have demonstrated that ensemble approaches effectively address the complex, nonlinear interactions prevalent in healthcare datasets [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. These algorithms have achieved above 90% accuracy across many datasets [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e]. 
Ensemble algorithms are beneficial because of their resilience to overfitting and their capacity to handle extensive feature sets. The findings of our review are consistent with these results. Most models in our review reported the c-statistic (AUC), which evaluates the discriminatory capability of predictive models. The average AUC of 0.668 in our analysis aligns with the findings of Chilamkurthy et al. (2018), who stated that although ML models can distinguish between outcomes, their clinical performance measures, such as accuracy, sensitivity, and specificity, are frequently weak because of the unbalanced datasets and inadequate predictor selection often found in healthcare data. Other studies have emphasized that the AUC is a more informative metric than accuracy and should be used in conjunction with calibration and decision curve analysis when assessing the performance of ML models [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWe found that several studies failed to report calibration and clinical utility. Although there are many possible problems in the development and validation of prediction models, it is essential to report calibration measures, which are vital components of statistical performance [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]. Calibration measures are essential because they show whether a model's predicted probabilities correspond with observed probabilities, and therefore whether its predictions can be relied upon. Only 25% of the models included in our review assessed calibration. 
In the absence of calibration, predictive models may provide probabilities that inaccurately reflect actual risks, compromising their clinical relevance [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. We noted significant problems with the ROB in the developed prediction models. Seventy-five percent of the reviewed models were classified as having a high risk of bias, mostly due to inadequacies in the statistical analysis and data management. About 83.3% of models did not disclose the extent of missing data, and nearly half did not report how missing data were handled, making this a key concern. This finding aligns with prior research demonstrating that most predictive model studies do not report their methods for addressing missing data [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. Missing data is a widespread problem in retrospective healthcare datasets and, if not properly managed, can compromise model performance and integrity [\u003cspan additionalcitationids=\"CR62\" citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. Several studies have used imputation approaches that estimate missing values as closely as possible to the true values, which increases the likelihood of obtaining high-quality, reusable data [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e]. However, if imputation is not handled appropriately, it can introduce systematic biases and diminish the validity and integrity of models, particularly in datasets used in healthcare research [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e, \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. Furthermore, our review observed the lack of decision curve analysis (DCA) in all the studies included. 
DCA is essential for assessing a model's clinical relevance by weighing the benefits and risks at different decision thresholds, making its omission a significant limitation [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e]. DCA complements calibration and discrimination measures in machine learning models [\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e] and helps incorporate the clinical consequences of using a model. In addition to DCA, net benefit analysis can be used to assess the applicability of models in real-life situations.\u003c/p\u003e \u003cp\u003eThe reviewed models show potential for improving IIT predictions; nevertheless, their reliability and applicability in clinical environments are constrained, as shown in the risk of bias and applicability results. Overall, 83% of the reviewed models were rated as low concern for applicability, suggesting their broad appropriateness for the target groups and settings. This result reflects the use of predictors that are routinely collected in clinical contexts, including demographic information, adherence records, and clinical indicators, which improves the practicality of applying these models in actual healthcare settings [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e]. The outcome domain was rated as low concern in 92% of models; nevertheless, the absence of external validation and decision curve analysis seriously constrains their practical use in guiding clinical decisions [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. For optimal real-world applicability, models must address these deficiencies by integrating external validation across diverse contexts and evaluating clinical significance using methodologies such as DCA, net benefit analysis, or net reclassification improvement assessments. 
Aligning with clinical processes is crucial for maximizing the efficacy of machine learning in enhancing adherence and minimizing interruptions in treatment in HIV care. Enhancing future research through stringent reporting standards and robust statistical methodologies, such as those outlined in the TRIPOD recommendations, is essential to mitigate biases and improve the reliability of predictive modeling in HIV care [\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eLimitations\u003c/h2\u003e \u003cp\u003eThe results of this review should be interpreted with certain limitations in mind. First, the review included only journal articles published in English with full-text availability, and the search was conducted across a limited number of databases, which may introduce language and publication bias. Excluding studies published in languages other than English introduced a potential selection bias. This potentially limits the generalizability of the findings to English-speaking settings. To address potential selection and publication bias stemming from the restricted database search, we supplemented our efforts by conducting backward and forward citation searches in Google Scholar and reviewing article references. Most of these studies were conducted in resource-poor settings, which made it difficult for validation studies to be carried out. 
In such circumstances, we recommend that validation studies be conducted on different datasets or in different settings.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eRecommendations for future research\u003c/h2\u003e \u003cp\u003eFuture studies should prioritize robust external validation across diverse populations and geographic regions, which is essential to evaluate model performance under varying demographic, clinical, and systemic conditions and to ensure reliability in real-world applications. The inclusion of socio-cultural and structural factors in model development should be considered in future research. Addressing missing data is also critical for enhancing model accuracy and reliability. Future studies should adopt systematic strategies such as multiple imputation or sensitivity analyses and adhere to standardized reporting guidelines such as TRIPOD. Finally, incorporating decision curve analysis (DCA) into model assessment is recommended to bridge the gap between statistical performance and practical, real-world impact.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study provides key insights into the current state of predictive modeling for HIV treatment interruptions. Machine learning, particularly ensemble learning techniques, is widely used with retrospective cohort data to address adherence issues in HIV programs, demonstrating moderate accuracy and applicability in primary healthcare settings. However, critical shortcomings, including insufficient calibration reporting, lack of decision curve analysis (DCA), and limited external validation, restrict the models\u0026rsquo; clinical utility and generalizability. 
Predictive modeling holds significant promise in supporting countries to achieve the UNAIDS 95-95-95 targets by advancing equitable access to medications, high treatment retention rates, and widespread viral load suppression.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"653\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\" style=\"width: 105px;\"\u003e\n \u003cp\u003eAbbreviation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eMeaning\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\" style=\"width: 105px;\"\u003e\n \u003cp\u003eHIV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eHuman Immunodeficiency Virus\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\" style=\"width: 105px;\"\u003e\n \u003cp\u003eAIDS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eAcquired Immunodeficiency Syndrome\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003ePLHIV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003ePeople Living with HIV\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eART\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eAntiretroviral Therapy\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eML\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eMachine Learning\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eAI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eArtificial Intelligence\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eUNAIDS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eJoint United Nations Programme on HIV/AIDS\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003ePRISMA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003ePreferred Reporting Items for Systematic reviews and Meta-Analyses\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003ePROSPERO\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eInternational Prospective Register of Systematic Reviews\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eBMC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eBioMed Central\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eMeSH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eMedical Subject Headings\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eCHARMS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eCHecklist for critical\u0026nbsp;Appraisal and data extraction for systematic\u0026nbsp;Reviews of prediction\u0026nbsp;Modelling\u0026nbsp;Studies\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003ePROBAST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003ePrediction Model Risk of Bias Assessment Tool\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eROB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eRisk of Bias\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eTRIPOD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eTransparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eSD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eStandard Deviation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eExtreme Gradient Boosting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eAdaBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eAdaptive Boosting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eCatBoost\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eCategorical Boosting\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n 
\u003cp\u003eAUC-ROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eArea Under the Receiver Operating Characteristic Curve\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eAUC-PR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eArea Under the Precision-Recall Curve\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eNPV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eNegative Predictive Value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003ePPV\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003ePositive Predictive Value\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eDCA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eDecision Curve Analysis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003eEMR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\" style=\"width: 547px;\"\u003e\n \u003cp\u003eElectronic Medical Records\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConsent to participate is not applicable. However, because this study is nested within another study on HIV treatment interruptions, ethical approval was received from the Ghana Health Service Ethics Review Committee (approval number GHS-ERC:003/08/24). 
All ethical principles were followed in this review.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data generated or analyzed during this study are included in the supplementary information (Additional File 2).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo funding was received for this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWK conceived the research topic, led data review and extraction, analyzed and interpreted the extracted data, and wrote the first draft of the manuscript. FBV, DD, and SB contributed to the methods, analysis, and reporting and reviewed the manuscript. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe sincerely thank Jasmin Kwarah for generously providing the stationery that was crucial to completing this systematic review, and Ekua Houphoet for her support in reviewing the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eUNAIDS_FactSheet_en.pdf, (n.d.). 
https://www.unaids.org/sites/default/files/media_asset/UNAIDS_FactSheet_en.pdf (accessed December 17, 2024).\u003c/li\u003e\n\u003cli\u003eL. Frescura, P. Godfrey-Faussett, A. Feizzadeh A., W. El-Sadr, O. Syarif, P.D. Ghys, Achieving the 95 95 95 targets for all: A pathway to ending AIDS, PLoS ONE 17 (2022) e0272405. https://doi.org/10.1371/journal.pone.0272405.\u003c/li\u003e\n\u003cli\u003eF. Altice, O. Evuarherhe, S. Shina, G. Carter, A.C. Beaubrun, Adherence to HIV treatment regimens: systematic literature review and meta-analysis, Patient Prefer. Adherence 13 (2019) 475\u0026ndash;490. https://doi.org/10.2147/PPA.S192735.\u003c/li\u003e\n\u003cli\u003eG. Dubrocq, N. Rakhmanina, Antiretroviral therapy interruptions: impact on HIV treatment and transmission, HIVAIDS - Res. Palliat. Care 10 (2018) 91\u0026ndash;101. https://doi.org/10.2147/HIV.S141965.\u003c/li\u003e\n\u003cli\u003eU. Akpan, K. Kakanfo, O.D. Ekele, K. Ukpong, O. Toyo, P. Nwaokoro, E. James, S. Pandey, K. Olatubosun, M. Bateganya, Predictors of treatment interruption among patients on antiretroviral therapy in Akwa Ibom, Nigeria: outcomes after 12 months, AIDS Care 35 (2023) 114\u0026ndash;122. https://doi.org/10.1080/09540121.2022.2093826.\u003c/li\u003e\n\u003cli\u003eS. Rosen, M.P. Fox, C.J. Gill, Patient Retention in Antiretroviral Therapy Programs in Sub-Saharan Africa: A Systematic Review, PLoS Med. 4 (2007) e298. https://doi.org/10.1371/journal.pmed.0040298.\u003c/li\u003e\n\u003cli\u003eH. Thirumurthy, O. Gal\u0026aacute;rraga, B. Larson, S. Rosen, HIV Treatment Produces Economic Returns Through Increased Work And Education, And Warrants Continued US Support, Health Aff. Proj. Hope 31 (2012) 1470\u0026ndash;1477. https://doi.org/10.1377/hlthaff.2012.0217.\u003c/li\u003e\n\u003cli\u003eB. Jewell, J. Smith, T. Hallett, The Potential Impact of Interruptions to HIV Services: A Modelling Case Study for South Africa, (2020) 2020.04.22.20075861. 
https://doi.org/10.1101/2020.04.22.20075861.\u003c/li\u003e\n\u003cli\u003eE.J. Mills, A. Funk, S. Kanters, E. Kawuma, C. Cooper, B. Mukasa, M. Odit, Y. Karamagi, D. Mwehire, J. Nachega, S. Yaya, A. Featherstone, N. Ford, Long-Term Health Care Interruptions Among HIV-Positive Patients in Uganda, JAIDS J. Acquir. Immune Defic. Syndr. 63 (2013) e23. https://doi.org/10.1097/QAI.0b013e31828a3fb8.\u003c/li\u003e\n\u003cli\u003eC. Thomadakis, C.T. Yiannoutsos, N. Pantazis, L. Diero, A. Mwangi, B.S. Musick, K. Wools-Kaloustian, G. Touloumi, The Effect of HIV Treatment Interruption on Subsequent Immunological Response, Am. J. Epidemiol. 192 (2023) 1181\u0026ndash;1191. https://doi.org/10.1093/aje/kwad076.\u003c/li\u003e\n\u003cli\u003eA. Trickey, L. Zhang, C.T. Rentsch, N. Pantazis, R. Izquierdo, A. Antinori, G. Leierer, G. Burkholder, M. Cavassini, J. Palacio-Vieira, M.J. Gill, R. Teira, C. Stephan, N. Obel, J.-J. Vehreschild, T.R. Sterling, M. Van Der Valk, F. Bonnet, H.M. Crane, M.J. Silverberg, S.M. Ingle, J.A.C. Sterne, the Antiretroviral Therapy Cohort Collaboration (ART-CC), Care interruptions and mortality among adults in Europe and North America, AIDS 38 (2024) 1533. https://doi.org/10.1097/QAD.0000000000003924.\u003c/li\u003e\n\u003cli\u003eS. Chamberlin, M. Mphande, K. Phiri, P. Kalande, K. Dovel, How HIV Clients Find Their Way Back to the ART Clinic: A Qualitative Study of Disengagement and Re-engagement with HIV Care in Malawi, AIDS Behav. 26 (2022) 674\u0026ndash;685. https://doi.org/10.1007/s10461-021-03427-1.\u003c/li\u003e\n\u003cli\u003eJ. Palacio-Vieira, J.M. Reyes-Urue\u0026ntilde;a, A. Imaz, A. Bruguera, L. Force, A.O. Llaveria, J.M. Llibre, I. Vilar\u0026oacute;, F.H. Borr\u0026agrave;s, V. Falc\u0026oacute;, M. Riera, P. Domingo, E. de Lazzari, J.M. Mir\u0026oacute;, J. Casabona, Strategies to reengage patients lost to follow up in HIV care in high income countries, a scoping review, BMC Public Health 21 (2021) 1596. 
https://doi.org/10.1186/s12889-021-11613-y.\u003c/li\u003e\n\u003cli\u003eM. Bektaş, J.B. Tuynman, J. Costa Pereira, G.L. Burchell, D.L. van der Peet, Machine Learning Algorithms for Predicting Surgical Outcomes after Colorectal Surgery: A Systematic Review, World J. Surg. 46 (2022) 1. https://doi.org/10.1007/s00268-022-06728-1.\u003c/li\u003e\n\u003cli\u003eY. Huang, J. Li, M. Li, R.R. Aparasu, Application of machine learning in predicting survival outcomes involving real-world data: a scoping review, BMC Med. Res. Methodol. 23 (2023) 268. https://doi.org/10.1186/s12874-023-02078-1.\u003c/li\u003e\n\u003cli\u003eJ.T. Senders, P.C. Staples, A.V. Karhade, M.M. Zaki, W.B. Gormley, M.L.D. Broekman, T.R. Smith, O. Arnaout, Machine Learning and Neurosurgical Outcome Prediction: A Systematic Review, World Neurosurg. 109 (2018) 476-486.e1. https://doi.org/10.1016/j.wneu.2017.09.149.\u003c/li\u003e\n\u003cli\u003eE.W. Steyerberg, Applications of Prediction Models, in: E.W. Steyerberg (Ed.), Clin. Predict. Models Pract. Approach Dev. Valid. Updat., Springer International Publishing, Cham, 2019: pp. 15\u0026ndash;36. https://doi.org/10.1007/978-3-030-16399-0_2.\u003c/li\u003e\n\u003cli\u003eW. Zu, X. Huang, T. Xu, L. Du, Y. Wang, L. Wang, W. Nie, Machine learning in predicting outcomes for stroke patients following rehabilitation treatment: A systematic review, PLOS ONE 18 (2023) e0287308. https://doi.org/10.1371/journal.pone.0287308.\u003c/li\u003e\n\u003cli\u003eJ. Puckett, Zotero: A Guide for Librarians, Researchers, and Educators, Association of College \u0026amp; Research Libraries, 2011.\u003c/li\u003e\n\u003cli\u003eM.J. Page, D. Moher, P.M. Bossuyt, I. Boutron, T.C. Hoffmann, C.D. Mulrow, L. Shamseer, J.M. Tetzlaff, E.A. Akl, S.E. Brennan, R. Chou, J. Glanville, J.M. Grimshaw, A. Hr\u0026oacute;bjartsson, M.M. Lalu, T. Li, E.W. Loder, E. Mayo-Wilson, S. McDonald, L.A. McGuinness, L.A. Stewart, J. Thomas, A.C. Tricco, V.A. Welch, P. Whiting, J.E. 
McKenzie, PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews, BMJ 372 (2021) n160. https://doi.org/10.1136/bmj.n160.\u003c/li\u003e\n\u003cli\u003eJ.A.A. Damen, K.G.M. Moons, M. van Smeden, L. Hooft, How to conduct a systematic review and meta-analysis of prognostic model studies, Clin. Microbiol. Infect. 29 (2023) 434\u0026ndash;440. https://doi.org/10.1016/j.cmi.2022.07.019.\u003c/li\u003e\n\u003cli\u003eSystematic Review and Literature Review Software by DistillerSR, DistillerSR (n.d.). https://www.distillersr.com/ (accessed December 17, 2024).\u003c/li\u003e\n\u003cli\u003eA. Liberati, D.G. Altman, J. Tetzlaff, C. Mulrow, P.C. G\u0026oslash;tzsche, J.P.A. Ioannidis, M. Clarke, P.J. Devereaux, J. Kleijnen, D. Moher, The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration, PLOS Med. 6 (2009) e1000100. https://doi.org/10.1371/journal.pmed.1000100.\u003c/li\u003e\n\u003cli\u003eK.G.M. Moons, J.A.H. de Groot, W. Bouwmeester, Y. Vergouwe, S. Mallett, D.G. Altman, J.B. Reitsma, G.S. Collins, Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist, PLOS Med. 11 (2014) e1001744. https://doi.org/10.1371/journal.pmed.1001744.\u003c/li\u003e\n\u003cli\u003eR.F. Wolff, K.G.M. Moons, R.D. Riley, P.F. Whiting, M. Westwood, G.S. Collins, J.B. Reitsma, J. Kleijnen, S. Mallett, PROBAST Group, PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies, Ann. Intern. Med. 170 (2019) 51\u0026ndash;58. https://doi.org/10.7326/M18-1376.\u003c/li\u003e\n\u003cli\u003eK.G.M. Moons, D.G. Altman, J.B. Reitsma, J.P.A. Ioannidis, P. Macaskill, E.W. Steyerberg, A.J. Vickers, D.F. Ransohoff, G.S. Collins, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration, Ann. 
Intern. Med. 162 (2015) W1\u0026ndash;W73. https://doi.org/10.7326/M14-0698.\u003c/li\u003e\n\u003cli\u003eC.A. Fahey, L. Wei, P.F. Njau, S. Shabani, S. Kwilasa, W. Maokola, L. Packel, Z. Zheng, J. Wang, S.I. McCoy, Machine learning with routine electronic medical record data to identify people at high risk of disengagement from HIV care in Tanzania, PLOS Glob. Public Health 2 (2022) e0000720. https://doi.org/10.1371/journal.pgph.0000720.\u003c/li\u003e\n\u003cli\u003eM. Maskew, K. Sharpey-Schafer, L. De Voux, T. Crompton, J. Bor, M. Rennick, A. Chirowodza, J. Miot, S. Molefi, C. Onaga, P. Majuba, I. Sanne, P. Pisa, Applying machine learning and predictive modeling to retention and viral suppression in South African HIV treatment cohorts, Sci. Rep. 12 (2022) 12715. https://doi.org/10.1038/s41598-022-16062-0.\u003c/li\u003e\n\u003cli\u003eM. Maskew, S. Smith, L.D. Voux, K. Sharpey-Schafer, T. Crompton, A. Govender, P. Pisa, S. Rosen, Triaging clients at risk of disengagement from HIV care: Application of a predictive model to clinical trial data in South Africa, (2024) 2024.08.05.24311488. https://doi.org/10.1101/2024.08.05.24311488.\u003c/li\u003e\n\u003cli\u003eM.-D. Ogbechie, C.F. Walker, M.-T. Lee, A.A. Gana, A. Oduola, A. Idemudia, M. Edor, E.L. Harris, J. Stephens, X. Gao, P.-L. Chen, N.E. Persaud, Predicting Treatment Interruption Among People Living With HIV in Nigeria: Machine Learning Approach, JMIR AI 2 (2023) e44432. https://doi.org/10.2196/44432.\u003c/li\u003e\n\u003cli\u003eB.W. Pence, A.M. Bengtson, S. Boswell, K.A. Christopoulos, H.M. Crane, E. Geng, J.C. Keruly, W.C. Mathews, M.J. Mugavero, Who will show? Predicting missed visits among patients in routine HIV primary care in the United States, AIDS Behav. 23 (2019) 418\u0026ndash;426. https://doi.org/10.1007/s10461-018-2215-1.\u003c/li\u003e\n\u003cli\u003eA. Ramachandran, A. Kumar, H. Koenig, A. De Unanue, C. Sung, J. Walsh, J. Schneider, R. Ghani, J.P. 
Ridgway, Predictive Analytics for Retention in Care in an Urban HIV Clinic, Sci. Rep. 10 (2020) 6421. https://doi.org/10.1038/s41598-020-62729-x.\u003c/li\u003e\n\u003cli\u003eJ. Stockman, J. Friedman, J. Sundberg, E. Harris, Predictive analytics using machine learning to identify ART clients at health system level at greatest risk of treatment interruption in Mozambique and Nigeria, JAIDS J. Acquir. Immune Defic. Syndr. (2022) 10.1097/QAI.0000000000002947. https://doi.org/10.1097/QAI.0000000000002947.\u003c/li\u003e\n\u003cli\u003eR. Esra, J. Carstens, S. Le Roux, T. Mabuto, M. Eisenstein, O. Keiser, E. Orel, A. Merzouki, L. De Voux, M. Maskew, K. Sharpey-Schafer, Validation and Improvement of a Machine Learning Model to Predict Interruptions in Antiretroviral Treatment in South Africa, JAIDS J. Acquir. Immune Defic. Syndr. 92 (2023) 42. https://doi.org/10.1097/QAI.0000000000003108.\u003c/li\u003e\n\u003cli\u003eJ.A. Mason, E.E. Friedman, J.C. Rojas, J.P. Ridgway, No-show Prediction Model Performance Among People With HIV: External Validation Study, J. Med. Internet Res. 25 (2023) e43277. https://doi.org/10.2196/43277.\u003c/li\u003e\n\u003cli\u003eA.M. Carrington, D.G. Manuel, P.W. Fieguth, T. Ramsay, V. Osmani, B. Wernly, C. Bennett, S. Hawken, O. Magwood, Y. Sheikh, M. McInnes, A. Holzinger, Deep ROC Analysis and AUC as Balanced Average Accuracy, for Improved Classifier Selection, Audit and Explanation, IEEE Trans. Pattern Anal. Mach. Intell. 45 (2023) 329\u0026ndash;341. https://doi.org/10.1109/TPAMI.2022.3145392.\u003c/li\u003e\n\u003cli\u003eN. White, R. Parsons, G. Collins, A. Barnett, Evidence of questionable research practices in clinical prediction models, BMC Med. 21 (2023) 339. https://doi.org/10.1186/s12916-023-03048-6.\u003c/li\u003e\n\u003cli\u003eM.O. Akanbi, A.N. Ocheke, P.A. Agaba, C.A. Daniyam, E.I. Agaba, E.N. Okeke, C.O. Ukoli, Use of Electronic Health Records in sub-Saharan Africa: Progress and challenges, J. Med. Trop. 
14 (2012) 1.\u003c/li\u003e\n\u003cli\u003eF. Colombo, J. Oderkirk, L. Slawomirski, Health Information Systems, Electronic Medical Records, and Big Data in Global Healthcare: Progress and Challenges in OECD Countries, in: R. Haring, I. Kickbusch, D. Ganten, M. Moeti (Eds.), Handb. Glob. Health, Springer International Publishing, Cham, 2020: pp. 1\u0026ndash;31. https://doi.org/10.1007/978-3-030-05325-3_71-1.\u003c/li\u003e\n\u003cli\u003eB. Cyganek, M. Gra\u0026ntilde;a, B. Krawczyk, A. Kasprzak, P. Porwik, K. Walkowiak, M. Woźniak, A Survey of Big Data Issues in Electronic Health Record Analysis, Appl. Artif. Intell. 30 (2016) 497\u0026ndash;520. https://doi.org/10.1080/08839514.2016.1193714.\u003c/li\u003e\n\u003cli\u003eZ.F. Khan, S.R. Alotaibi, Applications of Artificial Intelligence and Big Data Analytics in m-Health: A Healthcare System Perspective, J. Healthc. Eng. 2020 (2020) 8894694. https://doi.org/10.1155/2020/8894694.\u003c/li\u003e\n\u003cli\u003eJ.T. Schwartz, M. Gao, E.A. Geng, K.S. Mody, C.M. Mikhail, S.K. Cho, Applications of Machine Learning Using Electronic Medical Records in Spine Surgery, Neurospine 16 (2019) 643\u0026ndash;653. https://doi.org/10.14245/ns.1938386.193.\u003c/li\u003e\n\u003cli\u003eA. Shinozaki, Electronic Medical Records and Machine Learning in Approaches to Drug Development, in: Artif. Intell. Oncol. Drug Discov. Dev., IntechOpen, 2020. https://doi.org/10.5772/intechopen.92613.\u003c/li\u003e\n\u003cli\u003eF.M. Syed, F.K.E. S, AI in Securing Electronic Health Records (EHR) Systems, Int. J. Adv. Eng. Technol. Innov. 1 (2024) 593\u0026ndash;620.\u003c/li\u003e\n\u003cli\u003eK. Kawamoto, J. Finkelstein, G.D. Fiol, Implementing Machine Learning in the Electronic Health Record: Checklist of Essential Considerations, Mayo Clin. Proc. 98 (2023) 366\u0026ndash;369. https://doi.org/10.1016/j.mayocp.2023.01.013.\u003c/li\u003e\n\u003cli\u003eS.F. Weng, J. Reps, J. Kai, J.M. Garibaldi, N. 
Qureshi, Can machine-learning improve cardiovascular risk prediction using routine clinical data?, PLoS ONE 12 (2017) e0174944. https://doi.org/10.1371/journal.pone.0174944.\u003c/li\u003e\n\u003cli\u003eB. Critelli, A. Hassan, I. Lahooti, L. Noh, J.S. Park, K. Tong, A. Lahooti, N. Matzko, J.N. Adams, L. Liss, J. Quion, D. Restrepo, M. Nikahd, S. Culp, A. Lacy-Hulbert, C. Speake, J. Buxbaum, J. Bischof, C. Yazici, A.E. Phillips, S. Terp, A. Weissman, D. Conwell, P. Hart, M. Ramsey, S. Krishna, S. Han, E. Park, R. Shah, V. Akshintala, J.A. Windsor, N.K. Mull, G.I. Papachristou, L.A. Celi, P.J. Lee, A Systematic Review of Machine Learning-based Prognostic Models for Acute Pancreatitis: Towards Improving Methods and Reporting Quality, (2024) 2024.06.26.24309389. https://doi.org/10.1101/2024.06.26.24309389.\u003c/li\u003e\n\u003cli\u003eT. Endebu, G. Taye, A. Addissie, A. Deksisa, W. Deressa, Electronic medical record-based prediction models developed and deployed in the HIV care continuum: a systematic review, Discov. Health Syst. 3 (2024) 25. https://doi.org/10.1007/s44250-024-00092-8.\u003c/li\u003e\n\u003cli\u003eJ.P. Ridgway, A. Lee, S. Devlin, J. Kerman, A. Mayampurath, Machine Learning and Clinical Informatics for Improving HIV Care Continuum Outcomes, Curr. HIV/AIDS Rep. 18 (2021) 229\u0026ndash;236. https://doi.org/10.1007/s11904-021-00552-3.\u003c/li\u003e\n\u003cli\u003eR.J. Chin, D. Sangmanee, L. Piergallini, PEPFAR Funding and Reduction in HIV Infection Rates in 12 Focus Sub-Saharan African Countries: A Quantitative Analysis, Int. J. MCH AIDS 3 (2015) 150.\u003c/li\u003e\n\u003cli\u003eM. Pal, S. Parija, G. Panda, K. Dhama, R.K. Mohapatra, Risk prediction of cardiovascular disease using machine learning classifiers, Open Med. 17 (2022) 1100\u0026ndash;1113. https://doi.org/10.1515/med-2022-0508.\u003c/li\u003e\n\u003cli\u003eSouth Africa, (n.d.). 
https://www.unaids.org/en/regionscountries/countries/southafrica (accessed December 17, 2024).\u003c/li\u003e\n\u003cli\u003eT.G. Dietterich, Ensemble Methods in Machine Learning, in: Mult. Classif. Syst., Springer, Berlin, Heidelberg, 2000: pp. 1\u0026ndash;15. https://doi.org/10.1007/3-540-45014-9_1.\u003c/li\u003e\n\u003cli\u003eN. Rane, S.P. Choudhary, J. Rane, Ensemble deep learning and machine learning: applications, opportunities, challenges, and future directions, Stud. Med. Health Sci. 1 (2024) 18\u0026ndash;41. https://doi.org/10.48185/smhs.v1i2.1225.\u003c/li\u003e\n\u003cli\u003eL.R. Namamula, D. Chaytor, Effective ensemble learning approach for large-scale medical data analytics, Int. J. Syst. Assur. Eng. Manag. 15 (2024) 13\u0026ndash;20. https://doi.org/10.1007/s13198-021-01552-7.\u003c/li\u003e\n\u003cli\u003eS. Chilamkurthy, R. Ghosh, S. Tanamala, M. Biviji, N.G. Campeau, V.K. Venugopal, V. Mahajan, P. Rao, P. Warier, Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans, (2018). https://doi.org/10.48550/arXiv.1803.05854.\u003c/li\u003e\n\u003cli\u003eJ. Huang, C.X. Ling, Using AUC and accuracy in evaluating learning algorithms, IEEE Trans. Knowl. Data Eng. 17 (2005) 299\u0026ndash;310. https://doi.org/10.1109/TKDE.2005.50.\u003c/li\u003e\n\u003cli\u003eA.C. Alba, T. Agoritsas, M. Walsh, S. Hanna, A. Iorio, P.J. Devereaux, T. McGinn, G. Guyatt, Discrimination and Calibration of Clinical Prediction Models: Users\u0026rsquo; Guides to the Medical Literature, JAMA 318 (2017) 1377\u0026ndash;1384. https://doi.org/10.1001/jama.2017.12126.\u003c/li\u003e\n\u003cli\u003eM.A.E. Binuya, E.G. Engelhardt, W. Schats, M.K. Schmidt, E.W. Steyerberg, Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review, BMC Med. Res. Methodol. 22 (2022) 316. https://doi.org/10.1186/s12874-022-01801-8.\u003c/li\u003e\n\u003cli\u003eB. Van Calster, D. Nieboer, Y. Vergouwe, B. 
De Cock, M.J. Pencina, E.W. Steyerberg, A calibration hierarchy for risk models was defined: from utopia to empirical data, J. Clin. Epidemiol. 74 (2016) 167\u0026ndash;176. https://doi.org/10.1016/j.jclinepi.2015.12.005.\u003c/li\u003e\n\u003cli\u003eS.W.J. Nijman, A.M. Leeuwenberg, I. Beekers, I. Verkouter, J.J.L. Jacobs, M.L. Bots, F.W. Asselbergs, K.G.M. Moons, T.P.A. Debray, Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review, J. Clin. Epidemiol. 142 (2022) 218\u0026ndash;229. https://doi.org/10.1016/j.jclinepi.2021.11.023.\u003c/li\u003e\n\u003cli\u003eD.P. Misra, A.S. Yadav, Impact of Preprocessing Methods on Healthcare Predictions, (2019). https://doi.org/10.2139/ssrn.3349586.\u003c/li\u003e\n\u003cli\u003eD.A. Newman, Missing Data: Five Practical Guidelines, Organ. Res. Methods 17 (2014) 372\u0026ndash;411. https://doi.org/10.1177/1094428114548590.\u003c/li\u003e\n\u003cli\u003eM. Afkanpour, E. Hosseinzadeh, H. Tabesh, Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review, BMC Med. Res. Methodol. 24 (2024) 188. https://doi.org/10.1186/s12874-024-02310-6.\u003c/li\u003e\n\u003cli\u003eS. van Buuren, Flexible Imputation of Missing Data, CRC Press, 2012.\u003c/li\u003e\n\u003cli\u003eR. Rios, R.J. Miller, N. Manral, T. Sharir, A.J. Einstein, M.B. Fish, T.D. Ruddy, P.A. Kaufmann, A.J. Sinusas, E.J. Miller, T.M. Bateman, S. Dorbala, M.D. Carli, S.D.V. Kriekinge, P.B. Kavanagh, T. Parekh, J.X. Liang, D. Dey, D.S. Berman, P.J. Slomka, Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: insights from REFINE SPECT registry, Comput. Biol. Med. 145 (2022) 105449. https://doi.org/10.1016/j.compbiomed.2022.105449.\u003c/li\u003e\n\u003cli\u003eA.J. Vickers, B. van Calster, E.W. Steyerberg, A simple, step-by-step guide to interpreting decision curve analysis, Diagn. Progn. Res. 
3 (2019) 18. https://doi.org/10.1186/s41512-019-0064-7.\u003c/li\u003e\n\u003cli\u003eY. Wu, L. Xu, P. Yang, N. Lin, X. Huang, W. Pan, H. Li, P. Lin, B. Li, V. Bunpetch, C. Luo, Y. Jiang, D. Yang, M. Huang, T. Niu, Z. Ye, Survival Prediction in High-grade Osteosarcoma Using Radiomics of Diagnostic Computed Tomography, eBioMedicine 34 (2018) 27\u0026ndash;34. https://doi.org/10.1016/j.ebiom.2018.07.006.\u003c/li\u003e\n\u003cli\u003eA.J. Vickers, B. Van Calster, L. Wynants, E.W. Steyerberg, Decision curve analysis: confidence intervals and hypothesis testing for net benefit, Diagn. Progn. Res. 7 (2023) 11. https://doi.org/10.1186/s41512-023-00148-y.\u003c/li\u003e\n\u003cli\u003eG.S. Collins, J.B. Reitsma, D.G. Altman, K.G.M. Moons, Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement, BMJ 350 (2015) g7594. https://doi.org/10.1136/bmj.g7594.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 and 2 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-global-and-public-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [BMC Global and Public Health](https://bmcglobalpublichealth.biomedcentral.com/)","snPcode":"44263","submissionUrl":"https://submission.springernature.com/new-submission/44263/3","title":"BMC Global and Public Health","twitterHandle":"@BMC_GPH","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"HIV treatment interruption, machine learning, predictive modeling","lastPublishedDoi":"10.21203/rs.3.rs-5810875/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5810875/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eInterruption in HIV treatment (IIT) remains a significant barrier to achieving global HIV/AIDS control goals. Machine learning (ML) models offer potential for predicting IIT by leveraging large clinical data. Understanding how these models were developed, validated, and applied remains essential for advancing research.\u003c/p\u003e \u003cp\u003eWe searched the PubMed, BMC, Cochrane Library, Scopus, ScienceDirect, Lancet, and Google Scholar, for studies published in English from 1990 to September 2024. Search terms covered HIV, machine learning, treatment interruption, and loss to follow-up. Articles were screened and reviewed independently, and data were extracted using the CHARMS checklist. Risk of bias was assessed with PROBAST. The PRISMA guidelines were followed throughout.\u003c/p\u003e \u003cp\u003eOut of 116,672 records, nine studies met the inclusion criteria and reported 12 ML models. Random Forest, XGBoost, and AdaBoost were predominant models (91.7%). Internal validation was performed in all models, but only two models included external validation. 
Performance varied, with a mean AUC-ROC of 0.668 (SD\u0026thinsp;=\u0026thinsp;0.066), indicating moderate discrimination. About 75% of models showed a high risk of bias due to inadequate handling of missing data, lack of calibration, and absence of decision curve analysis (DCA).\u003c/p\u003e \u003cp\u003eML models show promise for predicting IIT, particularly in resource-limited settings. Future research should prioritize external validation, robust missing data handling, decision curve analysis, and include sociocultural predictors to improve model robustness.\u003c/p\u003e","manuscriptTitle":"Evaluating Machine Learning models for predicting HIV treatment interruption: a systematic review of accuracy, validity, and applicability","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-22 12:38:03","doi":"10.21203/rs.3.rs-5810875/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-24T10:52:37+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-06-19T07:01:52+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-02T15:10:32+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"161623869998302650052842370424492540005","date":"2025-05-12T14:51:04+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-22T01:41:27+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"108027098115028522690998369246012645350","date":"2025-04-22T01:37:38+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-04-21T10:27:37+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-03-31T09:37:33+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Global and Public 
Health","date":"2025-03-29T22:24:31+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-global-and-public-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [BMC Global and Public Health](https://bmcglobalpublichealth.biomedcentral.com/)","snPcode":"44263","submissionUrl":"https://submission.springernature.com/new-submission/44263/3","title":"BMC Global and Public Health","twitterHandle":"@BMC_GPH","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e8416cf1-e4b3-4ecd-ac9a-4f74d9c625e1","owner":[],"postedDate":"April 22nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2025-07-09T11:39:07+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-22 12:38:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5810875","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5810875","identity":"rs-5810875","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}