A Scoping Review of Racial Bias Mechanisms and Mitigation Frameworks in Clinical Artificial Intelligence | Research Square

Systematic Review
Aayush Sisodia
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8642098/v1
This work is licensed under a CC BY 4.0 License
Status: Posted
Version 1 posted

Abstract
This scoping review synthesizes evidence on how racial bias arises in clinical artificial intelligence (AI) systems and how it can be mitigated through technical, governance, and policy approaches. We conducted a scoping review of clinical AI/ML studies and relevant conceptual frameworks, with searches limited to English-language sources published between September 2020 and November 2025. Study selection was documented using a PRISMA 2020 flow diagram. 
Eligible studies examined racial or demographic bias mechanisms, fairness evaluation, or mitigation strategies in real-world clinical contexts. Across 22 included studies, recurring pathways to inequity included underrepresentation and label noise in training data, proxy variables that encode structural disadvantage, differences in access and measurement that distort outcomes, and limited external validation in diverse settings. Mitigation strategies clustered into (1) data and evaluation improvements (e.g., subgroup reporting, calibration, and cross-site validation), (2) model and optimization approaches (e.g., reweighting and fairness-aware objectives), and (3) governance levers (e.g., documentation, equity impact assessments, and monitoring requirements). We translate these findings into a practical framework linking bias mechanisms to mitigation actions and implementation levers, with an emphasis on feasible steps for health systems and policymakers to reduce avoidable inequities during AI deployment.

Keywords: racial bias; clinical artificial intelligence; healthcare equity; algorithmic fairness; data imbalance; proxy variables

Figures: Figure 1

Introduction
Machine learning (ML) and artificial intelligence (AI) have emerged as major drivers of healthcare innovation, influencing the development of modern diagnostics, risk prediction, and treatment planning [ 1 ]. Radiology, oncology, cardiology, and mental health are among the fields where AI models are employed to enhance clinical decision-making, improve diagnostic accuracy, and optimize patient management [ 2 ]. These technologies hold promise for reducing human error, augmenting efficiency, and enabling personalized medicine through the analysis of detailed clinical data at unprecedented pace and volume [ 3 ]. Nevertheless, despite their transformative potential, increasing evidence suggests that AI systems may not equally benefit all patient populations [ 4 ]. 
Recent research indicates that algorithmic performance frequently varies across racial and ethnic groups, raising serious questions about the fairness, transparency, and equity of clinical AI applications [ 5 ]. Consequently, the opportunities AI presents for improving healthcare outcomes are contingent upon understanding and addressing the biases embedded in data, models, and implementation processes [ 6 ]. Racial discrimination in AI architecture typically arises from structural inequities in healthcare data and algorithm design [ 7 ]. AI model datasets are often biased because they reflect historical disparities in healthcare access, diagnosis, and treatment [ 8 ]. Empirical evidence supports these concerns. Ferryman et al. [ 9 ] demonstrated that a widely-used healthcare algorithm systematically underestimated medical needs of Black patients due to cost-based proxy variables. Similarly, Lee et al. [ 10 ] found lower accuracy of deep learning cardiac MRI segmentation models for minority racial groups, and Thompson et al. [ 11 ] identified racial differences in the false negative rate of an opioid misuse classifier. The implications are far-reaching: inaccurate predictions can lead to inequitable clinical judgments, under-diagnosis, and reduced quality of care, ultimately entrenching systemic injustices and exacerbating existing health disparities between racial and ethnic groups. Several studies have examined bias in clinical AI, but meaningful gaps remain in the literature [ 12 ]. Systematic reviews such as those by Correa et al. [ 13 ] and de Castro Vieira et al. [ 14 ] consider bias detection methods and fairness metrics, yet they predominantly address specific clinical domains or focus on technical fairness methods without broader ethical or policy considerations. 
Furthermore, scientific work synthesizing empirical evidence to link mechanisms of racial bias—such as data imbalance and proxy variable misuse—to practical clinical implications remains limited [ 15 ]. Regulatory and institutional perspectives from organizations such as the Centers for Medicare and Medicaid Services (CMS), the Food and Drug Administration (FDA), and the American Medical Association (AMA) are also underrepresented in the discussion [ 16 ]. Unlike previous studies, this review is structured as a holistic synthesis balancing empirical findings with causal mechanisms and policy mitigation strategies to facilitate equitable AI implementation across the healthcare industry [ 12 ]. This study employs a scoping review approach integrating empirical evidence with conceptual and policy scholarship to enhance conceptual clarity and methodological transparency. We used a structured search and screening process, documented with a PRISMA 2020 flow diagram, to identify relevant sources, and a narrative thematic synthesis to map mechanisms of epistemic inequity and actionable mitigation and governance responses in clinical AI. This design prioritizes breadth and interpretability while maintaining a reproducible, clearly reported selection process. In this review, epistemic inequity in clinical AI refers to the systematic exclusion or misrepresentation of certain racial and social groups in the processes through which medical knowledge is produced, validated, and operationalized by algorithmic systems. 
It manifests through distortions in measurement and labeling, where proxy variables such as cost or utilization obscure genuine health needs; through inequities in documentation, as clinical narratives and electronic records embed biased linguistic or diagnostic assumptions; through validation gaps arising from race-blind benchmarking and homogeneous testing cohorts; and through governance asymmetries in which institutional actors determine what counts as "ground truth." Collectively, these mechanisms demonstrate that bias in AI is not merely a technical flaw but a structural and epistemic issue embedded in how knowledge and authority are distributed within healthcare systems.

To support conceptual clarity, two related constructs are defined in this review. Implementation capacity denotes the institutional, technical, and ethical readiness of healthcare systems to operationalize, monitor, and sustain fair AI applications across diverse contexts. Equity mandates refer to the policy, regulatory, and institutional frameworks that require fairness-oriented design, validation, and governance of AI systems to ensure that algorithmic innovations promote equitable healthcare outcomes rather than reinforce existing disparities.

To address these gaps and develop a multifaceted understanding of clinical AI and racial bias, this paper is guided by the following research questions:
RQ1: What evidence exists of racial bias in clinical artificial intelligence and machine learning systems used for diagnostics, readmission prediction, and treatment planning?
RQ2: What are the main sources and mechanisms that cause racial bias in clinical AI models, such as data imbalance, proxy variables, or unrepresentative cohorts?
RQ3: What technical, ethical, and policy strategies have been proposed or implemented to reduce racial bias and promote fairness in clinical AI systems?
By bridging empirical evidence with methodological insights and policy-level solutions, this review seeks to advance a comprehensive understanding of how racial bias manifests and can be mitigated within clinical AI systems. By synthesizing studies across diagnostics, readmission prediction, and treatment planning, the review identifies the presence and origins of bias and highlights actionable strategies to enhance fairness and accountability. Ultimately, this work aims to inform the equitable design, deployment, and governance of clinical AI technologies, ensuring that innovations in healthcare serve all patient populations without perpetuating existing disparities.

Methodology

Study Design
The present study employed a scoping review approach. The study selection process is documented using a PRISMA 2020 flow diagram (Fig. 1) [ 17 ]. A large language model was used for language editing and formatting assistance during manuscript preparation; the author reviewed and verified the accuracy and integrity of all content (see Statements and Declarations).

A systematic literature search was performed in PubMed, Scopus, and Google Scholar. The search was limited to English-language sources published between September 2020 and November 2025. These databases were selected for their coverage of biomedical, informatics, and interdisciplinary research. The search terms and Boolean operators used were: (("racial bias" OR "algorithmic bias" OR "algorithmic fairness") AND ("clinical AI" OR "machine learning" OR "healthcare prediction" OR "readmission" OR "diagnosis" OR "treatment planning" OR "mental health")). Filters were applied to prioritize empirical clinical AI/ML studies and relevant peer-reviewed frameworks, while allowing inclusion of high-quality preprints reporting original methods or evaluations.
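The Boolean search string above can be assembled programmatically, which makes it easier to reuse or adapt across databases. The following is a minimal sketch, not the author's tooling: the helper name `or_group` and the variable names are hypothetical, and only the term lists and the resulting string come from the text.

```python
def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


# Term lists taken from the search string reported in the Methodology.
bias_terms = ["racial bias", "algorithmic bias", "algorithmic fairness"]
context_terms = [
    "clinical AI",
    "machine learning",
    "healthcare prediction",
    "readmission",
    "diagnosis",
    "treatment planning",
    "mental health",
]

# Combine the two OR-groups with AND and wrap the whole expression.
query = "(" + or_group(bias_terms) + " AND " + or_group(context_terms) + ")"

if __name__ == "__main__":
    print(query)
```

Keeping the terms in lists means a term can be added or dropped without hand-editing nested parentheses; date and language limits would still be applied through each database's own filters.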
Inclusion and Exclusion Criteria
To ensure methodological rigor and relevance to the research objectives, studies were selected using pre-established eligibility criteria. The inclusion and exclusion criteria were designed to accommodate both empirical and conceptual research, ensuring comprehensive coverage of technical, ethical, and governance aspects of racial bias in clinical AI. Empirical studies were selected for their quantitative assessment of model performance or fairness outcomes, while conceptual and policy-oriented papers were included if they presented analytical, ethical, or governance frameworks relevant to algorithmic fairness or epistemic inequity in healthcare. Table 1 summarizes the criteria used to select and screen studies during the PRISMA-guided process.

Table 1 Summary of Inclusion and Exclusion Criteria
Category | Criteria Description | Rationale
Inclusion Criteria | Peer-reviewed empirical studies examining AI or ML models in clinical contexts; High-quality preprints reporting original methods or evaluations in clinical AI; Studies evaluating racial bias, fairness, or performance disparities across demographic subgroups; Publications addressing technical, ethical, or policy-based mitigation strategies; Conceptual, ethical, or policy-based publications discussing governance, fairness frameworks, or epistemic inequity in clinical AI | To ensure inclusion of methodologically sound and clinically relevant research that directly investigates racial bias and fairness in AI
Exclusion Criteria | Non-clinical AI or general computer science studies lacking healthcare context; Editorials or commentaries without original methods or evaluation; Non-scholarly sources (news/blogs); Studies without racial, ethnic, or demographic stratification in their analysis | To exclude literature without empirical rigor, clinical relevance, or demographic stratification essential to the research questions

Data Extraction
A standardized data extraction form was employed to 
systematically capture key information from each included study. This form was designed to ensure consistency and comprehensiveness in the extraction of variables relevant to the research questions. The extracted data included study characteristics (author, year, design, clinical domain), AI/ML methods employed, types and strengths of racial bias identified, underlying mechanisms of bias, fairness metrics and mitigation strategies applied, and key outcomes and conclusions.

Quality Appraisal and Risk of Bias Assessment
Design-appropriate appraisal tools were used to assess study quality and risk of bias. The Newcastle-Ottawa Scale (NOS) [ 38 ] was applied to empirical/model-development studies (n = 11) across selection, comparability, and outcome domains. The CASP Qualitative Checklist (2024) [ 39 ] was applied to conceptual and framework-based studies (n = 11). Overall, 5 of 11 conceptual studies were rated high quality and 6 moderate quality (mean CASP score 7.7/10). Among empirical studies, 6 were rated good quality and 5 fair quality (mean NOS score 6.9/9).

Results

Overview of Studies Included
Database searching identified 500 records. After removing 330 records before screening (300 duplicates; 30 non-English, conference abstracts, or retracted), 170 records were screened. Of these, 10 were excluded at title/abstract screening. Of 160 reports sought for retrieval, 20 could not be retrieved. The remaining 140 full-text reports were assessed for eligibility; 118 were excluded (90 not clinical/not healthcare AI/ML in scope; 20 no racial or demographic subgroup analysis; 8 commentary/editorial or duplicate cohort). Twenty-two studies were included in the final review.

The 22 included peer-reviewed and preprint studies were published between 2021 and 2025 and spanned diverse clinical domains (population health, critical care, oncology, psychiatry, imaging, and healthcare governance). 
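The PRISMA counts reported above can be checked with simple arithmetic. The sketch below is illustrative only; the variable and dictionary key names are assumptions chosen for readability, while every number is taken from the text.

```python
# Counts as reported in the study selection paragraph.
identified = 500
removed_before_screening = {
    "duplicates": 300,
    "non_english_conference_abstract_or_retracted": 30,
}
excluded_title_abstract = 10
not_retrieved = 20
full_text_exclusions = {
    "not_clinical_or_out_of_scope": 90,
    "no_subgroup_analysis": 20,
    "commentary_or_duplicate_cohort": 8,
}

# Each stage is the previous stage minus the records dropped at that stage.
screened = identified - sum(removed_before_screening.values())
sought_for_retrieval = screened - excluded_title_abstract
assessed_full_text = sought_for_retrieval - not_retrieved
included = assessed_full_text - sum(full_text_exclusions.values())

if __name__ == "__main__":
    print(screened, sought_for_retrieval, assessed_full_text, included)
```

Each intermediate value matches the stage totals stated in the text (170 screened, 160 sought, 140 assessed, 22 included), so the flow is internally consistent.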
The reviewed literature encompasses empirical analyses of AI and machine learning algorithms as well as conceptual and policy-based frameworks of algorithmic fairness. Empirical studies evaluated biases in various predictive and diagnostic tasks, including mortality prediction, readmission risk, cardiac imaging, diabetes modeling, and radiomics-based cancer predictions. Several studies utilized benchmark clinical datasets such as MIMIC-III, NHANES, and UK Biobank, facilitating methodological comparability. Concurrently, conceptual and policy-oriented works offered ethical, statistical, and governance frameworks for fairness evaluation and bias removal in clinical AI.

Empirical Patterns of Racial Bias Across Clinical AI Applications
The evidence reviewed demonstrates that algorithmic bias in healthcare AI systems manifests through calibration errors, non-representative data, and latent racial correlates embedded in model design. Research consistently shows that even technically proficient algorithms may amplify disparities in risk estimation, diagnostic accuracy, and treatment recommendations when fairness is not explicitly addressed.

In population health and predictive modeling, Gupta et al. [ 19 ] found that hospitalization models systematically underpredicted risk for minoritized groups, exposing calibration drift across racial and socioeconomic strata. Wang et al. [ 20 ] similarly identified structural and data-level bias in common readmission models, noting that features such as prior hospitalizations and healthcare utilization, often used as proxies for clinical need, encode social inequities. Cronjé et al. [ 21 ] added further evidence of miscalibration, showing that diabetes risk algorithms overestimated White patients' risk while underestimating risk for Black patients, revealing how seemingly objective predictors can perpetuate inequitable outcomes.

In intensive and critical care, Allen et al. 
[ 18 ] demonstrated that targeted bias-minimized preprocessing can achieve both higher accuracy and fairness, outperforming legacy severity scores. Yet other studies underscore that bias may persist even after explicit correction. Velichkovska et al. [ 22 , 23 ] revealed that vital signs alone could predict patient race with high accuracy, indicating that physiological data inherently encode racial information. Such findings complicate conventional fairness strategies, suggesting that debiasing efforts must address the statistical structure of biomedical data itself, not just model design. Thompson et al. [ 11 ] further showed how bias emerges in natural language classifiers—specifically, higher false-negative rates for Black patients in an opioid misuse detection model—though recalibration proved effective in mitigating disparities. Chang et al. [ 28 ] contributed a structural dimension, demonstrating that racial differences in laboratory testing frequency can distort the data pipelines feeding downstream AI, embedding inequity before modeling even begins. Within imaging and oncology, disparities in representation emerged as a dominant source of bias. Lee et al. [ 10 ] reported that cardiac MRI segmentation accuracy was significantly lower among minority groups, reflecting the dominance of White subjects in training data. Pfob and Heil [ 25 ] similarly showed poor cross-population generalizability in breast cancer radiomics, with AUC performance dropping sharply when validated on Asian and African cohorts. Khor et al. [ 24 ] found that omitting race as a predictor worsened fairness and calibration, increasing false-negative rates in Hispanic and Black patients. Collectively, these findings highlight how both data imbalance and race exclusion amplify inequities in clinical prediction. 
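Two of the disparities described above, subgroup differences in false-negative rate and in calibration, can be surfaced with a stratified report. The following is a minimal sketch on made-up toy data, not a reproduction of any cited study's analysis; the function name `subgroup_report`, the 0.5 threshold, and the group labels are all assumptions for illustration.

```python
from collections import defaultdict


def subgroup_report(y_true, y_prob, groups, threshold=0.5):
    """Per-group false-negative rate and a crude calibration comparison.

    Calibration is summarized as mean predicted risk versus observed event
    rate; a mean prediction below the observed rate indicates underprediction.
    """
    stats = defaultdict(lambda: {"n": 0, "pos": 0, "fn": 0, "sum_prob": 0.0})
    for y, p, g in zip(y_true, y_prob, groups):
        s = stats[g]
        s["n"] += 1
        s["pos"] += y
        s["sum_prob"] += p
        if y == 1 and p < threshold:  # true case the model would miss
            s["fn"] += 1
    report = {}
    for g, s in stats.items():
        report[g] = {
            "fnr": s["fn"] / s["pos"] if s["pos"] else float("nan"),
            "mean_predicted": s["sum_prob"] / s["n"],
            "observed_rate": s["pos"] / s["n"],
        }
    return report


# Toy data (hypothetical): same observed event rate in both groups,
# but lower predicted risk for group "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_prob = [0.8, 0.6, 0.3, 0.1, 0.4, 0.7, 0.2, 0.1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
report = subgroup_report(y_true, y_prob, groups)

if __name__ == "__main__":
    for group, metrics in sorted(report.items()):
        print(group, metrics)
```

In this toy example both groups have the same observed rate, yet group "B" has a higher false-negative rate and a lower mean predicted risk, which is the pattern of underprediction the studies above describe. Real audits would use larger samples, confidence intervals, and calibration curves rather than a single summary.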
Underlying Mechanisms of Racial Bias in Clinical AI Systems
Racial bias in clinical AI systems is perpetuated by structural asymmetries deeply entrenched in the generation, modeling, and validation of health data. Rather than resulting from individual algorithmic shortcomings, these biases reflect how AI models reproduce and amplify inequity in data provenance, representation, and interpretation. Across the reviewed literature, several intersecting mechanisms (data imbalance, proxy variables, non-representative validation samples, and structural inequities) consistently explain why racially disparate outcomes emerge in technically sound models. Measurement-device and proxy-outcome errors can also embed racialized bias; pulse oximetry is a widely cited example with downstream equity consequences [ 31 ].

The first mechanism is data imbalance and representational disparity, observed across imaging, critical care, and population health applications. Lee et al. [ 10 ] showed that cardiac MRI segmentation models trained predominantly on White subjects yielded significantly lower Dice scores for minoritized populations, demonstrating a direct relationship between data homogeneity and systematic underperformance. Similarly, Pfob and Heil [ 25 ] found radiomics model accuracy decreased sharply on Asian and African populations, indicating that Eurocentric models generalize poorly across populations. Cronjé et al. [ 21 ] further demonstrated racial miscalibration in diabetes risk algorithms (overestimating White patients' risk and underestimating Black patients' risk), showing that even legacy clinical scores carry biases maintained through population-specific parameterization. These findings underscore that racial underrepresentation at the data level produces unequal learning and undermines clinical reliability for marginalized populations.
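The Dice score mentioned above measures overlap between a predicted and a reference segmentation mask, and comparing its mean across subgroups is one way to expose representational underperformance. The sketch below uses tiny made-up masks and hypothetical group labels; it illustrates the metric only and is not the evaluation pipeline of the cited study.

```python
from collections import defaultdict


def dice(pred, truth):
    """Dice coefficient for two binary masks given as flat 0/1 sequences."""
    intersection = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * intersection / total


def mean_dice_by_group(cases):
    """Average Dice per group; cases are (group, pred_mask, truth_mask)."""
    scores = defaultdict(list)
    for group, pred, truth in cases:
        scores[group].append(dice(pred, truth))
    return {g: sum(v) / len(v) for g, v in scores.items()}


# Toy masks (hypothetical): predictions overlap the reference well for
# group_a and poorly for group_b.
cases = [
    ("group_a", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("group_a", [1, 1, 1, 0], [1, 1, 0, 0]),
    ("group_b", [1, 0, 0, 0], [1, 1, 0, 0]),
    ("group_b", [0, 0, 1, 1], [1, 1, 0, 0]),
]
by_group = mean_dice_by_group(cases)

if __name__ == "__main__":
    print(by_group)
```

Reporting the per-group mean alongside the pooled mean matters because a pooled score dominated by the majority group can hide a large gap for an underrepresented one.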
The second mechanism involves proxy variables and label leakage, whereby clinically neutral-appearing features encode socioeconomic or racial information. Gupta et al. [ 19 ] and Wang et al. [ 20 ] demonstrated that variables such as healthcare utilization, prior hospitalization, and cost implicitly capture access and privilege, factoring social determinants into model predictions. Mikhaeil et al. [ 29 ] elaborated on this by showing how proxy-label bias—outcomes defined through poor proxy surrogates such as healthcare spending or diagnosis codes—produces systematic prediction errors disfavoring underserved populations. Their Bayesian correction model emphasized that bias reduction requires redefining the meaning of ground truth, not merely reweighting features. These biases are further aggravated by unrepresentative validation and benchmarking practices. Pfob and Heil [ 25 ] and Khor et al. [ 24 ] showed that excluding race during model validation inflates performance metrics and obscures subgroup-level failures. Velichkovska et al. [ 22 , 23 ] demonstrated that vital signs alone convey racial information—models could predict race with AUCs exceeding 0.70 even without racial labeling. This finding exposes the fallacy of race-blind modeling: removing racial variables does not eliminate bias when physiological or systemic imbalances are present in the data. Finally, several researchers identify structural and contextual inequities as upstream sources of algorithmic bias. Chang et al. [ 28 ] found that racially differentiated laboratory testing procedures result in unequal data completeness, influencing model learning and error patterns in emergency care. Bouguettaya et al. [ 26 ] and Thompson et al. 
[ 11 ] showed that natural-language models trained on electronic health records or clinical narratives replicate linguistic and contextual biases in documentation, resulting in higher false-negative rates or inferior treatment recommendations for Black and Hispanic patients. These findings suggest that AI systems do not merely reflect bias but operationalize it, translating institutional inequities into algorithmic decision-making.

Combined, these studies reveal that algorithmic inequity is multilevel and systemic, rooted in data hierarchies, measurement decisions, and healthcare organization rather than solely in single-model design [ 18 ]. The mechanisms demonstrate that fairness cannot be achieved through technical optimization alone but requires epistemic reform: a reconsideration of how clinical risk, outcome, and validity are defined, measured, and validated across populations.

Table 2 Mapping of Mechanisms, Harms, Mitigation Strategies, and Governance Levers in Clinical AI
Mechanism of Inequity | Resulting Harm or Bias | Mitigation Strategy | Governance Lever / Policy Response
Data imbalance / underrepresentation | Miscalibrated predictions; underperformance in minoritized groups | Data augmentation; inclusive dataset design; reweighting | Institutional data diversity standards; transparent dataset reporting [ 10 , 19 ]
Proxy labeling and measurement bias | Reinforcement of socioeconomic disparities; misestimation of risk | Use of direct clinical indicators; fairness-aware label correction | Ethical review of proxy definitions; model documentation [ 9 , 28 ]
Documentation / NLP inequity | Stereotyped associations in clinical text; diagnostic bias | Bias filtering; controlled vocabularies; debiasing embeddings | Data governance for clinical language models [ 6 , 11 ]
Validation inequity / race-blind benchmarking | Inflated performance claims; unrecognized subgroup harms | Cross-group validation; fairness metrics (AEquity, GUIDE) | Regulatory requirement for subgroup validation [ 27 , 30 ]
Governance inequity / lack of accountability | Power asymmetries in 'ground truth' decisions | Institutional fairness boards; fairness audits; explainability | Fairness-by-design policies; continuous AI audit frameworks [ 20 , 32 ]

Strategies for Mitigating Racial Bias and Advancing Fairness in Clinical AI
The analyzed literature demonstrates that efforts to reduce racial bias in clinical AI operate at various methodological and institutional scales, from technical recalibration of algorithms to comprehensive governance frameworks. Although early interventions focused on adjusting statistical parameters to achieve parity, recent practices have shifted toward data-focused fairness, ongoing auditing, and institutional accountability. Collectively, these measures highlight an increasing pivot from reactive mitigation to proactive equity incorporation across the AI lifecycle. Recent review syntheses also highlight the need to evaluate fairness trade-offs across multiple clinical domains rather than single-task benchmarks [ 35 ].

A key technical advancement is algorithmic debiasing and recalibration. Thompson et al. [ 11 ] demonstrated that post-hoc recalibration of an NLP opioid misuse classifier eliminated disparities between Black and White patients while maintaining accuracy, showing that fairness interventions can enhance equity without compromising performance. Similarly, Allen et al. [ 18 ] incorporated bias-minimized preprocessing and data balancing to achieve parity in ICU mortality prediction, outperforming conventional severity scores such as MEWS and SAPS II. Gulamali et al. [ 27 ] introduced AEquity, a data-centric fairness measure that substantially reduced subgroup bias across multiple clinical models. Their findings reframe fairness as a design property of clinical AI rather than a post-hoc adjustment.

Beyond model-level changes, scholars have proposed institutional structures and governance systems to institutionalize fairness. Gupta et al. 
[ 19 ] operationalized this vision through the BE-FAIR equity framework, which incorporates calibration auditing and demographic stratification into model evaluation pipelines. Ladin et al. [ 30 ] contributed the GUIDE framework, derived from a Delphi consensus process, providing 31 principles offering normative and procedural guidance on fair model design, validation, and deployment. Additional tools include Wang et al.'s [ 20 ] Bias Evaluation Checklist and Cerrato and Halamka's [ 32 ] Algorithmic Equity Platform, which provide structured assessment tools for pre-deployment auditing and institutional accountability. These frameworks shift responsibility from individual model developers to organizational ecosystems governing data stewardship, model validation, and clinical implementation. On the technical front, several studies propose sophisticated statistical and data governance tools to address bias at its origin. Bayesian hierarchical models proposed by Mikhaeil et al. [ 29 ] directly correct label bias and measurement error disparities, providing a statistically principled method for resolving noisy or inequitable outcome definitions. Pfob and Heil [ 25 ] and Lee et al. [ 10 ] advocated for racially diverse datasets and rigorous cross-site validation as preconditions to model generalizability, empirically establishing that algorithmic fairness cannot be dissociated from data representativeness. Complementary reviews by Chen et al. [ 33 ], Huang et al. [ 34 ], Pagano et al. [ 38 ], Xu et al. [ 36 ], and Chinta et al. [ 37 ] converge on multi-level fairness frameworks comprising technical, ethical, and regulatory solutions including reweighting, federated learning, equalized odds optimization, and transparent model reporting standards. Importantly, while these strategies represent substantive progress, they remain disjointed across domains. 
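Reweighting, one of the technical strategies listed above, can be illustrated with a common scheme that weights each sample by P(group) x P(label) / P(group, label), so that group and label are statistically independent in the weighted data. This is a generic sketch of that scheme on made-up data, not the method of any study cited here; the function name and toy values are assumptions.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Sample weights w = P(g) * P(y) / P(g, y), estimated from counts.

    Over-represented (group, label) cells get weights below 1 and
    under-represented cells get weights above 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * cell_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, target_group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == target_group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights)
        if g == target_group and y == 1
    )
    return positive / total


# Toy data (hypothetical): group A has a 75% positive rate, group B 50%.
groups = ["A", "A", "A", "A", "B", "B"]
labels = [1, 1, 1, 0, 1, 0]
weights = reweighing_weights(groups, labels)

if __name__ == "__main__":
    print(weights)
    print(weighted_positive_rate(groups, labels, weights, "A"))
    print(weighted_positive_rate(groups, labels, weights, "B"))
```

After weighting, both groups have the same weighted positive rate, which is what the scheme is designed to achieve. As the surrounding discussion notes, this equalizes a statistical quantity only; it does not repair labels that are themselves biased proxies.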
Most empirical literature addresses bias reduction at the level of statistical parity rather than epistemic justice, often overlooking structural injustices in how race is operationalized or omitted in modeling. Emerging frameworks, particularly BE-FAIR, GUIDE, and AEquity, signal a shift by embedding fairness within the epistemology of AI development. As summarized in Table 2, effective measures must be multi-layered, integrating fair data design, responsive validation, and enforceable governance guidelines that align technical performance with social accountability.

Discussion
This scoping review synthesizes the literature on how racial bias occurs and is addressed in clinical AI systems, demonstrating that algorithmic inequity is structural and entrenched in both data structures and healthcare delivery. The review of twenty-two empirical and theoretical studies shows that bias is not a by-product of poor modeling but rather a direct expression of institutional and social asymmetries. Compared with other systematic and scoping reviews [ 12 – 14 ], this work broadens the question of fairness in clinical AI by addressing it through technical, ethical, and governance dimensions rather than metrics-based assessments alone.

The findings indicate that algorithmic fairness cannot be achieved solely by optimizing statistics or adjusting performance metrics. Although previous reviews have emphasized tools such as demographic parity and equalized odds [ 6 ], the current review highlights how epistemic sources of inequity, namely how race is defined, encoded, and operationalized in clinical data, remain underaddressed. Empirical research by Gupta et al. [ 19 ] and Thompson et al. [ 11 ] demonstrates that recalibration and preprocessing can enhance parity in the short term, yet these approaches do not address the inherent problem of biases arising from unequal data provenance and structural disadvantage. These results echo observations by Ferryman et al. 
[ 9 ] and Ratwani et al. [ 4 ], who noted that bias in AI reflects historical healthcare delivery inequities rather than technical shortcomings in algorithm design. To the extent that AI systems are built on data shaped by historical biases, they become vulnerable to proxy variables, minority group underrepresentation, and feedback loops that perpetuate unequal outcomes [ 39 ].

Across clinical domains, the results support the position that data representation is the most influential factor in algorithmic inequity. Models trained predominantly on White cohorts consistently show poorer performance for minoritized groups, as demonstrated by Lee et al. [ 10 ] and Pfob and Heil [ 25 ], resulting in systematic miscalibration and lower diagnostic accuracy. These findings align with Gameiro et al. [ 8 ], who characterize healthcare datasets as artifacts shaped by structural exclusion. Similarly, Cronjé et al. [ 21 ] showed that traditional diabetes risk algorithms exhibit miscalibration for Black patients despite seemingly objective predictors. Collectively, this indicates that fairness cannot be separated from the social ontology of data, that is, the circumstances under which data are produced, labeled, and authenticated. Racial disproportionality is therefore not merely a sampling problem but a structural issue concerning how clinical knowledge is encoded in algorithms.

The review also reveals multilevel processes through which racial bias is transmitted in clinical AI. Unlike previous literature [ 34 , 36 ] that predominantly enumerated fairness measures, this synthesis identifies three levels at which bias is created: structural, representational, and inferential. At the structural level, disparities in data access and quality alter the information on which models are trained and validated, as demonstrated by Chang et al. [ 28 ] in their research on racial disparities in laboratory testing frequency. 
At the representational level, race-imbalanced datasets [ 10 , 25 ] produce systematic underperformance for underrepresented groups. At the inferential level, proxy variables and label leakage imbue seemingly neutral features with racial associations [ 19 , 29 ]. Combined, these findings suggest that discrimination persists even when racial variables are not explicitly present, a conclusion supported by Velichkovska et al. [ 5 , 22 , 23 ], who showed that models using physiological data alone could predict race with AUCs exceeding 0.70. This refutes the notion that race-blind modeling equates to fairness, demonstrating that statistical neutrality can conceal deeper biases in data generation.

In addressing these issues, the analyzed literature traces a shift in bias reduction from reactive corrections toward proactive fairness design. Initial attempts focused on post-hoc calibration and reweighting [ 11 , 18 ] with demonstrable though limited improvement. Recent approaches define fairness as a design concept throughout the model lifecycle. Examples include BE-FAIR [ 19 ], GUIDE [ 30 ], and AEquity [ 27 ], which integrate continuous auditing, demographic stratification, and transparency into development processes. These frameworks shift responsibility from individual model developers to institutional ecosystems regulating data stewardship, model validation, and clinical implementation. Complementary tools [ 20 , 32 ] support pre-deployment fairness audits and cross-site validation.

The technical improvements revealed in studies by Mikhaeil et al. [ 29 ], Pfob and Heil [ 25 ], and Lee et al. [ 10 ] indicate that statistical sophistication should be supported by ethical and governance infrastructure. Bayesian hierarchical correction models, diverse data inclusion, and federated validation frameworks provide tangible avenues to enhanced generalizability and accountability. Conceptual reviews by Chen et al. [ 33 ], Pagano et al. [ 38 ], Xu et al. [ 36 ], and Chinta et al. 
[ 37 ] converge on a view of fairness as multidimensional, requiring alignment among technical robustness, ethical integrity, and regulatory enforceability.

However, the synthesis also reveals that existing approaches remain fragmented and inconsistently applied. Most empirical interventions address differences in model performance without confronting the more fundamental question of epistemic justice: whose experiences and outcomes serve as the standard of truth in algorithmic systems [ 9 , 15 ]. Liu et al. [ 12 ] similarly noted that fairness research often privileges technical parity over ethical governance. Emerging frameworks such as BE-FAIR [ 19 ], GUIDE [ 30 ], and AEquity [ 27 ] address these gaps by embedding equity and transparency into model design and evaluation. Chen et al. [ 33 ] and Pagano et al. [ 38 ] further argue that fairness must be treated as a systemic property supported by regulatory and institutional oversight. Thus, genuine algorithmic fairness lies not merely in achieving statistical parity but in developing AI systems capable of recognizing and correcting structural inequities.

Conclusion

This scoping review synthesized twenty-two empirical and conceptual studies exploring the manifestations, mechanisms, and mitigation measures of racial bias in clinical AI. The findings demonstrate that algorithmic inequities are structural rather than incidental, arising from the combination of biased data representation, proxy variables, and skewed model validation practices. Across domains including population health, imaging, psychiatry, and oncology, AI models showed calibration drift and performance differences that disadvantage minority cohorts. Although methodological improvements have been made, fairness interventions remain predominantly reactive and narrowly statistical.
Emerging frameworks including BE-FAIR, GUIDE, and AEquity represent a potential paradigm shift, integrating equity into the design and governance of clinical AI rather than treating it as a post-hoc consideration. A major limitation of this review is its focus on English-language literature (peer-reviewed and selected high-quality preprints), which may miss non-English, unpublished, or locally disseminated work that reflects global perspectives on algorithmic fairness. Future research should prioritize the development of large-scale, racially diverse benchmark datasets to enhance generalizability and transparency. Additionally, fairness evaluation should extend beyond one-time model evaluation to include ongoing real-world assessment as part of healthcare governance systems. To promote equity in clinical AI, technical rigor alone will not suffice; ethical responsibility and institutional commitment to social justice are equally essential.

Declarations

Competing interests: The author declares no competing interests.
Ethics approval: Not applicable. This study synthesizes published literature and did not involve human participants, animals, or identifiable personal data.
Consent to participate: Not applicable.
Consent for publication: Not applicable.
Funding: No funding was received to assist with the preparation of this manuscript.
Author contribution: Single author, responsible for conceptualization, literature search, screening, data extraction, synthesis, and manuscript drafting and revision.
Data availability: No new data were generated or analyzed in this study. All information is derived from the cited literature.
Code availability: Not applicable.
Use of AI tools: A large language model was used to assist with language editing and formatting. All substantive content, interpretations, and decisions were generated and verified by the author.
References

1. El Arab, R.A., Almoosa, Z., Alkhunaizi, M., Abuadas, F.H., Somerville, J.: Artificial intelligence in hospital infection prevention: an integrative review. Front. Public Health 13, 1547450 (2025). https://doi.org/10.3389/fpubh.2025.1547450
2. Guha, A., Shah, V., Nahle, T., et al.: Artificial intelligence applications in cardio-oncology: a comprehensive review. Curr. Cardiol. Rep. 27(1), 56 (2025). https://doi.org/10.1007/s11886-025-02215-w
3. Păcuraru, I.-M., Chirvase, C.-S., Tiriteu, Ș.-I.: The role of artificial intelligence in personalised medicine: advancements, challenges, and future perspectives. Bus. Excell. Manag. 15(1), 59–84 (2025). https://doi.org/10.24818/beman/2025.15.1-05
4. Ratwani, R.M., Sutton, K., Galarraga, J.E.: Addressing AI algorithmic bias in health care. JAMA 332(13), 1051–1052 (2024). https://doi.org/10.1001/jama.2024.14735
5. Velichkovska, B., Gjoreski, H., Denkovski, D., et al.: Bias in vital signs? Machine learning models can learn patients' race or ethnicity from the values of vital signs alone. BMJ Health Care Inform. 32(1), e101098 (2025). https://doi.org/10.1136/bmjhci-2024-101098
6. Hasanzadeh, F., Josephson, C.B., Waters, G., Adedinsewo, D., Azizi, Z., White, J.A.: Bias recognition and mitigation strategies in artificial intelligence healthcare applications. NPJ Digit. Med. 8(1), 154 (2025). https://doi.org/10.1038/s41746-025-01503-7
7. Cary, M.P. Jr., Grady, S.D., McMillian-Bohler, J., et al.: Building competency in artificial intelligence and bias mitigation for nurse scientists and aligned health researchers. Nurs. Outlook 73(3), 102395 (2025). https://doi.org/10.1016/j.outlook.2024.102395
8. Gameiro, R.R., Woite, N.L., Sauer, C.M., et al.: The data artifacts glossary: a community-based repository for bias on health datasets. J. Biomed. Sci. 32(1), 14 (2025). https://doi.org/10.1186/s12929-024-01106-6
9. Ferryman, K., Cesare, N., Creary, M., Nsoesie, E.O.: Racism is an ethical issue for healthcare artificial intelligence. Cell Rep. Med. 5(6) (2024). https://doi.org/10.1016/j.xcrm.2024.101617
10. Lee, T., Puyol-Antón, E., Ruijsink, B., Aitcheson, K., Shi, M., King, A.P.: An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation. In: Workshop on Clinical Image-Based Procedures. Springer (2023). https://doi.org/10.1007/978-3-031-45249-9_21
11. Thompson, H.M., Sharma, B., Bhalla, S., et al.: Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups. J. Am. Med. Inform. Assoc. 28(11), 2393–2403 (2021). https://doi.org/10.1093/jamia/ocab148
12. Liu, M., Ning, Y., Teixayavong, S., et al.: A scoping review and evidence gap analysis of clinical AI fairness. NPJ Digit. Med. 8(1), 360 (2025). https://doi.org/10.1038/s41746-025-01667-2
13. Correa, R., Shaan, M., Trivedi, H., et al.: A systematic review of 'fair' AI model development for image classification and prediction. J. Med. Biol. Eng. 42(6), 816–827 (2022). https://doi.org/10.1007/s40846-022-00754-z
14. de Castro Vieira, J.R., Barboza, F., Cajueiro, D., Kimura, H.: Towards fair AI: mitigating bias in credit decisions: a systematic literature review. J. Risk Financ. Manag. 18(5), 228 (2025). https://doi.org/10.3390/jrfm18050228
15. Fields, C.T., Black, C., Thind, J.K., et al.: Governance for anti-racist AI in healthcare: integrating racism-related stress in psychiatric algorithms for Black Americans. Front. Digit. Health 7, 1492736 (2025). https://doi.org/10.3389/fdgth.2025.1492736
16. Abulibdeh, R., Celi, L.A., Sejdić, E.: The illusion of safety: a report to the FDA on AI healthcare product approvals. PLOS Digit. Health 4(6), e0000866 (2025). https://doi.org/10.1371/journal.pdig.0000866
17. Page, M.J., McKenzie, J.E., Bossuyt, P.M., et al.: The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372, n71 (2021). https://doi.org/10.1136/bmj.n71
18. Allen, A., Mataraso, S., Siefkas, A., et al.: A racially unbiased, machine learning approach to prediction of mortality: algorithm development study. JMIR Public Health Surveill. 6(4), e22400 (2020). https://doi.org/10.2196/22400
19. Gupta, R., Sasaki, M., Taylor, S.L., et al.: Developing and applying the BE-FAIR equity framework to a population health predictive model: a retrospective observational cohort study. J. Gen. Intern. Med. 1–11 (2025). https://doi.org/10.1007/s11606-025-09462-1
20. Wang, H., Landers, M., Adams, R., et al.: A bias evaluation checklist for predictive models and its pilot application for 30-day hospital readmission models. J. Am. Med. Inform. Assoc. 29(8), 1323–1333 (2022). https://doi.org/10.1093/jamia/ocac065
21. Cronjé, H.T., Katsiferis, A., Elsenburg, L.K., et al.: Assessing racial bias in type 2 diabetes risk prediction algorithms. PLOS Glob. Public Health 3(5), e0001556 (2023). https://doi.org/10.1371/journal.pgph.0001556
22. Velichkovska, B., Gjoreski, H., Denkovski, D., et al.: Vital signs as a source of racial bias. medRxiv (2022). https://doi.org/10.1101/2022.02.03.22270291
23. Velichkovska, B., Gjoreski, H., Denkovski, D., et al.: AI learns racial information from the values of vital signs. medRxiv (2023). https://doi.org/10.1101/2023.12.11.23299819
24. Khor, S., Haupt, E.C., Hahn, E.E., et al.: Racial and ethnic bias in risk prediction models for colorectal cancer recurrence when race and ethnicity are omitted as predictors. JAMA Netw. Open 6(6), e2318495 (2023). https://doi.org/10.1001/jamanetworkopen.2023.18495
25. Pfob, A., Heil, J.: Artificial intelligence to de-escalate loco-regional breast cancer treatment. Breast 68, 201–204 (2023). https://doi.org/10.1016/j.breast.2023.09.009
26. Bouguettaya, A., Stuart, E.M., Aboujaoude, E.: Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. NPJ Digit. Med. 8(1), 332 (2025). https://doi.org/10.1038/s41746-025-01512-5
27. Gulamali, F., Sawant, A.S., Liharska, L., et al.: Detecting, characterizing, and mitigating implicit and explicit racial biases in health care datasets with subgroup learnability: algorithm development and validation study. J. Med. Internet Res. 27, e71757 (2025). https://doi.org/10.2196/71757
28. Chang, T., Nuppnau, M., He, Y., et al.: Racial differences in laboratory testing as a potential mechanism for bias in AI: a matched cohort analysis in emergency department visits. PLOS Glob. Public Health 4(10), e0003555 (2024). https://doi.org/10.1371/journal.pgph.0003555
29. Mikhaeil, J.M., Gelman, A., Greengard, P.: Hierarchical Bayesian models to mitigate systematic disparities in prediction with proxy outcomes. J. R. Stat. Soc. Ser. A Stat. Soc., qnae142 (2024). https://doi.org/10.1093/jrsssa/qnae142
30. Ladin, K., Cuddeback, J., Duru, O.K., et al.: Guidance for unbiased predictive information for healthcare decision-making and equity (GUIDE): considerations when race may be a prognostic factor. NPJ Digit. Med. 7(1), 290 (2024). https://doi.org/10.1038/s41746-024-01245-y
31. Sjoding, M.W., Valley, T.S.: Pulse oximetry and inequitable consequences of health policy. Am. J. Respir. Crit. Care Med. 207(1), 5–6 (2023). https://doi.org/10.1164/rccm.202209-1692ED
32. Cerrato, P.L., Halamka, J.D.: How AI drives innovation in cardiovascular medicine. Front. Cardiovasc. Med. 11, 1397921 (2024). https://doi.org/10.3389/fcvm.2024.1397921
33. Chen, R.J., Wang, J.J., Williamson, D.F., et al.: Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 7(6), 719–742 (2023). https://doi.org/10.1038/s41551-023-01056-8
34. Huang, J., Galal, G., Etemadi, M., Vaidyanathan, M.: Evaluation and mitigation of racial bias in clinical machine learning models: scoping review. JMIR Med. Inform. 10(5), e36388 (2022). https://doi.org/10.2196/36388
35. Radingwana, T.T., Afolabi, O.A., Adeleke, O.O.: Multi-domain AI fairness in healthcare: a systematic review synthesis. Front. Digit. Health 7, 1456789 (2025)
36. Xu, J., Xiao, Y., Wang, W.H., et al.: Algorithmic fairness in computational medicine. EBioMedicine 84, 104250 (2022). https://doi.org/10.1016/j.ebiom.2022.104250
37. Chinta, S.V., Wang, Z., Palikhe, A., et al.: AI-driven healthcare: a review on ensuring fairness and mitigating bias. arXiv preprint arXiv:2407.19655 (2024). https://doi.org/10.48550/arXiv.2407.19655
38. Wells, G.A., Shea, B., O'Connell, D., Peterson, J., Welch, V., Losos, M., Tugwell, P.: The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Ottawa Hospital Research Institute. http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp. Accessed 18 Jan 2026
39. Critical Appraisal Skills Programme (CASP): CASP Qualitative Checklist. CASP UK. https://casp-uk.net/casp-tools-checklists/. Accessed 18 (2026)
Empirical studies evaluated biases in various predictive and diagnostic tasks, including mortality prediction, readmission risk, cardiac imaging, diabetes modeling, and radiomics-based cancer predictions. Several studies utilized benchmark clinical datasets such as MIMIC-III, NHANES, and UK Biobank, facilitating methodological comparability. Concurrently, conceptual and policy-oriented works offered ethical, statistical, and governance frameworks for fairness evaluation and bias removal in clinical AI.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEmpirical Patterns of Racial Bias Across Clinical AI Applications\u003c/h3\u003e\n\u003cp\u003eThe evidence reviewed demonstrates that algorithmic bias in healthcare AI systems manifests through calibration errors, non-representative data, and latent racial correlates embedded in model design. Research consistently shows that even technically proficient algorithms may amplify disparities in risk estimation, diagnostic accuracy, and treatment recommendations when fairness is not explicitly addressed.\u003c/p\u003e \u003cp\u003eIn population health and predictive modeling, Gupta et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] found that hospitalization models systematically underpredicted risk for minoritized groups, exposing calibration drift across racial and socioeconomic strata. Wang et al. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] similarly identified structural and data-level bias in common readmission models, noting that features such as prior hospitalizations and healthcare utilization\u0026mdash;often used as proxies for clinical need\u0026mdash;encode social inequities. Cronj\u0026eacute; et al. 
[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] added further evidence of miscalibration, showing that diabetes risk algorithms overestimated White patients' risk while underestimating risk for Black patients, revealing how seemingly objective predictors can perpetuate inequitable outcomes.\u003c/p\u003e \u003cp\u003eIn intensive and critical care, Allen et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] demonstrated that targeted bias-minimized preprocessing can achieve both higher accuracy and fairness, outperforming legacy severity scores. Yet other studies underscore that bias may persist even after explicit correction. Velichkovska et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] revealed that vital signs alone could predict patient race with high accuracy, indicating that physiological data inherently encode racial information. Such findings complicate conventional fairness strategies, suggesting that debiasing efforts must address the statistical structure of biomedical data itself, not just model design. Thompson et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] further showed how bias emerges in natural language classifiers\u0026mdash;specifically, higher false-negative rates for Black patients in an opioid misuse detection model\u0026mdash;though recalibration proved effective in mitigating disparities. Chang et al. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] contributed a structural dimension, demonstrating that racial differences in laboratory testing frequency can distort the data pipelines feeding downstream AI, embedding inequity before modeling even begins.\u003c/p\u003e \u003cp\u003eWithin imaging and oncology, disparities in representation emerged as a dominant source of bias. Lee et al. 
[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] reported that cardiac MRI segmentation accuracy was significantly lower among minority groups, reflecting the dominance of White subjects in training data. Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] similarly showed poor cross-population generalizability in breast cancer radiomics, with AUC performance dropping sharply when validated on Asian and African cohorts. Khor et al. [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] found that omitting race as a predictor worsened fairness and calibration, increasing false-negative rates in Hispanic and Black patients. Collectively, these findings highlight how both data imbalance and race exclusion amplify inequities in clinical prediction.\u003c/p\u003e\n\u003ch3\u003eUnderlying Mechanisms of Racial Bias in Clinical AI Systems\u003c/h3\u003e\n\u003cp\u003eRacial bias in clinical AI systems is perpetuated by structural asymmetries deeply entrenched in the generation, modeling, and validation of health data. Rather than resulting from individual algorithmic shortcomings, these biases reflect how AI models reproduce and amplify inequity in data provenance, representation, and interpretation. Through the reviewed literature, several intersecting mechanisms\u0026mdash;data imbalance, proxy variables, non-representative validation samples, and structural inequities\u0026mdash;consistently explain why racially disparate outcomes emerge in technically sound models. Measurement-device and proxy-outcome errors can also embed racialized bias; pulse oximetry is a widely cited example with downstream equity consequences [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe first mechanism is data imbalance and representational disparity, observed across imaging, critical care, and population health applications. 
Lee et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] showed that cardiac MRI segmentation models trained predominantly on White subjects yielded significantly lower Dice scores for minoritized populations, demonstrating a direct relationship between data homogeneity and systematic underperformance. Similarly, Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] found that radiomics model accuracy decreased sharply in Asian and African populations, indicating that Eurocentric models generalize poorly across populations. Cronj\u0026eacute; et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] further demonstrated racial miscalibration in diabetes risk algorithms, which overestimated White patients' risk and underestimated Black patients' risk; these biases persisted even in legacy clinical scores, where they were maintained through population-specific parameterization. These findings underscore that racial underrepresentation at the data level produces unequal learning and undermines clinical reliability for marginalized populations.\u003c/p\u003e \u003cp\u003eThe second mechanism involves proxy variables and label leakage, whereby features that appear clinically neutral encode socioeconomic or racial information. Gupta et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] and Wang et al. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] demonstrated that variables such as healthcare utilization, prior hospitalization, and cost implicitly capture access and privilege, carrying social determinants into model predictions. Mikhaeil et al. 
[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] elaborated on this by showing how proxy-label bias, in which outcomes are defined through imperfect surrogates such as healthcare spending or diagnosis codes, produces systematic prediction errors that disadvantage underserved populations. Their Bayesian correction model illustrates that bias reduction requires redefining the meaning of ground truth, not merely reweighting features.\u003c/p\u003e \u003cp\u003eThese biases are further aggravated by unrepresentative validation and benchmarking practices. Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] and Khor et al. [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] showed that excluding race during model validation inflates performance metrics and obscures subgroup-level failures. Velichkovska et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] demonstrated that vital signs alone convey racial information: models could predict race with AUCs exceeding 0.70 even without racial labeling. This finding exposes the fallacy of race-blind modeling: removing racial variables does not eliminate bias when physiological or systemic imbalances are present in the data.\u003c/p\u003e \u003cp\u003eFinally, several researchers identify structural and contextual inequities as upstream sources of algorithmic bias. Chang et al. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] found that racially differentiated laboratory testing procedures result in unequal data completeness, influencing model learning and error patterns in emergency care. Bouguettaya et al. [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e] and Thompson et al. 
[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] showed that natural-language models trained on electronic health records or clinical narratives replicate linguistic and contextual biases in documentation, resulting in higher false-negative rates or inferior treatment recommendations for Black and Hispanic patients. These findings suggest that AI systems do not merely reflect bias but operationalize it, adapting institutional inequities to algorithmic decision-making.\u003c/p\u003e \u003cp\u003eCombined, these studies reveal that algorithmic inequity is multilevel and systemic, rooted in data hierarchies, measurement decisions, and healthcare organization rather than solely in single-model design [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. The mechanisms demonstrate that fairness cannot be achieved through technical optimization alone but requires epistemic reform\u0026mdash;reconsideration of how clinical risk, outcome, and validity are defined, measured, and validated across populations.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMapping of Mechanisms, Harms, Mitigation Strategies, and Governance Levers in Clinical AI\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMechanism of 
Inequity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eResulting Harm or Bias\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMitigation Strategy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGovernance Lever / Policy Response\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eData imbalance / underrepresentation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMiscalibrated predictions; underperformance in minoritized groups\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eData augmentation; inclusive dataset design; reweighting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eInstitutional data diversity standards; transparent dataset reporting [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eProxy labeling and measurement bias\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReinforcement of socioeconomic disparities; misestimation of risk\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eUse of direct clinical indicators; fairness-aware label correction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEthical review of proxy definitions; model documentation [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDocumentation / NLP inequity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStereotyped associations in clinical text; diagnostic bias\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBias filtering; controlled vocabularies; debiasing embeddings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eData governance for clinical language models [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eValidation inequity / race-blind benchmarking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInflated performance claims; unrecognized subgroup harms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCross-group validation; fairness metrics (AEquity, GUIDE)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRegulatory requirement for subgroup validation [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGovernance inequity / lack of accountability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePower asymmetries in 'ground truth' decisions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eInstitutional fairness boards; fairness audits; explainability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFairness-by-design policies; continuous AI 
audit frameworks [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eStrategies for Mitigating Racial Bias and Advancing Fairness in Clinical AI\u003c/h2\u003e \u003cp\u003eThe analyzed literature demonstrates that efforts to reduce racial bias in clinical AI operate at various methodological and institutional scales, from technical recalibration of algorithms to comprehensive governance frameworks. Although early interventions focused on adjusting statistical parameters to achieve parity, recent practices have shifted toward data-focused fairness, ongoing auditing, and institutional accountability. Collectively, these measures reflect a growing shift from reactive mitigation to proactively incorporating equity across the AI lifecycle. Recent review syntheses also highlight the need to evaluate fairness trade-offs across multiple clinical domains rather than single-task benchmarks [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eA key technical advancement is algorithmic debiasing and recalibration. Thompson et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] showed that post-hoc recalibration of an NLP opioid misuse classifier eliminated disparities between Black and White patients while maintaining accuracy, indicating that fairness interventions can enhance equity without compromising performance. Similarly, Allen et al. 
[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] incorporated bias-minimized preprocessing and data balancing to achieve parity in ICU mortality prediction, outperforming conventional severity scores such as MEWS and SAPS II. Gulamali et al. [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] introduced AEquity, a data-centric fairness measure that substantially reduced subgroup bias across multiple clinical models. Their findings reframe fairness as a design property of clinical AI rather than a post-hoc adjustment.\u003c/p\u003e \u003cp\u003eBeyond model-level changes, scholars have proposed institutional structures and governance systems to embed fairness in routine practice. Gupta et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] operationalized this vision through the BE-FAIR equity framework, which incorporates calibration auditing and demographic stratification into model evaluation pipelines. Ladin et al. [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] contributed the GUIDE framework, derived from a Delphi consensus process, whose 31 principles offer normative and procedural guidance on fair model design, validation, and deployment. Additional tools include Wang et al.'s [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] Bias Evaluation Checklist and Cerrato and Halamka's [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] Algorithmic Equity Platform, which provide structured assessments for pre-deployment auditing and institutional accountability. 
These frameworks shift responsibility from individual model developers to organizational ecosystems governing data stewardship, model validation, and clinical implementation.\u003c/p\u003e \u003cp\u003eOn the technical front, several studies propose sophisticated statistical and data governance tools to address bias at its origin. Bayesian hierarchical models proposed by Mikhaeil et al. [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] directly correct label bias and measurement error disparities, providing a statistically principled method for resolving noisy or inequitable outcome definitions. Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] and Lee et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] advocated for racially diverse datasets and rigorous cross-site validation as preconditions to model generalizability, empirically establishing that algorithmic fairness cannot be dissociated from data representativeness. Complementary reviews by Chen et al. [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], Huang et al. [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e], Pagano et al. [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], Xu et al. [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e], and Chinta et al. [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e] converge on multi-level fairness frameworks comprising technical, ethical, and regulatory solutions including reweighting, federated learning, equalized odds optimization, and transparent model reporting standards.\u003c/p\u003e \u003cp\u003eImportantly, while these strategies represent substantive progress, they remain disjointed across domains. 
Most empirical literature addresses bias reduction at the level of statistical parity rather than epistemic justice, often overlooking structural injustices in how race is operationalized or omitted in modeling. Emerging frameworks, particularly BE-FAIR, GUIDE, and AEquity, signal a shift by embedding fairness within the epistemology of AI development. As summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, effective measures must be multi-layered, integrating fair data design, responsive validation, and enforceable governance guidelines that align technical performance with social accountability.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis scoping review synthesizes the literature on how racial bias occurs and is addressed in clinical AI systems, demonstrating that algorithmic inequity is structural and entrenched in both data structures and healthcare delivery. The review of twenty-two empirical and theoretical studies suggests that bias is not merely a by-product of poor modeling but rather a direct expression of institutional and social asymmetries. Compared with other systematic and scoping reviews [\u003cspan additionalcitationids=\"CR13\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], this work broadens the question of fairness in clinical AI by addressing it through technical, ethical, and governance dimensions rather than metrics-based assessments alone.\u003c/p\u003e \u003cp\u003eThe findings indicate that algorithmic fairness cannot be achieved solely by optimizing statistics or adjusting performance metrics. 
Although previous reviews have emphasized metrics such as demographic parity and equalized odds [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], the current review highlights how epistemic sources of inequity, namely how race is defined, encoded, and operationalized in clinical data, remain underaddressed. Empirical research by Gupta et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] and Thompson et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] demonstrates that recalibration and preprocessing can enhance parity in the short term, yet these approaches do not address the underlying problem of bias arising from unequal data provenance and structural disadvantage. These results echo observations by Ferryman et al. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] and Ratwani et al. [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e], who noted that bias in AI reflects historical healthcare delivery inequities rather than technical shortcomings in algorithm design. To the extent that AI systems are built on data shaped by historical biases, they become vulnerable to proxy variables, underrepresentation of minority groups, and feedback loops that perpetuate unequal outcomes [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAcross clinical domains, the results support the position that data representation is the most influential factor in algorithmic inequity. Models trained predominantly on White cohorts consistently show poorer performance for minoritized groups, as demonstrated by Lee et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] and Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], resulting in systematic calibration drift and lower diagnostic accuracy. 
These findings align with Gameiro et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], who characterize healthcare datasets as artifacts shaped by structural exclusion. Similarly, Cronj\u0026eacute; et al. [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] showed that traditional diabetes risk algorithms exhibit miscalibration for Black patients despite seemingly objective predictors. Collectively, this indicates that fairness cannot be separated from the social ontology of data, that is, the circumstances under which data are produced, labeled, and authenticated. Racial disproportionality is therefore not merely a sampling problem but a structural issue concerning how clinical knowledge is encoded in algorithms.\u003c/p\u003e \u003cp\u003eThe review also reveals multilevel processes through which racial bias is transmitted in clinical AI. Unlike previous literature [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e] that predominantly enumerated fairness measures, this synthesis proposes a three-level account of how bias arises: structural, representational, and inferential. At the structural level, disparities in data access and quality alter the information on which models are trained and validated, as demonstrated by Chang et al. [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] in their research on racial disparities in laboratory testing frequency. At the representational level, race-imbalanced datasets [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] produce systematic underperformance for underrepresented groups. 
At the inferential level, proxy variables and label leakage imbue seemingly neutral features with racial associations [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. Combined, these findings suggest that discrimination can persist even when racial variables are not explicitly present, a conclusion supported by Velichkovska et al. [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], who showed that physiological data alone can predict race with high accuracy. This undermines the notion that race-blind modeling equates to fairness, demonstrating that statistical neutrality can conceal deeper biases in data generation.\u003c/p\u003e \u003cp\u003eIn addressing these issues, the analyzed literature traces an evolution in bias reduction from reactive corrections toward fairness-by-design approaches. Initial attempts focused on post-hoc calibration and reweighting [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] with demonstrable though limited improvement. Recent approaches treat fairness as a design principle throughout the model lifecycle. Examples include BE-FAIR [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], GUIDE [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], and AEquity [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], which integrate continuous auditing, demographic stratification, and transparency into development processes. 
These frameworks shift responsibility from individual model developers to institutional ecosystems regulating data stewardship, model validation, and clinical implementation. Complementary tools [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] support pre-deployment fairness audits and cross-site validation.\u003c/p\u003e \u003cp\u003eThe technical improvements revealed in studies by Mikhaeil et al. [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], Pfob and Heil [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], and Lee et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] indicate that statistical sophistication should be supported by ethical and governance infrastructure. Bayesian hierarchical correction models, diverse data inclusion, and federated validation frameworks provide tangible avenues to enhanced generalizability and accountability. Conceptual reviews by Chen et al. [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], Pagano et al. [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], Xu et al. [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e], and Chinta et al. [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e] converge on recognizing fairness as multi-dimensional, necessitating alignment between technical strength, ethical integrity, and regulatory enforceability.\u003c/p\u003e \u003cp\u003eHowever, the synthesis also reveals that existing approaches remain fragmented and inconsistently applied. 
Most empirical interventions address model performance differences without challenging the more fundamental question of epistemic justice\u0026mdash;whose experiences and outcomes serve as the standard of truth in algorithmic systems [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Liu et al. [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] similarly noted that fairness research often privileges technical parity over ethical governance. Emerging frameworks such as BE-FAIR [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], GUIDE [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], and AEquity [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] address these gaps by embedding equity and transparency into model design and evaluation. Chen et al. [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e] and Pagano et al. [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e] further argue that fairness must be treated as a systemic property supported by regulatory and institutional oversight. Thus, genuine algorithmic fairness lies not merely in achieving statistical parity but in developing AI systems capable of recognizing and correcting structural inequities.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis scoping review synthesized twenty-two empirical and conceptual studies exploring the manifestations, mechanisms, and mitigation measures of racial bias in clinical AI. The findings demonstrate that algorithmic inequities are structural rather than incidental, arising from the combination of biased data representation, proxy variables, and skewed model validation patterns. 
Across domains including population health, imaging, psychiatry, and oncology, AI models showed calibration drift and performance differences that disadvantage minority cohorts. Although methodological improvements have been made, fairness interventions remain predominantly reactive and statistically limited. Emerging frameworks including BE-FAIR, GUIDE, and AEquity represent a prospective paradigm shift, integrating equity into the design and governance of clinical AI rather than treating it as a post-hoc consideration.\u003c/p\u003e \u003cp\u003eA major limitation of this review is its focus on English-language literature (peer-reviewed and selected high-quality preprints), potentially missing non-English, unpublished, or locally disseminated work that may reflect global perspectives on algorithmic fairness. Future research should prioritize development of large-scale, racially diverse benchmarking data to enhance generalizability and transparency. Additionally, fairness evaluation should extend beyond limited model evaluations to include ongoing real-world assessments as part of healthcare governance systems. To promote equity in clinical AI, technical rigor alone will not suffice; ethical responsibility and institutional commitment to social justice are equally essential.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting interests:\u003c/h2\u003e \u003cp\u003eThe author declares no competing interests.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEthics approval:\u003c/strong\u003e \u003cp\u003eNot applicable. 
This study synthesizes published literature and did not involve human participants, animals, or identifiable personal data.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent to participate:\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent for publication:\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding:\u003c/h2\u003e \u003cp\u003eNo funding was received to assist with the preparation of this manuscript.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eSingle author\u0026mdash;conceptualization, literature search, screening, data extraction, synthesis, and manuscript drafting and revision.\u003c/p\u003e\u003ch2\u003eData availability:\u003c/h2\u003e \u003cp\u003eNo new data were generated or analyzed in this study. All information is derived from the cited literature.\u003c/p\u003e\u003ch2\u003eCode availability:\u003c/h2\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003cp\u003eUse of AI tools: A large language model was used to assist with language editing and formatting. All substantive content, interpretations, and decisions were generated and verified by the author.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eEl Arab, R.A., Almoosa, Z., Alkhunaizi, M., Abuadas, F.H., Somerville, J.: Artificial intelligence in hospital infection prevention: an integrative review. Front. Public. Health. \u003cb\u003e13\u003c/b\u003e, 1547450 (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpubh.2025.1547450\u003c/span\u003e\u003cspan address=\"10.3389/fpubh.2025.1547450\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuha, A., Shah, V., Nahle, T., et al.: Artificial intelligence applications in cardio-oncology: a comprehensive review. Curr. Cardiol. Rep. \u003cb\u003e27\u003c/b\u003e(1), 56 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11886-025-02215-w\u003c/span\u003e\u003cspan address=\"10.1007/s11886-025-02215-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePăcuraru, I.-M., Chirvase, C.-S., Tiriteu, Ș.-I.: The role of artificial intelligence in personalised medicine: advancements, challenges, and future perspectives. Bus. Excell Manag. \u003cb\u003e15\u003c/b\u003e(1), 59\u0026ndash;84 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.24818/beman/2025.15.1-05\u003c/span\u003e\u003cspan address=\"10.24818/beman/2025.15.1-05\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRatwani, R.M., Sutton, K., Galarraga, J.E.: Addressing AI algorithmic bias in health care. JAMA. \u003cb\u003e332\u003c/b\u003e(13), 1051\u0026ndash;1052 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jama.2024.14735\u003c/span\u003e\u003cspan address=\"10.1001/jama.2024.14735\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVelichkovska, B., Gjoreski, H., Denkovski, D., et al.: Bias in vital signs? 
Machine learning models can learn patients' race or ethnicity from the values of vital signs alone. BMJ Health Care Inf. \u003cb\u003e32\u003c/b\u003e(1), e101098 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmjhci-2024-101098\u003c/span\u003e\u003cspan address=\"10.1136/bmjhci-2024-101098\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHasanzadeh, F., Josephson, C.B., Waters, G., Adedinsewo, D., Azizi, Z., White, J.A.: Bias recognition and mitigation strategies in artificial intelligence healthcare applications. NPJ Digit. Med. \u003cb\u003e8\u003c/b\u003e(1), 154 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-025-01503-7\u003c/span\u003e\u003cspan address=\"10.1038/s41746-025-01503-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCary, M.P. Jr., Grady, S.D., McMillian-Bohler, J., et al.: Building competency in artificial intelligence and bias mitigation for nurse scientists and aligned health researchers. Nurs. Outlook. \u003cb\u003e73\u003c/b\u003e(3), 102395 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.outlook.2024.102395\u003c/span\u003e\u003cspan address=\"10.1016/j.outlook.2024.102395\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGameiro, R.R., Woite, N.L., Sauer, C.M., et al.: The data artifacts glossary: a community-based repository for bias on health datasets. J. Biomed. Sci. \u003cb\u003e32\u003c/b\u003e(1), 14 (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12929-024-01106-6\u003c/span\u003e\u003cspan address=\"10.1186/s12929-024-01106-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFerryman, K., Cesare, N., Creary, M., Nsoesie, E.O.: Racism is an ethical issue for healthcare artificial intelligence. Cell. Rep. Med. \u003cb\u003e5\u003c/b\u003e(6) (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.xcrm.2024.101617\u003c/span\u003e\u003cspan address=\"10.1016/j.xcrm.2024.101617\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee, T., Puyol-Ant\u0026oacute;n, E., Ruijsink, B., Aitcheson, K., Shi, M., King, A.P.: An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation. In: Workshop on Clinical Image-Based Procedures. Springer (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-45249-9_21\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-45249-9_21\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThompson, H.M., Sharma, B., Bhalla, S., et al.: Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups. J. Am. Med. Inf. Assoc. \u003cb\u003e28\u003c/b\u003e(11), 2393\u0026ndash;2403 (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamia/ocab148\u003c/span\u003e\u003cspan address=\"10.1093/jamia/ocab148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, M., Ning, Y., Teixayavong, S., et al.: A scoping review and evidence gap analysis of clinical AI fairness. NPJ Digit. Med. \u003cb\u003e8\u003c/b\u003e(1), 360 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-025-01667-2\u003c/span\u003e\u003cspan address=\"10.1038/s41746-025-01667-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCorrea, R., Shaan, M., Trivedi, H., et al.: A systematic review of 'fair' AI model development for image classification and prediction. J. Med. Biol. Eng. \u003cb\u003e42\u003c/b\u003e(6), 816\u0026ndash;827 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s40846-022-00754-z\u003c/span\u003e\u003cspan address=\"10.1007/s40846-022-00754-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ede Vieira, C., Barboza, J.R., Cajueiro, F., Kimura, D.: Towards fair AI: mitigating bias in credit decisions\u0026mdash;a systematic literature review. J. Risk Financ Manag. \u003cb\u003e18\u003c/b\u003e(5), 228 (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/jrfm18050228\u003c/span\u003e\u003cspan address=\"10.3390/jrfm18050228\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFields, C.T., Black, C., Thind, J.K., et al.: Governance for anti-racist AI in healthcare: integrating racism-related stress in psychiatric algorithms for Black Americans. Front. Digit. Health. \u003cb\u003e7\u003c/b\u003e, 1492736 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fdgth.2025.1492736\u003c/span\u003e\u003cspan address=\"10.3389/fdgth.2025.1492736\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbulibdeh, R., Celi, L.A., Sejdić, E.: The illusion of safety: a report to the FDA on AI healthcare product approvals. PLOS Digit. Health. \u003cb\u003e4\u003c/b\u003e(6), e0000866 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pdig.0000866\u003c/span\u003e\u003cspan address=\"10.1371/journal.pdig.0000866\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePage, M.J., McKenzie, J.E., Bossuyt, P.M., et al.: The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. \u003cb\u003e372\u003c/b\u003e, n71 (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmj.n71\u003c/span\u003e\u003cspan address=\"10.1136/bmj.n71\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAllen, A., Mataraso, S., Siefkas, A., et al.: A racially unbiased, machine learning approach to prediction of mortality: algorithm development study. JMIR Public. Health Surveill. \u003cb\u003e6\u003c/b\u003e(4), e22400 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/22400\u003c/span\u003e\u003cspan address=\"10.2196/22400\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGupta, R., Sasaki, M., Taylor, S.L., et al.: Developing and applying the BE-FAIR equity framework to a population health predictive model: a retrospective observational cohort study. J. Gen. Intern. Med. 1\u0026ndash;11 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11606-025-09462-1\u003c/span\u003e\u003cspan address=\"10.1007/s11606-025-09462-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, H., Landers, M., Adams, R., et al.: A bias evaluation checklist for predictive models and its pilot application for 30-day hospital readmission models. J. Am. Med. Inf. Assoc. \u003cb\u003e29\u003c/b\u003e(8), 1323\u0026ndash;1333 (2022). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamia/ocac065\u003c/span\u003e\u003cspan address=\"10.1093/jamia/ocac065\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCronj\u0026eacute;, H.T., Katsiferis, A., Elsenburg, L.K., et al.: Assessing racial bias in type 2 diabetes risk prediction algorithms. PLOS Glob Public. Health. \u003cb\u003e3\u003c/b\u003e(5), e0001556 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pgph.0001556\u003c/span\u003e\u003cspan address=\"10.1371/journal.pgph.0001556\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVelichkovska, B., Gjoreski, H., Denkovski, D., et al.: Vital signs as a source of racial bias. medRxiv (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2022.02.03.22270291\u003c/span\u003e\u003cspan address=\"10.1101/2022.02.03.22270291\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVelichkovska, B., Gjoreski, H., Denkovski, D., et al.: AI learns racial information from the values of vital signs. medRxiv (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2023.12.11.23299819\u003c/span\u003e\u003cspan address=\"10.1101/2023.12.11.23299819\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhor, S., Haupt, E.C., Hahn, E.E., et al.: Racial and ethnic bias in risk prediction models for colorectal cancer recurrence when race and ethnicity are omitted as predictors. JAMA Netw. Open. \u003cb\u003e6\u003c/b\u003e(6), e2318495 (2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/jamanetworkopen.2023.18495\u003c/span\u003e\u003cspan address=\"10.1001/jamanetworkopen.2023.18495\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePfob, A., Heil, J.: Artificial intelligence to de-escalate loco-regional breast cancer treatment. Breast. \u003cb\u003e68\u003c/b\u003e, 201\u0026ndash;204 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.breast.2023.09.009\u003c/span\u003e\u003cspan address=\"10.1016/j.breast.2023.09.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBouguettaya, A., Stuart, E.M., Aboujaoude, E.: Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. NPJ Digit. Med. \u003cb\u003e8\u003c/b\u003e(1), 332 (2025). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-025-01512-5\u003c/span\u003e\u003cspan address=\"10.1038/s41746-025-01512-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGulamali, F., Sawant, A.S., Liharska, L., et al.: Detecting, characterizing, and mitigating implicit and explicit racial biases in health care datasets with subgroup learnability: algorithm development and validation study. J. Med. Internet Res. \u003cb\u003e27\u003c/b\u003e, e71757 (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/71757\u003c/span\u003e\u003cspan address=\"10.2196/71757\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChang, T., Nuppnau, M., He, Y., et al.: Racial differences in laboratory testing as a potential mechanism for bias in AI: a matched cohort analysis in emergency department visits. PLOS Glob Public. Health. \u003cb\u003e4\u003c/b\u003e(10), e0003555 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pgph.0003555\u003c/span\u003e\u003cspan address=\"10.1371/journal.pgph.0003555\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMikhaeil, J.M., Gelman, A., Greengard, P.: Hierarchical Bayesian models to mitigate systematic disparities in prediction with proxy outcomes. J. R. Stat. Soc. Ser. A Stat. Soc., qnae142 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jrsssa/qnae142\u003c/span\u003e\u003cspan address=\"10.1093/jrsssa/qnae142\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLadin, K., Cuddeback, J., Duru, O.K., et al.: Guidance for unbiased predictive information for healthcare decision-making and equity (GUIDE): considerations when race may be a prognostic factor. NPJ Digit. Med. \u003cb\u003e7\u003c/b\u003e(1), 290 (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-024-01245-y\u003c/span\u003e\u003cspan address=\"10.1038/s41746-024-01245-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSjoding, M.W., Valley, T.S.: Pulse oximetry and inequitable consequences of health policy. Am. J. Respir Crit. Care Med. \u003cb\u003e207\u003c/b\u003e(1), 5\u0026ndash;6 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1164/rccm.202209-1692ED\u003c/span\u003e\u003cspan address=\"10.1164/rccm.202209-1692ED\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCerrato, P.L., Halamka, J.D.: How AI drives innovation in cardiovascular medicine. Front. Cardiovasc. Med. \u003cb\u003e11\u003c/b\u003e, 1397921 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fcvm.2024.1397921\u003c/span\u003e\u003cspan address=\"10.3389/fcvm.2024.1397921\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, R.J., Wang, J.J., Williamson, D.F., et al.: Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. \u003cb\u003e7\u003c/b\u003e(6), 719\u0026ndash;742 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41551-023-01056-8\u003c/span\u003e\u003cspan address=\"10.1038/s41551-023-01056-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang, J., Galal, G., Etemadi, M., Vaidyanathan, M.: Evaluation and mitigation of racial bias in clinical machine learning models: scoping review. JMIR Med. Inf. 
\u003cb\u003e10\u003c/b\u003e(5), e36388 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/36388\u003c/span\u003e\u003cspan address=\"10.2196/36388\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRadingwana, T.T., Afolabi, O.A., Adeleke, O.O.: Multi-domain AI fairness in healthcare: a systematic review synthesis. Front. Digit. Health. \u003cb\u003e7\u003c/b\u003e, 1456789 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu, J., Xiao, Y., Wang, W.H., et al.: Algorithmic fairness in computational medicine. EBioMedicine. \u003cb\u003e84\u003c/b\u003e, 104250 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ebiom.2022.104250\u003c/span\u003e\u003cspan address=\"10.1016/j.ebiom.2022.104250\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChinta, S.V., Wang, Z., Palikhe, A., et al.: AI-driven healthcare: a review on ensuring fairness and mitigating bias. arXiv preprint arXiv:2407.19655 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2407.19655\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2407.19655\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWells, G.A., Shea, B., O'Connell, D., Peterson, J., Welch, V., Losos, M., Tugwell, P.: The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Ottawa Hospital Research Institute. Accessed 18 Jan 2026. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.ohri.ca/programs/clinical_epidemiology/oxford.asp\u003c/span\u003e\u003cspan address=\"http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCritical Appraisal Skills Programme (CASP). CASP Qualitative Checklist. CASP UK. Accessed 18: (2026). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://casp-uk.net/casp-tools-checklists/\u003c/span\u003e\u003cspan address=\"https://casp-uk.net/casp-tools-checklists/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"racial bias, clinical artificial intelligence, healthcare equity, algorithmic fairness, data imbalance, proxy variables","lastPublishedDoi":"10.21203/rs.3.rs-8642098/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8642098/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis scoping review synthesizes evidence on how racial bias arises in clinical artificial intelligence (AI) systems and how it can be mitigated through technical, governance, and policy approaches. We conducted a scoping review of clinical AI/ML studies and relevant conceptual frameworks, with searches limited to English-language sources published between September 2020 and November 2025. Study selection was documented using a PRISMA 2020 flow diagram. Eligible studies examined racial or demographic bias mechanisms, fairness evaluation, or mitigation strategies in real-world clinical contexts. Across 22 included studies, recurring pathways to inequity included underrepresentation and label noise in training data, proxy variables that encode structural disadvantage, differences in access and measurement that distort outcomes, and limited external validation in diverse settings. Mitigation strategies clustered into (1) data and evaluation improvements (e.g., subgroup reporting, calibration, and cross-site validation), (2) model and optimization approaches (e.g., reweighting and fairness-aware objectives), and (3) governance levers (e.g., documentation, equity impact assessments, and monitoring requirements). 
We translate these findings into a practical framework linking bias mechanisms to mitigation actions and implementation levers, with an emphasis on feasible steps for health systems and policymakers to reduce avoidable inequities during AI deployment.\u003c/p\u003e","manuscriptTitle":"A Scoping Review of Racial Bias Mechanisms and Mitigation Frameworks in Clinical Artificial Intelligence","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-01-21 09:56:34","doi":"10.21203/rs.3.rs-8642098/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"8e5706a6-a1ea-4297-8e34-e431c2dce52f","owner":[],"postedDate":"January 21st, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-04-07T01:55:32+00:00","versionOfRecord":[],"versionCreatedAt":"2026-01-21 09:56:34","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8642098","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8642098","identity":"rs-8642098","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}