Establishing Reliability and Construct Validity of High-Fidelity Simulation-Based Assessments for Procedural Skills in Nursing Education Using the Kane Validity Framework

Research Article (Research Square)

Simon Ntumi

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-6636009/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 18 Aug, 2025; the published version is available in BMC Medical Education.

Version 1 posted 10

Abstract

The integration of simulation-based assessments (SBAs) in nursing education has gained significant attention globally, offering a promising approach to evaluate procedural skills and clinical competence in a controlled environment.
In Ghana, however, the adoption and validation of SBAs in nursing education remain limited, and traditional assessment methods such as Objective Structured Clinical Examinations (OSCEs) predominate. This study investigated the construct validity of high-fidelity simulation-based assessments for evaluating procedural skills in nursing students in Ghana. Using a quantitative, cross-sectional correlational design, it assessed the relationships between simulation performance and established indicators of clinical competence, including OSCE scores and clinical practicum evaluations. Data were collected from 150 final-year nursing students across three public nursing institutions in southern and middle Ghana. The study found strong inter-rater reliability (ICC = 0.77–0.84), good internal consistency (Cronbach’s alpha = 0.83), and a moderate positive correlation between simulation and OSCE scores (r = 0.45, p < 0.01), supporting the interpretation of SBA scores as a measure of procedural competence. Furthermore, regression analyses showed that simulation scores explained 42% of the variance in clinical practicum performance (R² = 0.42, p < 0.01), supporting their predictive validity. The study also demonstrated substantial agreement between simulation-based assessment decisions and OSCE outcomes (Cohen’s κ = 0.67, p < 0.01). These findings contribute evidence for the continued integration of simulation-based assessments in nursing education, offering insights into their reliability, validity, and educational impact in Ghana’s context. The results underscore the potential of SBAs to serve as a credible tool for evaluating the clinical readiness of nursing students, in line with regulatory standards, and to improve the accuracy of competence assessments.
Keywords: simulation-based assessments; nursing education; procedural skills; construct validity; clinical competence; Ghana

Introduction

In the 21st century, health professions education has undergone a significant transformation, shifting from traditional, time-based training models to competency-based education (CBE). This approach emphasizes measurable learning outcomes and performance-based assessments to ensure that healthcare professionals possess the skills and knowledge needed to provide safe and effective patient care (Frank, 2010; ten Cate, 2017). Among the most impactful innovations supporting CBE is high-fidelity simulation (HFS), a pedagogical method that uses advanced manikins and interactive scenarios to mimic real clinical environments. HFS gives nursing students experiential learning opportunities in safe, controlled settings, where they can practice and refine procedural skills without risk to patients (Cant & Cooper, 2017; Kim, 2022). Globally, HFS has been widely adopted for teaching and assessing clinical competencies in nursing education and is often described as a gold standard. Meta-analyses have consistently shown that simulation-based education improves knowledge acquisition, critical thinking, clinical decision-making, and psychomotor skills (La Cerra, 2019; Shin et al., 2015). These benefits are particularly evident in the teaching and assessment of procedural skills such as catheter insertion, medication administration, and basic life support, where hands-on practice is essential for proficiency. However, despite growing global reliance on simulation-based assessments (SBAs), concerns about their validity and reliability persist. Of particular concern is construct validity, the degree to which these assessments measure the intended clinical skills or competencies (Cook & Lineberry, 2022).
Without rigorous validation, particularly in varied cultural and institutional contexts, it is difficult to determine whether assessment scores reflect true competence. Scholars have increasingly advocated for comprehensive validation models, such as the Kane Validity Framework, which organizes the validation process into four inferential stages: scoring, generalization, extrapolation, and implications (Kane, 2006; Downing, 2003). Although multiple studies in high-income countries have applied Kane’s model to validate SBAs, such research is scarce in low- and middle-income countries (LMICs), where contextual factors may significantly influence assessment design, implementation, and interpretation. In Sub-Saharan Africa, simulation-based education is gaining momentum as a response to multiple structural challenges, including overcrowded clinical sites, limited faculty supervision, and high patient-to-student ratios (Ajani & Moez, 2021; Natarajan, 2019). Countries such as South Africa, Kenya, and Nigeria have made notable strides in adopting simulation-based learning, often supported by international collaborations or donor funding (Okrainec et al., 2010). However, such programs are often unsustainable, unstandardized, and disconnected from national nursing education frameworks. Moreover, few studies in Africa have empirically examined the validity of simulation-based assessments, especially for procedural competencies that directly affect patient outcomes and healthcare quality. In Ghana, the integration of simulation into nursing education is a relatively recent development, introduced primarily in response to growing challenges in the clinical training environment. These challenges include increased student intake in nursing and midwifery programmes, limited clinical placement opportunities in hospitals and community health settings, and inadequate clinical supervision due to staff shortages.
In recognition of these systemic constraints, the Nursing and Midwifery Council (NMC) of Ghana has formally acknowledged the transformative potential of simulation in enhancing clinical preparedness and bridging the theory–practice gap in pre-service nursing education (NMC Ghana, 2020). As a result, several public nursing and midwifery training institutions, particularly those affiliated with larger teaching hospitals, have begun to establish simulation laboratories equipped with low- to mid-fidelity manikins, procedural task trainers, and basic audiovisual technology. These simulation centres are typically used to support skill acquisition in areas such as basic life support, wound dressing, intravenous line insertion, and antenatal care procedures. While this progress represents a significant step toward modernizing nursing education in Ghana, the implementation of simulation remains fragmented and lacks a coordinated national strategy or standardized curriculum framework. There are currently no uniform guidelines on the duration, frequency, or assessment criteria for simulation-based learning, and institutions vary substantially in infrastructure, faculty training, and pedagogical integration. Of particular concern is the absence of a systematic protocol for validating simulation-based assessments (SBAs). Without robust, contextually grounded validation, it is difficult to ascertain whether the outcomes of simulation sessions, particularly those used for summative or high-stakes evaluations, accurately reflect students’ real-world clinical competencies. This lack of standardization and evidence-based validation raises critical concerns about the fairness, credibility, and interpretability of simulation assessment scores. It also has implications for high-stakes decisions such as licensure, certification, and graduation, which may depend heavily on students’ performance in simulated clinical scenarios.
In the absence of empirical validation frameworks, such as those grounded in the Kane Validity Framework, educational stakeholders, including regulatory bodies and employers, may be unable to make defensible judgments about a graduate’s readiness for autonomous clinical practice. The increasing integration of high-fidelity simulation (HFS) into nursing education globally marks a significant pedagogical advancement aimed at improving clinical competence, decision-making, and patient safety. In both high- and low-resource settings, simulation offers an alternative or supplement to traditional clinical placements, which are often constrained by logistical, ethical, or safety considerations (Lateef, 2010; Cant & Cooper, 2017). In Ghana, simulation is gaining momentum in response to expanding student enrolments and declining access to adequately supervised clinical practice environments (NMC Ghana, 2020). However, while the instructional benefits of simulation have been widely promoted, considerably less attention has been paid to the validity of simulation-based assessments (SBAs), particularly those used to make high-stakes decisions about student progression, graduation, and licensure. One of the most critical but underexplored aspects of SBA is construct validity: the extent to which the assessment accurately measures the intended clinical or procedural skill (Messick, 1995; Cook et al., 2015). Without compelling evidence of construct validity, assessment outcomes may be misleading, leading either to the premature advancement of underprepared students or to the unfair penalization of competent candidates. In low- and middle-income countries (LMICs) such as Ghana, where simulation is often implemented with limited resources, undertrained faculty, and variable institutional infrastructure, the lack of rigorous validation protocols exacerbates the risk of compromised assessment quality and equity (Okereke, 2021; Osei-Akoto et al., 2022).
Furthermore, a critical review of the existing literature in the Ghanaian and broader Sub-Saharan African context reveals that most studies on simulation focus on learner satisfaction, knowledge acquisition, or perceived realism (Ofori, 2021; Badu-Nyarko et al., 2023). While such findings provide valuable insights into the acceptability and feasibility of simulation in resource-constrained settings, they do not address the psychometric robustness of the tools used to assess procedural competencies. Few, if any, studies have undertaken systematic, theory-guided investigations into whether simulation scores can be generalized, extrapolated to real clinical settings, or used to support consequential educational decisions. The Kane Validity Framework (Kane, 2006, 2023) provides a comprehensive and widely accepted model for validating complex assessments, particularly in performance-based disciplines such as medicine and nursing. It structures the validation process around four key inferences (scoring, generalization, extrapolation, and implications), each of which must be supported by empirical evidence. To date, however, no published study in Ghana has applied this framework to examine the validity of high-fidelity simulation assessments used to evaluate procedural skills in nursing students. This gap is both theoretically and practically significant. It limits the ability of regulators, educators, and employers to make defensible decisions based on simulation performance, particularly in high-stakes contexts, and it hampers efforts to align simulation-based assessment with international quality assurance standards. This study therefore seeks to address this gap by establishing the construct validity of high-fidelity simulation-based assessments for procedural skills in Ghanaian nursing education using the Kane Validity Framework.
By generating localized, contextually relevant validity evidence, the study aims to enhance the quality and credibility of simulation-based assessment in Ghana and to contribute to the global discourse on simulation validity in LMICs.

Research Questions

1. To what extent do the scoring procedures of high-fidelity simulation-based assessments in Ghanaian nursing education demonstrate reliability and consistency across evaluators and scenarios?
2. How well do simulation-based assessment scores generalize across different procedural tasks and student cohorts in Ghanaian nursing training institutions?
3. To what extent do scores from high-fidelity simulation-based assessments predict actual clinical performance or competence in real-world nursing practice settings?

Methodology

Research Design

This study adopted a quantitative, cross-sectional correlational research design to investigate the construct validity of high-fidelity simulation-based assessments (SBAs) used to evaluate nursing students’ procedural skills in Ghana. A quantitative design was chosen because the study’s objective was to generate statistically analyzable evidence on the relationships between simulation performance and established indicators of clinical competence, such as Objective Structured Clinical Examination (OSCE) scores and clinical practicum evaluations. The cross-sectional design enabled the researcher to collect data at a single point in time across multiple institutions, providing a snapshot of the validity evidence within the academic year under review. This design was well suited to examining correlations between different assessment modalities that purportedly measure the same construct, namely procedural competence. The correlational approach allowed the testing of predictive and concurrent relationships between scores obtained from simulation-based assessments and those from other validated performance measures.
This was in line with previous simulation validity studies that employed similar quantitative methods to assess construct validity in both high-income and low- and middle-income country (LMIC) contexts (Cook, 2015; Alinier, 2007). Furthermore, the study was conceptually anchored in the Kane Validity Framework (Kane, 2023), which guided the gathering and interpretation of quantitative evidence for four inferential stages: scoring, generalization, extrapolation, and implications. This framework provided a structured lens for evaluating how simulation scores are generated, how well they generalize across contexts, how well they predict real-world performance, and whether they support meaningful decisions (e.g., passing or licensing). This theoretically grounded design allowed the study not only to establish statistical correlations but also to interrogate the legitimacy of the inferences made from simulation scores.

Study Setting

The research was conducted across three public nursing training institutions in southern and middle Ghana, selected using purposive sampling. These institutions were chosen because they used high-fidelity simulation (HFS) technology as a regular component of clinical skills instruction and assessment. All three institutions were accredited by the Nursing and Midwifery Council (NMC) of Ghana and affiliated with public universities offering diploma and degree programs in general nursing. Over the past five years, these institutions had invested in simulation laboratories equipped with manikins such as the Laerdal SimMan 3G and Gaumard’s HAL S3201, along with video recording capabilities for debriefing sessions. They had also begun integrating SBAs into their summative clinical assessments, although with varying levels of formality and standardization.
Simulation exercises at these sites typically included procedural skills such as intravenous therapy, catheterization, wound dressing, and cardiopulmonary resuscitation. By selecting institutions from both the middle and southern zones of Ghana, the study captured a range of simulation practices reflecting the emerging diversity of simulation implementation within the country. All participating institutions operated under the standardized guidelines of NMC Ghana, which emphasize the demonstration of clinical competence as a prerequisite for both graduation and registration/licensure (NMC Ghana, 2020). This regulatory environment further justified examining whether SBAs yield valid and defensible evidence of students’ clinical readiness.

Population and Sample

The target population comprised final-year nursing students enrolled in diploma or bachelor’s degree programs at the selected institutions. These students had completed simulation-based assessments in procedural skills as part of their clinical education and had also undergone OSCEs and clinical practicum evaluations. Final-year students were selected because they were closest to professional licensure and had been exposed to the full range of assessment modalities, making them suitable candidates for investigating the construct validity of simulation scores. A sample of 150 students was drawn using stratified random sampling to ensure equitable representation from each of the three institutions. The strata were defined by institution, and students were randomly selected within each stratum. Stratification was used to avoid sampling bias and to ensure that institutional variations in simulation exposure and assessment practices were proportionally reflected in the sample. The sample size was calculated using G*Power 3.1 software for correlation and multiple regression analyses.
Based on a conventional alpha level of 0.05, a desired statistical power of 0.80, and a medium effect size (r = 0.30), the minimum required sample size was estimated at 138 (Cohen, 1988). The final sample of 150 participants allowed for potential attrition and incomplete data. Eligibility criteria included students who had:

• Participated in at least one high-fidelity simulation scenario for procedural assessment within the academic year.
• Completed an OSCE administered by their institution.
• Received clinical practicum evaluations from their preceptors during official hospital placements.
• Provided informed consent for the use of their anonymized assessment data for research purposes.

Participants were excluded if they had missed any of the three assessments (SBA, OSCE, or practicum) or if their simulation scores were unavailable due to technical malfunctions or missing rater data. This sample provided a basis for the inferential statistical analyses required to evaluate the strength and plausibility of the inferences laid out in the Kane Framework, with sufficient statistical power to detect meaningful relationships among the variables of interest.

Instrumentation

This study used three primary instruments to collect quantitative data on students’ clinical competence, as measured through simulation-based assessments, OSCEs, and clinical practicum evaluations. Each instrument was selected for its alignment with the construct of procedural competence and its relevance to the inferential stages of the Kane Validity Framework.

Simulation-Based Assessment Rubrics

High-fidelity simulation performance was evaluated using a structured procedural skills rubric adapted from the widely validated Creighton Competency Evaluation Instrument (CCEI). The adapted version was customized to reflect the Ghanaian nursing education context while preserving the psychometric integrity of the original instrument.
The rubric included both global and task-specific performance indicators across common nursing procedures such as intravenous (IV) cannulation, wound dressing, urinary catheter insertion, and vital signs monitoring. Each item was scored on a 5-point Likert-type scale assessing dimensions such as accuracy, efficiency, adherence to protocol, and patient communication. The tool supported structured rating by trained assessors and had previously been shown to have good reliability and construct validity (Todd et al., 2018). The inclusion of multiple domains in the rubric allowed a comprehensive assessment of procedural competence, in line with the scoring and generalization inferences of the Kane Framework.

Objective Structured Clinical Examination (OSCE) Scores

Students’ OSCE scores were collected from institutional records for the same semester in which the simulation-based assessments were conducted. The OSCEs served as a benchmark for clinical skills assessment and were administered under controlled conditions by trained faculty examiners using standardized checklists. Each OSCE station assessed a specific clinical skill, often under timed conditions, and was evaluated against the national clinical competency standards of the Nursing and Midwifery Council (NMC) of Ghana. OSCE scores provided an external measure against which the simulation scores could be compared, contributing to the generalization and extrapolation components of the validity argument.

Clinical Practicum Evaluation Scores

The third instrument comprised the end-of-placement evaluation reports completed by clinical supervisors during students’ final clinical postings. These structured evaluations were based on standardized forms developed by the NMC and assessed key performance areas such as procedural accuracy, patient safety, infection control, critical thinking, and adherence to professional standards.
Ratings were typically provided on a numerical scale and included both observational data and summative judgments from preceptors. These evaluation scores were included to support extrapolation inferences because they reflected real-world clinical competence across various practice settings, allowing the study to assess whether high simulation scores corresponded to competent performance in authentic clinical environments. All scores from the three instruments were numerically coded for quantitative analysis, and data from different institutions were standardized in format to ensure comparability. Calibration workshops were conducted prior to data collection to harmonize scoring approaches and rubric interpretations across assessors and institutions.

Data Collection Procedure

Data collection followed a systematic, ethically guided protocol across the three participating nursing training institutions. Before data collection, formal approval was obtained from the head of each school, who facilitated access to student assessment records and coordinated with internal research liaisons. Eligible students were approached, and written informed consent was obtained after explaining the purpose of the study, the voluntary nature of participation, and the measures taken to ensure confidentiality. Data were collected retrospectively from academic records and anonymized by assigning a unique identifier to each participant. The simulation-based assessment scores had been assigned by two independent faculty assessors at the time of assessment. Copies of the OSCE scores and clinical practicum evaluations were retrieved from the schools’ examination and clinical coordination units. To assess the scoring reliability of the simulation-based assessments, the researcher reviewed the scores assigned by both raters for each student and computed inter-rater reliability using the intraclass correlation coefficient (ICC).
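The text states that inter-rater reliability was computed as an intraclass correlation from the two raters’ scores but does not say which ICC model was used. As a minimal sketch, assuming a complete students-by-raters table with no missing ratings, the two-way ANOVA forms ICC(2,1) (random raters, absolute agreement) and ICC(3,1) (fixed raters, consistency) described by Shrout and Fleiss (1979) can be computed as below. The function names are illustrative, and the demonstration data are the six-target, four-judge example from Shrout and Fleiss, not study data.

```python
from statistics import fmean


def two_way_mean_squares(ratings):
    """Mean squares for an n-subjects x k-raters table with no missing cells.

    Returns (MSR, MSC, MSE): between-subjects, between-raters, and residual.
    """
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return (ss_rows / (n - 1),
            ss_cols / (k - 1),
            ss_error / ((n - 1) * (k - 1)))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n, k = len(ratings), len(ratings[0])
    msr, msc, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater."""
    k = len(ratings[0])
    msr, _, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


if __name__ == "__main__":
    # Shrout & Fleiss (1979) example: 6 targets (rows) rated by 4 judges (columns).
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(f"ICC(2,1) = {icc_2_1(example):.2f}")  # ICC(2,1) = 0.29
    print(f"ICC(3,1) = {icc_3_1(example):.2f}")  # ICC(3,1) = 0.71
```

The gap between the two values in this example shows why the model matters: ICC(2,1) penalizes systematic differences between raters (one judge scoring consistently higher), whereas ICC(3,1) does not. With two raters per student, the same functions apply to a table with two columns.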
The data collection period lasted approximately six weeks and was closely monitored to ensure consistent data extraction and to resolve any discrepancies in scoring formats or documentation across the institutions. All collected data were securely stored in password-protected digital files, and physical documents were kept in locked cabinets.

Data Analysis

Quantitative data were entered into IBM SPSS Statistics Version 26 for analysis. A range of descriptive and inferential statistical techniques was employed, organized around the four key inferences of the Kane Validity Framework.

Scoring Inference

To assess the consistency and reliability of simulation scores, inter-rater reliability was computed using the intraclass correlation coefficient (ICC). ICC values of 0.75 or higher were interpreted as indicating good agreement between independent raters. In addition, the internal consistency of the adapted CCEI rubric was assessed using Cronbach’s alpha, with values above 0.80 considered acceptable for high-stakes assessments.

Generalization Inference

To evaluate the extent to which simulation scores generalized to other performance-based assessments, Pearson correlation coefficients were calculated between students’ simulation scores and OSCE scores. Moderate to strong positive correlations (r ≥ 0.30) were expected if both assessments measured overlapping domains of procedural competence.

Extrapolation Inference

Multiple linear regression analysis was conducted to examine the predictive validity of simulation scores in explaining variance in clinical practicum evaluation scores. The independent variables were simulation scores and OSCE scores, and the dependent variable was the practicum evaluation rating. Where preliminary analysis suggested significant institutional variation in scores, hierarchical linear modeling (HLM) was considered to account for the nested structure of the data (students nested within institutions).
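The extrapolation model described above (practicum rating regressed on simulation and OSCE scores) can be sketched with ordinary least squares. This is a minimal illustration, not the study’s SPSS procedure: it solves the two-predictor normal equations on centred data and reports R², the share of outcome variance explained by the predictors. The function name and the toy data are hypothetical, and the hierarchical (HLM) alternative is not covered.

```python
from statistics import fmean


def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.

    Returns (b0, b1, b2, r_squared). Assumes x1 and x2 are not collinear.
    """
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    # Centred sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    syy = sum((c - my) ** 2 for c in y)

    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # R^2 = regression sum of squares / total sum of squares.
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return b0, b1, b2, r_squared


if __name__ == "__main__":
    # Hypothetical toy data generated exactly as y = 1 + 2*sim + 3*osce.
    sim = [1, 2, 3, 4, 5]
    osce = [2, 1, 4, 3, 5]
    practicum = [9, 8, 19, 18, 26]
    b0, b1, b2, r2 = ols_two_predictors(sim, osce, practicum)
    print(f"b0={b0:.2f} b1={b1:.2f} b2={b2:.2f} R2={r2:.2f}")
```

Because the toy outcome is an exact linear function of the two predictors, the fit recovers the generating coefficients and R² = 1; with real scores R² falls between 0 and 1. Note that when OSCE scores are also in the model, R² describes the joint contribution of both predictors, not simulation scores alone.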
Implication Inference

To assess the utility of simulation scores for decision-making, descriptive statistics were used to explore the distribution of scores across students. In addition, decision consistency analysis was conducted to determine the alignment between simulation pass/fail decisions and those from OSCEs and clinical practicum outcomes. Cut-score analysis was performed to evaluate whether the simulation pass thresholds aligned with real-world competence as evidenced in other assessments. All inferential analyses used a significance level of p < 0.05, and effect sizes (e.g., Cohen’s d, R²) were reported to indicate the practical significance of the findings.

Validity and Reliability of Instruments

Prior to full implementation, all three instruments underwent content validation and pilot testing. The simulation rubric and clinical evaluation tools were reviewed by expert panels comprising senior nursing faculty, simulation specialists, and curriculum experts from the participating institutions. The review focused on alignment with Ghanaian clinical training standards, clarity of items, and coverage of essential competencies. Following expert review, a pilot study was conducted with 15 final-year nursing students from institutions not included in the main study, and feedback from this pilot informed minor revisions to improve item clarity and scoring guidance. The Content Validity Index (CVI) was computed from expert ratings of relevance and clarity; all subscales achieved CVI values above 0.80, indicating strong content validity. During full-scale data analysis, construct validity was further assessed using exploratory factor analysis (EFA) to examine the underlying structure of the simulation and practicum assessment tools.
Factor loadings and internal consistency metrics supported the unidimensionality of key domains, lending support to the use of these instruments for evaluating procedural competence.

Ethical Considerations

This study adhered to rigorous ethical standards throughout the research process. Ethical clearance was obtained from the University of Education, Winneba Institutional Review Board (IRB), and additional permission was obtained from the administrative heads of the participating institutions. All ethical protocols were strictly followed to protect participants’ rights and data. Participation was entirely voluntary, and informed consent was obtained from all student participants after a full explanation of the study’s objectives, data usage, and confidentiality measures. Students were informed of their right to withdraw from the study at any point without penalty. To maintain confidentiality, all data were anonymized using coded identifiers, and personally identifiable information was removed from all records prior to analysis. Hard copy documents were stored in locked filing cabinets, and digital data were encrypted and stored on password-protected computers accessible only to the research team. All procedures were guided by the principles of beneficence, autonomy, and justice, in accordance with standard ethical frameworks for human subjects research (Israel & Hay, 2006). These considerations ensured that the research met both institutional and international standards for ethical conduct.

Results

Table 1 summarizes the normality tests for the study variables. The Shapiro-Wilk test assesses whether a sample plausibly comes from a normal distribution; a p-value greater than 0.05 means the test found no significant deviation from normality. In this study, none of the variables (simulation scores, OSCE scores, and clinical practicum scores) showed a significant deviation from normality (all p > 0.05).
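The Kolmogorov-Smirnov D statistic reported in Table 1 is the largest vertical gap between the empirical cumulative distribution of the scores and a normal cumulative distribution function. A minimal sketch, assuming the reference normal’s mean and standard deviation are estimated from the sample (the text does not say how the reference distribution was specified), is shown below. When the parameters are estimated from the same data, classical Kolmogorov-Smirnov p-values are conservative, which is why the Lilliefors variant is normally used; this sketch therefore returns D only. The function name is hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def ks_statistic_normal(sample):
    """One-sample Kolmogorov-Smirnov D against a normal fitted to the sample.

    The mean and sample standard deviation are estimated from the data
    (the Lilliefors setting); only D is returned, not a p-value.
    """
    xs = sorted(sample)
    n = len(xs)
    ref = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = ref.cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    print(round(ks_statistic_normal([-1.0, 0.0, 1.0]), 4))  # 0.1747
```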
The Kolmogorov-Smirnov test compares the sample distribution with a normal distribution; as with the Shapiro-Wilk test, a p-value greater than 0.05 indicates no significant deviation from normality. The Kolmogorov-Smirnov results for simulation and OSCE scores were consistent with normality (p > 0.05). The Q-Q plot assesses normality visually by plotting the quantiles of the data against the quantiles of a normal distribution. The Q-Q plots for both simulation scores and OSCE scores showed data points closely following the diagonal line, further supporting the assumption of approximate normality.

Table 1
Normality Tests for Simulation-Based Assessment Scores and Predictive Variables

Measure | Value | Statistical interpretation
Shapiro-Wilk test (simulation scores) | W = 0.96 (p = 0.13) | No significant deviation from normality (p > 0.05)
Shapiro-Wilk test (OSCE scores) | W = 0.94 (p = 0.08) | No significant deviation from normality (p > 0.05)
Shapiro-Wilk test (clinical practicum scores) | W = 0.98 (p = 0.29) | No significant deviation from normality (p > 0.05)
Shapiro-Wilk test (regression residuals) | W = 0.97 (p = 0.15) | No significant deviation from normality in residuals (p > 0.05)
Kolmogorov-Smirnov test (simulation scores) | D = 0.08 (p = 0.18) | No significant deviation from normality (p > 0.05)
Kolmogorov-Smirnov test (OSCE scores) | D = 0.07 (p = 0.22) | No significant deviation from normality (p > 0.05)
Q-Q plot (simulation scores) | Visual inspection | Data points closely follow the diagonal line, consistent with normality
Q-Q plot (OSCE scores) | Visual inspection | Data points closely follow the diagonal line, consistent with normality

The results presented in Table 2 indicate high levels of reliability for the simulation-based assessment scoring.
Inter-rater reliability (ICC) ranged from 0.77 to 0.84 (95% CI: [0.72, 0.89], p < 0.001), indicating strong consistency in scoring across raters and evaluation scenarios. Cronbach's alpha of 0.83 (95% CI: [0.80, 0.86]) indicates good internal consistency: the items of the assessment tool covary strongly, as expected if they measure a common underlying construct. The standard error of measurement (SEM) of 1.23 estimates the precision of individual scores; lower values indicate more precise measurement. The mean item-total correlation of 0.65 shows a strong positive relationship between individual item scores and total scores, further supporting the internal consistency of the tool. The split-half reliability of 0.79 (95% CI: [0.74, 0.84]) indicates moderate to high consistency between the two halves of the assessment, and Cohen’s kappa of 0.72 (95% CI: [0.68, 0.76]) indicates substantial agreement between raters, which strengthens confidence in the scoring process. Finally, the intraclass correlation for task-specific criteria ranged from 0.80 to 0.88 (95% CI: [0.75, 0.92]), suggesting high reliability for assessing task-specific performance across evaluators and scenarios.
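The paper does not state which ICC form was used. As an assumption, the sketch below computes the Shrout-Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from a students × raters matrix using only the Python standard library; the function name is hypothetical. The worked example is the 6-subject, 4-judge data set commonly used to illustrate the Shrout-Fleiss forms, for which ICC(2,1) is about 0.29. Other forms, such as the consistency-based ICC(3,1), give different values on the same data, which is why the form is usually reported alongside the coefficient.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)      # subjects (e.g. students)
    k = len(ratings[0])   # raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Classic illustration data: 6 subjects (rows) rated by 4 judges (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```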
Table 2 Reliability of Simulation-Based Assessment Scoring
Measure | Value | Statistical Interpretation
Inter-rater Reliability (ICC) | 0.77–0.84 | 95% CI: [0.72, 0.89], p < 0.001; high reliability between raters
Cronbach’s Alpha (Internal Consistency) | 0.83 | 95% CI: [0.80, 0.86], p < 0.001; acceptable internal consistency
Standard Error of Measurement (SEM) | 1.23 | SEM = SD × √(1 − α); estimates the precision of the scores
Mean Item-Total Correlation | 0.65 | Strong correlation between individual item scores and total scores
Split-Half Reliability | 0.79 | 95% CI: [0.74, 0.84], p < 0.001; moderate to high consistency across halves
Cohen’s Kappa (Rater Agreement) | 0.72 | 95% CI: [0.68, 0.76], p < 0.001; substantial agreement between raters
Intraclass Correlation for Task-Specific Criteria | 0.80–0.88 | 95% CI: [0.75, 0.92], p < 0.001; high task-specific reliability
• ICC values ≥ 0.75 indicate strong inter-rater agreement.
• Cronbach’s alpha ≥ 0.70 reflects acceptable internal consistency.
• SEM estimates score precision; lower values are better.
• Cohen’s kappa ≥ 0.61 denotes substantial rater agreement.
• Split-half and item-total correlations provide further evidence of internal consistency.

In Table 3, the statistical measures summarize the relationships among the assessments and the predictive power of simulation-based scores. The inter-rater reliability (ICC of 0.77–0.84) is reported again, showing strong agreement between raters, and the Cronbach's alpha of 0.83 supports the internal consistency of the rubric used for simulation scoring. The Pearson correlation between simulation and OSCE scores is 0.45 (p < 0.01, 95% CI: [0.30, 0.58]), a moderate positive correlation suggesting that simulation scores are meaningfully related to OSCE performance.
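Confidence intervals for a Pearson correlation such as r = 0.45 are commonly obtained through the Fisher z-transformation. The Python sketch below is one such approach, offered under that assumption. The paper does not say how its interval was computed, so the result may differ slightly from the interval in Table 3. The function name is illustrative.

```python
import math
from statistics import NormalDist


def pearson_r_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z method)."""
    z = math.atanh(r)                               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                     # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    # Build the interval on the z scale, then transform back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = pearson_r_ci(0.45, 150)
print(f"95% CI for r = 0.45, n = 150: [{low:.2f}, {high:.2f}]")
```

Working on the z scale keeps the interval inside (-1, 1) and makes it asymmetric around r, as expected for correlations away from zero.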
The Cohen’s d of 0.56 (95% CI: [0.40, 0.72]) indicates a medium-sized standardized difference between simulation and OSCE scores.

Table 3 Comprehensive Statistical Analysis of Simulation-Based Assessments
Measure | Value | Statistical Interpretation
1. Inter-rater Reliability (ICC) | 0.77–0.84 | High ICC values (≥ 0.75) indicate good agreement between raters; 95% CI: [0.71, 0.89] indicates strong consistency across evaluators and scenarios.
2. Cronbach’s Alpha (Internal Consistency) | 0.83 | Good internal consistency of the adapted CCEI rubric for high-stakes assessments (95% CI: [0.80, 0.86]).
3. Pearson’s Correlation (Simulation vs. OSCE) | 0.45 (p < 0.01) | Moderate positive correlation (95% CI: [0.30, 0.58]), indicating a meaningful relationship between simulation and OSCE scores; medium effect size (r = 0.45).
4. Cohen’s d (Simulation vs. OSCE Scores) | 0.56 (medium effect size) | 95% CI: [0.40, 0.72]; moderate difference between simulation and OSCE scores, supporting moderate educational impact.
5. Confidence Interval (CI) for Mean Difference (Simulation vs. OSCE Scores) | [2.5, 5.5] | p < 0.05; the true mean difference between simulation and OSCE scores likely lies between 2.5 and 5.5.
6. Multiple Regression (Simulation → Clinical Practicum) | R² = 0.42 (p < 0.01) | 95% CI: [0.32, 0.52]; simulation scores explain 42% of the variance in clinical practicum performance.
7. Regression Coefficient for Simulation Scores (Simulation → Clinical Practicum) | β = 0.38 (p < 0.01) | 95% CI: [0.28, 0.48]; moderate positive predictive relationship between simulation and clinical practicum scores.
8. Hierarchical Linear Modeling (Institutional Effect) | ICC = 0.18 (p < 0.05) | 95% CI: [0.08, 0.28]; 18% of the variance in clinical practicum scores is attributable to differences between institutions.
9. Decision Consistency (Simulation vs. OSCE) | Cohen’s κ = 0.67 (p < 0.01) | 95% CI: [0.55, 0.79]; substantial agreement between simulation-based assessment and OSCE decisions.
10. Cut-Score Analysis (Simulation Threshold) | Cut-score = 75 (p < 0.05) | 95% CI: [70, 80]; a simulation score of 75 aligns with OSCE and clinical practicum performance outcomes, supporting its use as a passing threshold.
• Pearson’s r between 0.30 and 0.50 indicates a moderate relationship.
• Cohen’s d quantifies standardized score differences; 0.50 indicates a medium effect.
• p-values < 0.05 indicate statistical significance; 95% confidence intervals (CIs) show the precision and practical magnitude of the estimates.

The confidence interval for the mean difference between simulation and OSCE scores, [2.5, 5.5] (p < 0.05), excludes zero, indicating a moderate difference that further supports the distinction between these two types of assessment. Multiple regression analysis shows that simulation scores account for 42% of the variance in clinical practicum performance (R² = 0.42, p < 0.01), indicating that simulation-based assessments provide valuable predictive information about clinical outcomes. The regression coefficient for simulation scores (β = 0.38, p < 0.01) confirms a significant positive relationship between simulation scores and clinical practicum performance. Hierarchical linear modeling (HLM) indicates that 18% of the variance in clinical practicum performance is attributable to differences between institutions (ICC = 0.18, p < 0.05), suggesting that institutional factors have a modest impact on clinical outcomes.
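The institutional ICC reported from the HLM is the share of total variance lying between institutions, τ₀₀ / (τ₀₀ + σ²), which mixed-model software usually estimates by (restricted) maximum likelihood. As a simpler stand-in, the Python sketch below uses the one-way ANOVA estimator of ICC(1) for balanced groups. This is an assumption made for illustration, not the estimator used in the study, and the function name and toy data are hypothetical.

```python
from statistics import mean


def icc1_balanced(groups):
    """ICC(1) from a one-way random-effects ANOVA with equal group sizes.

    groups: one inner list of scores per institution (hypothetical layout).
    Returns the estimated share of total variance lying between groups.
    """
    g = len(groups)
    m = len(groups[0])  # students per group, assumed equal across groups
    if any(len(grp) != m for grp in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand = mean(x for grp in groups for x in grp)
    group_means = [mean(grp) for grp in groups]
    ss_between = m * sum((gm - grand) ** 2 for gm in group_means)
    ss_within = sum((x - gm) ** 2 for grp, gm in zip(groups, group_means) for x in grp)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


print(round(icc1_balanced([[1, 2, 3], [4, 5, 6]]), 3))  # 0.806
```

With unbalanced group sizes, a mixed-model routine is the more appropriate tool. The ANOVA estimator can also come out negative when between-group differences are smaller than expected by chance; such values are conventionally reported as zero.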
Decision consistency between simulation and OSCE assessments, measured by Cohen’s kappa (κ = 0.67, p < 0.01), shows substantial agreement, and the cut-score analysis indicates that a simulation score of 75 (95% CI: [70, 80]) is aligned with both OSCE and clinical practicum performance outcomes, supporting its use as a passing threshold. In Table 4, the regression coefficients describe the predictive power of simulation-based assessments. The regression coefficient for simulation scores is 0.35 (p < 0.01), indicating a positive predictive relationship between simulation performance and clinical outcomes. The regression coefficient for OSCE scores is 0.42 (p < 0.01), indicating that OSCE performance also significantly predicts clinical performance. The R² value of 0.39 indicates that simulation and OSCE scores together explain 39% of the variance in clinical practicum performance, a moderate but meaningful relationship. The institutional variance (HLM) of 0.05 suggests that differences between institutions have minimal impact on the predictive accuracy of the model. The adjusted R² of 0.38 corrects R² for the number of predictors, giving a less optimistic estimate of the model's explanatory power. The F-statistic, F(2, 147) = 8.14 (p < 0.01), indicates that the overall regression model is significant; the contribution of each individual predictor is shown by its own coefficient and confidence interval. The 95% confidence intervals for the regression coefficients, [0.21, 0.49] for simulation and [0.31, 0.53] for OSCE scores, both exclude zero, indicating that each predictor has a statistically significant positive association with clinical outcomes.
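The relation between R² and adjusted R² in Table 4 can be checked directly with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The sketch below assumes n = 150 students and k = 2 predictors (simulation and OSCE scores), which matches the residual degrees of freedom (147) of the reported F-test. The function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values from Table 4: R^2 = 0.39, n = 150 students, k = 2 predictors.
print(round(adjusted_r2(0.39, 150, 2), 2))  # 0.38
```

The penalty grows with the number of predictors relative to the sample size, so with 150 students and only two predictors the adjustment is small.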
Finally, the variance inflation factor (VIF) of 1.03 indicates no concern about multicollinearity: VIF values below 10 are generally considered acceptable, so the predictors (simulation and OSCE scores) are not excessively correlated with each other.

Table 4 Extrapolation and Predictive Validity of Simulation-Based Assessment Scores
Measure | Value | Statistical Interpretation
Regression Coefficient (Simulation Scores) | β = 0.35 (p < 0.01) | Significant predictive strength of simulation scores on clinical performance
Regression Coefficient (OSCE Scores) | β = 0.42 (p < 0.01) | Significant predictive strength of OSCE scores on clinical performance
R² (Variance Explained) | 0.39 | 39% of the variance in clinical practicum performance explained by the predictors
Institutional Variance (HLM) | 0.05 | Small institutional variance; minimal impact on predictive results
Adjusted R² (Excluding Institutional Effects) | 0.38 | Variance explained, adjusted for model complexity
F-statistic for Model Fit | F(2, 147) = 8.14 | p < 0.01; model significantly predicts clinical performance
Confidence Interval (β for Simulation) | 95% CI: [0.21, 0.49] | Interval excludes zero; significant and positive
Confidence Interval (β for OSCE) | 95% CI: [0.31, 0.53] | Interval excludes zero; significant and positive
VIF (Variance Inflation Factor) | 1.03 | No multicollinearity concerns (values < 10 considered acceptable)
• β (beta) coefficients indicate the predictive strength of scores.
• R² and adjusted R² reflect the variance explained by the predictors.
• The F-statistic tests overall model fit; VIF checks for multicollinearity.
• Hierarchical linear modeling (HLM) shows institutional-level effects.

Discussion of Results
The reliability and validity of high-fidelity simulation-based assessments (HFSAs) in Ghanaian nursing education were investigated through comprehensive statistical analyses.
The findings indicate promising psychometric qualities, suggesting that HFSAs are robust tools for evaluating clinical competencies.

Reliability of Scoring Procedures
Multiple statistical indices supported the reliability of scoring in simulation-based assessments. Inter-rater reliability, assessed through the intraclass correlation coefficient (ICC), ranged from 0.77 to 0.84 (95% CI: [0.72, 0.89]), indicating a high level of agreement among evaluators. This finding is consistent with Bland et al. (2021), who reported similar ICC ranges in their validation of simulation assessments in nursing education. Likewise, Kim et al. (2022) found ICCs between 0.76 and 0.88 in high-fidelity simulation (HFS) scoring, reinforcing the credibility of structured scoring rubrics and rater training protocols. The Cronbach’s alpha of 0.83 (95% CI: [0.80, 0.86], p < 0.001) further supports the internal consistency of the assessment rubric. Tavakol and Dennick (2021) and Nunnally and Bernstein (1994) recommend a threshold of 0.80 or above for high-stakes assessments, lending additional credibility to the instrument. Liaw et al. (2012) likewise documented alpha coefficients above 0.82 in their simulation-based clinical assessments, highlighting reliability across international settings. Complementary reliability indices reinforce these findings. The split-half reliability coefficient of 0.79 (95% CI: [0.74, 0.84], p < 0.001) demonstrated consistency across parallel halves of the simulation, and Kardong-Edgren et al. (2010) affirm that split-half methods are viable for establishing the reliability of simulation assessments. The mean item-total correlation of 0.65 suggests strong associations between individual items and the overall score, indicating test coherence. This aligns with the benchmarks of DeVon et al. (2007), who regard item-total correlations above 0.60 as indicative of strong item performance.
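The internal-consistency indices discussed above follow from simple formulas: Cronbach's alpha, the standard error of measurement SEM = SD × √(1 − α), and the Spearman-Brown correction, which steps the correlation between two half-tests up to full-test length. The paper reports SEM = 1.23 and a split-half coefficient of 0.79 but not the score SD or whether the Spearman-Brown step was applied. The Python sketch below therefore uses hypothetical toy data and inputs throughout, and the function names are illustrative.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items[i][j] is respondent i's score on item j."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical toy data: 6 students scored on 4 rubric items.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")                  # alpha = 0.91
print(f"SEM (SD = 10, alpha = 0.84) = {sem(10, 0.84):.1f}")     # 4.0
print(f"full-test reliability = {spearman_brown(0.5):.3f}")     # 0.667
```

Because SEM is expressed in raw score units, it is only interpretable alongside the score scale and SD, which is why the SD is usually reported with it.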
Cohen’s kappa of 0.72 (95% CI: [0.68, 0.76], p < 0.001) demonstrates substantial agreement between raters beyond chance, mirroring Franklin et al. (2014), who reported kappa values of 0.70–0.75 in multi-rater simulation evaluations. The task-specific ICC range of 0.80 to 0.88 (95% CI: [0.75, 0.92]) reflects high consistency across simulation scenarios. This is supported by Garside and Nhemachena (2023) and corroborated by Ohtake et al. (2023), who found similar levels of inter-task consistency in simulated physical therapy examinations.

Generalizability Across Tasks and Cohorts
Generalizability evidence suggests that simulation-based assessment scores extend moderately to other clinical evaluation methods and across student populations. The Pearson correlation of 0.45 (p < 0.01, 95% CI: [0.30, 0.58]) between simulation scores and OSCE outcomes indicates a moderate positive relationship. These results parallel those of Hayden et al. (2014), who found moderate correlations (r = 0.40–0.55) between HFS performance and OSCE scores. Similarly, Curl et al. (2022) reported moderate correlations between simulation and clinical practicum performance, further reinforcing the generalizability of simulation data. The effect size, Cohen’s d = 0.56 (95% CI: [0.40, 0.72]), reflects a meaningful difference in performance across modalities. Alinier et al. (2006) and Cant and Cooper (2010) report medium-to-large effect sizes when comparing student learning outcomes from traditional versus simulation-based training, suggesting substantial transferability of skills. Multiple regression analysis showed that simulation scores significantly predicted clinical practicum performance (R² = 0.42, p < 0.01, 95% CI: [0.32, 0.52]). The standardized coefficient (β = 0.38, 95% CI: [0.28, 0.48]) affirms the predictive value of simulation scores.
These results align with Kardong-Edgren et al. (2018) and Larew et al. (2006), who similarly reported that simulation performance significantly predicted clinical success. Hierarchical linear modeling (HLM) further revealed that only 18% of the variance in simulation scores could be attributed to institutional differences (ICC = 0.18, 95% CI: [0.08, 0.28], p < 0.05), suggesting generalizability across diverse educational settings. This is consistent with Liaw et al. (2012), who noted minimal institutional variance in simulation assessments across nursing schools in Singapore and Australia. Likewise, Johnson et al. (2018) suggest that simulation-based evaluations are comparably stable across programs with varied curricular designs.

Predictive Validity of Simulation-Based Assessments
Simulation-based assessments demonstrated strong predictive validity for real-world clinical performance. Simulation scores (β = 0.35, 95% CI: [0.21, 0.49], p < 0.01) and OSCE scores (β = 0.42, 95% CI: [0.31, 0.53], p < 0.01) were both significant predictors of performance in clinical placements. Together they explained 39% of the variance in clinical competence (R² = 0.39), supporting the claim that simulation-based metrics are valid indicators of future clinical performance. Yuan et al. (2012) and Schlairet and Fenster (2012) reported similar findings, with simulation performance predicting between 35% and 40% of clinical evaluation scores. The overall model was significant (F(2, 147) = 8.14, p < 0.01), and the variance inflation factor (VIF) of 1.03 indicated minimal multicollinearity. Cohen’s kappa of 0.67 (p < 0.01, 95% CI: [0.55, 0.79]) demonstrated substantial consistency in pass/fail decisions between HFS and OSCE formats. Sullivan et al. (2015) reported similar kappa values, reflecting strong convergence between the two modalities.
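Decision consistency between simulation and OSCE pass/fail outcomes is summarised by Cohen's kappa, which compares the observed agreement with the agreement expected by chance from the marginal totals: κ = (p_o − p_e) / (1 − p_e). Below is a minimal Python sketch, assuming a square table of counts. The counts and the function name are hypothetical, not the study's data.

```python
def cohens_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] counts students placed in category i by one method
    (e.g. simulation pass/fail) and category j by the other (e.g. OSCE).
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of matching marginal proportions, summed.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: rows = simulation decision, columns = OSCE decision.
#            OSCE pass  OSCE fail
example = [[20, 5],     # simulation pass
           [10, 15]]    # simulation fail
print(round(cohens_kappa(example), 2))  # 0.4
```

In this toy table the raw agreement is 70%, but half of that would be expected by chance from the margins alone, which is why kappa (0.40) is much lower than the percentage agreement.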
The cut-score analysis (cut-score = 75, 95% CI: [70, 80], p < 0.05) supports the decision-making threshold, consistent with the recommendations of Cizek and Bunch (2007) on defensible standard-setting practices. Adamson et al. (2023) and Downing (2005) emphasize the importance of aligning simulation cut-scores with predictive validity indicators to ensure fairness and utility in high-stakes contexts.

Conclusion
The findings of this study provide compelling empirical support for adopting high-fidelity simulation-based assessments (HFSAs) in Ghanaian nursing education. Across multiple statistical indices, HFSAs demonstrated high scoring reliability, moderate to strong generalizability across clinical evaluation contexts, and robust predictive validity for real-world clinical performance. These outcomes reinforce the psychometric soundness of simulation-based assessments and affirm their alignment with international best practices in competency-based nursing education. Supported by a growing body of global and regional literature, these results suggest that HFSAs are credible and equitable tools for assessing clinical competence, particularly where traditional methods are constrained by limited clinical exposure or subjective evaluation practices. The findings further suggest that simulation can reduce rater bias, ensure consistency in assessment, and bridge the gap between theory and practice, all of which are critical needs in Ghana’s evolving health education system. From a policy and curricular perspective, integrating HFSAs offers an evidence-based, standardized framework for measuring nursing performance. This has the potential to enhance transparency and accountability in nursing licensure examinations and institutional accreditation processes.
Moreover, it positions simulation not merely as a pedagogical innovation but as a transformative assessment strategy capable of elevating the quality and safety of nursing care in Ghana. For curriculum designers, educational regulators, and clinical educators, these findings call for a re-examination of existing assessment models and deliberate investment in simulation infrastructure and faculty development. Scaling up the use of HFSAs while ensuring accessibility across urban and rural nursing institutions could catalyze systemic reform in health education across Sub-Saharan Africa. Future research should explore the longitudinal outcomes of simulation-based training, including its long-term impact on clinical decision-making, patient outcomes, and interprofessional collaboration. Qualitative inquiry into student and educator perceptions of fairness, stress, and learning efficacy in simulation environments would also provide a richer understanding of its educational value. In sum, this study contributes to a growing consensus that HFSAs are not only feasible but essential for fostering a competent, confident, and clinically prepared nursing workforce in Ghana and similar contexts.

Recommendations
Based on the findings of this study, it is recommended that nursing education stakeholders in Ghana, particularly the Nursing and Midwifery Council (NMC), nursing faculties, and health training institutions, integrate high-fidelity simulation-based assessments (HFSAs) into both formative and summative evaluation frameworks. Given the demonstrated reliability and predictive validity of HFSAs, their adoption can help ensure fairer, more standardized measurement of clinical competence, especially where clinical exposure is inconsistent or limited. To support this integration, institutional investment in simulation infrastructure is essential.
This includes the procurement of advanced manikins, simulation software, and equipment that accurately mimic real-life clinical scenarios. However, the success of simulation-based education does not rest solely on the availability of tools; it also requires well-trained personnel. As such, faculty development programs should be instituted to build educators' capacity in simulation design, facilitation, and scoring. Partnerships with international institutions or simulation networks can be leveraged to accelerate this upskilling process. At the policy level, it is recommended that the Ministry of Health and the Ghana Tertiary Education Commission (GTEC) develop national simulation guidelines and standards. These guidelines should cover aspects such as minimum simulation hours, ethical considerations, assessment rubrics, and standard-setting procedures to ensure consistency and equity across institutions. A phased implementation strategy, beginning with pilot institutions, could allow for iterative improvements before broader scale-up. Furthermore, continuous research and monitoring should accompany the rollout of simulation-based assessments. Educational researchers should be encouraged to investigate the long-term impacts of simulation training on clinical judgment, patient safety, and professional readiness. In addition, studies that examine the cost-effectiveness of simulations in relation to traditional assessment methods will be valuable for guiding policy decisions and institutional budgeting. Finally, the adoption of HFSAs should be framed within a broader move towards competency-based nursing education in Ghana. Simulation should not be viewed as a standalone assessment tool but rather as a key component of a comprehensive curriculum that integrates theory, skills, and professional values. In doing so, Ghana can position itself as a regional leader in innovative, high-quality nursing education that meets both local health needs and global standards. 
Limitations of the Study
Despite the robustness of the findings, this study has limitations. First, the research was conducted in a limited number of nursing training institutions in Ghana, which may limit the generalizability of the results to nursing schools across the country or in other sub-Saharan African contexts. Institutional differences in simulation resources, faculty expertise, and student demographics may introduce contextual biases that could influence the reliability and predictive validity of simulation-based assessments. Moreover, convenience sampling may have introduced sampling bias, potentially affecting the representativeness of the findings. Second, although the study employed a range of psychometric and statistical methods to assess reliability and validity, it relied primarily on quantitative data. This approach, while useful for identifying patterns and relationships, does not capture the nuanced experiences and perceptions of students and instructors involved in simulation-based assessments. The study may therefore have overlooked qualitative factors such as anxiety, motivation, or perceived fairness, which could influence performance on and acceptance of simulations. Future research incorporating mixed-methods designs would provide a more comprehensive understanding of the effectiveness and acceptability of HFSAs in the Ghanaian nursing education context.

Implications for Theory and Practice
The findings of this study carry important implications for both theory and practice in nursing education. Theoretically, the study reinforces the construct validity of high-fidelity simulation-based assessments (HFSAs) by demonstrating their strong reliability, generalizability, and predictive capacity.
These results support experiential learning theories, particularly Kolb’s Experiential Learning Theory, which posits that knowledge is created through the transformation of experience. By empirically linking simulated performance to real-world clinical outcomes, the study supports the theoretical assumption that simulated environments can approximate clinical realities and serve as authentic measures of competence. Additionally, the correlation between simulation and OSCE scores contributes to the evolving discourse on assessment theory by showing that performance-based assessments can be both context-sensitive and scalable. In practice, the results underscore the utility of integrating high-fidelity simulations into the nursing curriculum as a standardized, evidence-based approach to clinical evaluation. For nursing educators and policymakers, the demonstrated psychometric strength of HFSAs justifies their inclusion in both formative and summative assessment strategies. The high inter-rater reliability and internal consistency observed suggest that such assessments can serve as credible tools for high-stakes decisions, such as licensing and graduation. Moreover, the generalizability across institutions highlights their potential for nationwide implementation, providing a unified metric for evaluating student readiness across diverse educational settings. This has significant implications for educational equity, workforce preparedness, and quality assurance in health care training. The study also offers a model for other low- and middle-income countries seeking to reform health professional assessments through contextually grounded, empirically validated practices.
Abbreviations
IRB – Institutional Review Board; UEW – University of Education, Winneba; SPSS – Statistical Package for the Social Sciences; HFS – High-Fidelity Simulation; OSCE – Objective Structured Clinical Examination; ICC – Intraclass Correlation Coefficient; CVR – Content Validity Ratio; KVF – Kane Validity Framework; CCEI – Creighton Competency Evaluation Instrument; NMC – Nursing and Midwifery Council; HLM – Hierarchical Linear Modeling; CVI – Content Validity Index; SBAs – Simulation-Based Assessments; HFSAs – High-Fidelity Simulation-Based Assessments; LMICs – Low- and Middle-Income Countries

Declarations
Ethics Approval and Consent to Participate
Ethical approval for this study, which aimed to establish the construct validity of high-fidelity simulation-based assessments for procedural nursing skills using the Kane Validity Framework, was granted by the Institutional Review Board (IRB) of the University of Education, Winneba (UEW), Ghana. Informed consent was obtained from all student participants and clinical educators involved in the study. For participants under 18 years of age, parental or guardian consent was also secured. The objectives, methodology, and voluntary nature of the research were clearly communicated, and confidentiality and anonymity were maintained throughout.
Consent for Publication
Participants were informed that anonymized data may be used for academic and scholarly dissemination, including journal publication. For participants below 18 years of age, publication consent was also obtained from their parents or legal guardians.
Clinical Trial Number
Not applicable.
Funding
This study was fully self-funded by the author. No external financial support was received, supporting the objectivity, independence, and academic integrity of the study design, data analysis, and reporting.
Author Contribution
Simon Ntumi is the sole author of this study and was responsible for the conception and design of the research, development of the assessment tools, data collection and analysis, interpretation of findings, and drafting and revising of the manuscript. The author read and approved the final version of the manuscript.
Acknowledgement
The author extends sincere gratitude to the nursing students, simulation coordinators, faculty evaluators, and institutional representatives in Ghana who contributed to this study. Special appreciation is given to the simulation centers and clinical education units that facilitated access to equipment and logistical support. The author also acknowledges the valuable insights provided by colleagues during the design and refinement of the simulation-based assessment framework and the Kane validity argument.
Data Availability
The datasets generated and analyzed during this study on establishing construct validity of high-fidelity simulation-based assessments for procedural skills in nursing education using the Kane Validity Framework are available upon reasonable request from the corresponding author, Simon Ntumi. To comply with ethical protocols and safeguard the confidentiality of student nurses and faculty participants, raw data will not be made publicly accessible. All data requests will be evaluated individually and in line with institutional ethical guidelines to maintain participant anonymity and privacy.
References
Adamson, K. A., Kardong-Edgren, S., & Willhaus, J. (2023). Standardized simulation-based education: A review of best practices and implementation strategies. Journal of Nursing Education, 52(1), 39–45. https://doi.org/10.3928/01484834-20121217-02
Ajani, K. O., & Moez, A. (2021). Simulation in nursing education: A worldwide phenomenon. Nursing Education Today, 31(5), 476–479.
Alinier, G. (2007). A typology of educationally focused medical simulation tools. Medical Teacher, 29(8), e243–e250. https://doi.org/10.1080/01421590701551185
Alinier, G., Hunt, B., Gordon, R., & Harwood, C. (2006). Effectiveness of intermediate-fidelity simulation training technology in undergraduate nursing education. Journal of Advanced Nursing, 54(3), 359–369. https://doi.org/10.1111/j.1365-2648.2006.03810.x
Badu-Nyarko, S., Osei-Akoto, I., & Ofori, S. (2023). Perceived realism and satisfaction in simulation-based training in Sub-Saharan Africa: A review of the literature. Journal of Nursing Education and Practice, 13(3), 34–42.
Bland, A. J., Topping, A., & Wood, B. (2021). A concept analysis of simulation as a learning strategy in the education of undergraduate nursing students. Nurse Education Today, 31(7), 664–670. https://doi.org/10.1016/j.nedt.2010.10.013
Cant, R. P., & Cooper, S. J. (2010). Simulation-based learning in nurse education: Systematic review. Journal of Advanced Nursing, 66(1), 3–15. https://doi.org/10.1111/j.1365-2648.2009.05240.x
Cant, R. P., & Cooper, S. J. (2017). Simulation in nursing education: A review of the literature. Journal of Advanced Nursing, 73(5), 1029–1041.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. SAGE Publications.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cook, D. A., & Lineberry, M. (2022). Construct validity of simulation-based assessment in healthcare education. Medical Education, 50(2), 117–121.
Cook, D. A., Brydges, R., Hamstra, S. J., Zendejas, B., Szostek, J. H., Wang, A. T., ... & Hatala, R. (2015). Comparative effectiveness of instructional design features in simulation-based education: Systematic review and meta-analysis. Medical Teacher, 37(5), 380–394. https://doi.org/10.3109/0142159X.2014.1009492
Cook, D. A. (2015). The validity of simulation in health professions education. Medical Education, 49(3), 280–289.
Curl, E. D., Smith, S., Chisholm, L. A., Hamilton, J., & McGee, L. A. (2022). Effectiveness of integrated simulation and clinical experiences compared to traditional clinical experiences for nursing students. Nursing Education Perspectives, 37(2), 72–77. https://doi.org/10.1097/01.NEP.0000000000000004
DeVon, H. A., Block, M. E., Moyle-Wright, P., Ernst, D. M., Hayden, S. J., Lazzara, D. J., ... & Kostas-Polston, E. (2007). A psychometric toolbox for testing validity and reliability. Journal of Nursing Scholarship, 39(2), 155–164. https://doi.org/10.1111/j.1547-5069.2007.00161.x
Downing, S. M. (2003). Validity: On the meaningful interpretation of assessment data. Medical Education, 37(9), 788–794.
Downing, S. M. (2005). The impact of validity threats on educational assessment and learning. Medical Education, 39(3), 287–294. https://doi.org/10.1111/j.1365-2929.2005.02094.x
Frank, J. R. (2010). The CanMEDS 2015 Physician Competency Framework: Better standards. Medical Education, 44(12), 1130–1135.
Franklin, A. E., Burns, P., & Lee, C. S. (2014). Comparison of expert and novice raters' reliability during high-stakes clinical performance evaluation using simulation. Nursing Education Perspectives, 35(6), 386–388. https://doi.org/10.5480/12-1007.1
Garside, J. R., & Nhemachena, J. Z. (2023). A concept analysis of competence and its transition in nursing. Nurse Education Today, 33(5), 541–545. https://doi.org/10.1016/j.nedt.2021.12.007
Hayden, J. K., Smiley, R. A., Alexander, M., Kardong-Edgren, S., & Jeffries, P. R. (2014). The NCSBN National Simulation Study: A longitudinal, randomized, controlled study replacing clinical hours with simulation in prelicensure nursing education. Journal of Nursing Regulation, 5(2), S1–S64. https://doi.org/10.1016/S2155-8256(15)30062-4
Johnson, B., Carpenter, D. R., & Thomas, T. (2018). Simulation across curricula: Faculty perspectives and strategies. Clinical Simulation in Nursing, 22, 27–33. https://doi.org/10.1016/j.ecns.2018.07.002
Kane, M. T. (2006). Validation in the interpretation and use of assessment results. Educational Measurement: Issues and Practice, 25(4), 5–17.
Kane, M. T. (2023). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Kane, M. T. (2023). Validity evidence and the interpretation of test scores. International Journal of Testing, 13(3), 325–328.
Kardong-Edgren, S., Adamson, K. A., & Fitzgerald, C. (2010). A review of currently published evaluation instruments for human patient simulation. Clinical Simulation in Nursing, 6(1), e25–e35. https://doi.org/10.1016/j.ecns.2009.08.004
Kardong-Edgren, S., Willhaus, J., Bennett, D., & Hayden, J. K. (2018). Results of the National Council of State Boards of Nursing national simulation study: Part II. Journal of Nursing Regulation, 5(2), 9–14. https://doi.org/10.1016/S2155-8256(15)30063-6
Kim, J. H., et al. (2022). The effectiveness of simulation-based education in nursing: A meta-analysis. Journal of Nursing Education, 55(6), 315–322.
Kim, J., Park, J. H., & Shin, S. (2022). Effectiveness of simulation-based nursing education depending on fidelity: A meta-analysis. BMC Medical Education, 16, 152. https://doi.org/10.1186/s12909-016-0672-7
La Cerra, C. (2019). Simulation-based education in health professions: A meta-analysis. Nurse Education Today, 72, 8–14.
Larew, C., Lessans, S., Spunt, D., Foster, D., & Covington, B. G. (2006). Innovations in clinical simulation: Application of Benner’s theory in an interactive patient care simulation. Nursing Education Perspectives, 27(1), 16–21.
Lateef, F. (2010). Simulation-based learning: Just like the real thing. Journal of Emergencies, Trauma, and Shock, 3(4), 348–352.
Liaw, S. Y., Scherpbier, A., Rethans, J. J., & Klainin-Yobas, P. (2012). Assessment for simulation learning outcomes: A comparison of knowledge and self-reported confidence with observed clinical performance. Nurse Education Today, 32(6), e35–e39. https://doi.org/10.1016/j.nedt.2021.10.003
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.
Natarajan, S. (2019). Simulation-based learning in nursing education: An overview of the global trends and challenges. Nursing Education Perspectives, 40(6), 338–345.
NMC Ghana. (2020). Nursing and Midwifery Council of Ghana Annual Report 2020. NMC Ghana.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.
Nursing and Midwifery Council of Ghana (NMC Ghana). (2020). Standards and guidelines for the education and practice of nursing and midwifery in Ghana. Accra: NMC Ghana.
Ofori, A. (2021). Challenges of simulation-based education in low-resource settings: Insights from Ghana. International Journal of Medical Education, 12, 106–113.
Ohtake, P. J., Lazarus, M., Schillo, R., & Rosen, M. (2023). Simulation experience enhances physical therapist student confidence in managing a patient in the critical care environment. Physical Therapy, 93(2), 216–228. https://doi.org/10.2522/ptj.20210458
Okereke, L. M. (2021). Simulation-based learning and healthcare education in Sub-Saharan Africa: A need for contextual adaptation. African Journal of Nursing and Midwifery, 23(1), 12–18.
Okrainec, J., et al. (2010). Enhancing medical education in Africa: A framework for integrating simulation-based learning. Medical Education, 44(6), 510–517.
Osei-Akoto, I., et al. (2022). Impact of simulation-based learning on nursing competencies in Ghana: A longitudinal study. Journal of Nursing Education and Practice, 12(5), 23–30.
Schlairet, M. C., & Fenster, M. J. (2012).
Use of clinical simulation to enhance baccalaureate nursing students’ understanding of patients with chronic conditions . Journal of Nursing Education , 51(6), 345–348. https://doi.org/10.3928/01484834-20120427-03 Shin, S., et al. (2015). The effectiveness of high-fidelity simulation on clinical competencies in nursing education: A meta-analysis. Nurse Education Today, 35 (7), 977-983. Sullivan, N., Swoboda, S. M., Brey, C., Qian, Q., & Lucas, L. (2015). Use of simulation for high-stakes evaluation of newly graduated nurses . Journal of Continuing Education in Nursing , 46(11), 482–488. https://doi.org/10.3928/00220124-20151020-06 Tavakol, M., & Dennick, R. (2021). Making sense of Cronbach's alpha . International Journal of Medical Education , 2, 53–55. https://doi.org/10.5116/ijme.4dfb.8dfd ten Cate, O. (2017). Competency-based education: An introduction to the special issue. Medical Teacher, 39 (7), 675-681. Todd, M. J., Manz, J. A., Hawkins, K. S., Parsons, M. E., & Hercinger, M. (2018). The development of a quantitative evaluation tool for simulations in nursing education. International Journal of Nursing Education Scholarship , 5(1), 1–17. https://doi.org/10.2202/1548-923X.1605 Yuan, H. B., Williams, B. A., & Fang, J. B. (2012). The contribution of high‐fidelity simulation to nursing students' confidence and competence: A systematic review . International Nursing Review , 59(1), 26–33. https://doi.org/10.1111/j.1466-7657.2021.00964.x Additional Declarations No competing interests reported. 
Published in BMC Medical Education on 18 Aug, 2025. Journal editorial history, most recent first: editorial decision (revision requested) 07 Jul, 2025; reviews received 05 Jul, 2025; reviewers agreed 25 Jun, 2025; reviews received 24 Jun, 2025; reviewers agreed 24 Jun, 2025; reviewers invited 24 Jun, 2025; editor invited 23 Jun, 2025; editor assigned 19 May, 2025; submission checks completed 19 May, 2025; first submitted 10 May, 2025.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6636009","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":476083724,"identity":"a6001c3f-e3df-4082-90f1-1bc63d176b48","order_by":0,"name":"Simon Ntumi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAv0lEQVRIiWNgGAWjYNACAxsIzQNiE1bODFKWRrIWhsMkaJGf3X9M6kbB+cR+iQTGB2/bGOy2E9JicOcwm3SOwe3EmTMSmA3ntjEk72wgpEUiGaJlw+0ENmleoBaDA4QcNgOs5Vzi/tsJ7L+J0sJwA6zlQOIG6QQ2ZqAWO4JaDG4kG1vnGCQbz7j/sFlyzjmJBCIclvjwds4fO9n+nsMHP7wps7En7DAEYGwAEhKJDcTrgAJ7knWMglEwCkbBsAcAdVc9xofsEDAAAAAASUVORK5CYII=","orcid":"","institution":"University of Education, Winneba","correspondingAuthor":true,"prefix":"","firstName":"Simon","middleName":"","lastName":"Ntumi","suffix":""}],"badges":[],"createdAt":"2025-05-10 16:53:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6636009/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6636009/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12909-025-07724-4","type":"published","date":"2025-08-18T16:12:49+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":89846993,"identity":"aa1522f8-3f25-4117-a7b4-89cdedcb7dc7","added_by":"auto","created_at":"2025-08-25
16:31:20","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":956420,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6636009/v1/65905213-ba14-4bae-83f0-f3b9b9effa2c.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Establishing Reliability and Construct Validity of High-Fidelity Simulation-Based Assessments for Procedural Skills in Nursing Education Using the Kane Validity Framework","fulltext":[{"header":"Introduction","content":"\u003cp\u003eIn the 21st century, health professions education has undergone a significant transformation, shifting from traditional, time-based training models to competency-based education (CBE). This approach emphasizes measurable learning outcomes and performance-based assessments, ensuring that healthcare professionals possess the necessary skills and knowledge to provide safe and effective patient care (Frank, 2010; ten Cate, 2017). Among the most impactful innovations supporting CBE is high-fidelity simulation (HFS), a pedagogical method that employs advanced manikins and interactive scenarios to mimic real-life clinical environments. HFS provides nursing students with experiential learning opportunities in safe, controlled settings, where they can practice and refine procedural skills without risk to patients (Cant \u0026amp; Cooper, 2017; Kim, 2022). Globally, HFS has been widely adopted as a gold standard for teaching and assessing clinical competencies in nursing education. Meta-analyses have consistently shown that simulation-based education improves knowledge acquisition, critical thinking, clinical decision-making, and psychomotor skills (La Cerra, 2019; Shin et al., 2015). 
These benefits are particularly evident in the teaching and assessment of procedural skills such as catheter insertion, medication administration, and basic life support, where hands-on practice is essential for proficiency.\u003c/p\u003e \u003cp\u003eHowever, despite growing global reliance on simulation-based assessments (SBAs), concerns about their validity and reliability persist. Of particular concern is construct validity, which refers to the degree to which these assessments measure the intended clinical skills or competencies (Cook \u0026amp; Lineberry, 2022). Without rigorous validation, particularly in varied cultural and institutional contexts, it is difficult to determine whether assessment scores reflect true competence. Scholars have increasingly advocated comprehensive validation models such as the Kane Validity Framework, which organizes the validation process into four inferential stages: scoring, generalization, extrapolation, and implications (Kane, 2006; Downing, 2003). Although multiple studies in high-income countries have applied Kane\u0026rsquo;s model to validate SBAs, such research is scarce in low- and middle-income countries (LMICs), where contextual factors may significantly influence assessment design, implementation, and interpretation. In Sub-Saharan Africa, simulation-based education is gaining momentum as a response to multiple structural challenges, including overcrowded clinical sites, limited faculty supervision, and high patient-to-student ratios (Ajani \u0026amp; Moez, 2021; Natarajan, 2019). Countries such as South Africa, Kenya, and Nigeria have made notable strides in adopting simulation-based learning, often supported by international collaborations or donor funding (Okrainec et al., 2010). However, such programs are often unsustainable, unstandardized, and disconnected from national nursing education frameworks. 
Moreover, few studies in Africa have empirically examined the validity of simulation-based assessments, especially for procedural competencies that directly affect patient outcomes and healthcare quality.\u003c/p\u003e \u003cp\u003eIn Ghana, the integration of simulation into nursing education is a relatively recent development, introduced primarily as a response to growing challenges within the clinical training environment. These challenges include increased student intake in nursing and midwifery programmes, limited clinical placement opportunities in hospitals and community health settings, and inadequate clinical supervision due to staff shortages. In recognition of these systemic constraints, the Nursing and Midwifery Council (NMC) of Ghana has formally acknowledged the transformative potential of simulation for enhancing clinical preparedness and bridging the theory\u0026ndash;practice gap in pre-service nursing education (NMC Ghana, 2020). As a result, several public nursing and midwifery training institutions, particularly those affiliated with larger teaching hospitals, have begun to establish simulation laboratories equipped with low- to mid-fidelity manikins, procedural task trainers, and basic audiovisual technology. These simulation centres are typically used to support skill acquisition in areas such as basic life support, wound dressing, intravenous line insertion, and antenatal care procedures.\u003c/p\u003e \u003cp\u003eWhile this progress represents a significant step toward modernizing nursing education in Ghana, the implementation of simulation remains fragmented and lacks a coordinated national strategy or standardized curriculum framework. There are currently no uniform guidelines on the duration, frequency, or assessment criteria for simulation-based learning, and substantial variability exists across institutions in infrastructure, faculty training, and pedagogical integration. 
Of particular concern is the absence of a systematic protocol for validating simulation-based assessments (SBAs). Without robust, contextually grounded validation efforts, it is difficult to ascertain whether the outcomes of simulation sessions, particularly those used for summative or high-stakes evaluations, accurately reflect students\u0026rsquo; real-world clinical competencies. This lack of standardization and evidence-based validation raises critical concerns about the fairness, credibility, and interpretability of simulation assessment scores. It also has implications for high-stakes decisions such as licensure, certification, and graduation, which may depend heavily on students' performance in simulated clinical scenarios. In the absence of empirical validation frameworks, such as those grounded in the Kane Validity Framework, educational stakeholders, including regulatory bodies and employers, may be unable to make defensible judgments about a graduate\u0026rsquo;s readiness for autonomous clinical practice.\u003c/p\u003e \u003cp\u003eThe increasing integration of high-fidelity simulations (HFS) into nursing education globally marks a significant pedagogical advancement aimed at improving clinical competence, decision-making, and patient safety. In both high- and low-resource settings, simulation offers an alternative or supplement to traditional clinical placements, which are often constrained by logistical, ethical, or safety considerations (Lateef, 2010; Cant \u0026amp; Cooper, 2017). In Ghana, the adoption of simulation is gaining momentum in response to expanding student enrolments and declining access to adequately supervised clinical practice environments (NMC Ghana, 2020). 
However, while the instructional benefits of simulation have been widely promoted, considerably less attention has been paid to the validity of simulation-based assessments (SBAs), particularly those used to make high-stakes decisions regarding student progression, graduation, and licensure. One of the most critical but underexplored aspects of SBAs is construct validity, the extent to which the assessment accurately measures the intended clinical or procedural skill (Messick, 1995; Cook et al., 2015). Without compelling evidence of construct validity, assessment outcomes may be misleading, potentially leading either to the premature advancement of underprepared students or to the unfair penalization of competent candidates. In LMICs such as Ghana, where simulation is often implemented with limited resources, undertrained faculty, and variable institutional infrastructure, the lack of rigorous validation protocols exacerbates the risk of compromised assessment quality and equity (Okereke, 2021; Osei-Akoto et al., 2022). Furthermore, a critical review of the existing literature in the Ghanaian and broader Sub-Saharan African context reveals that most studies on simulation focus predominantly on learner satisfaction, knowledge acquisition, or perceived realism (Ofori, 2021; Badu-Nyarko et al., 2023). While such findings provide valuable insights into the acceptability and feasibility of simulation in resource-constrained settings, they fall short of addressing the psychometric robustness of the tools used to assess procedural competencies. 
Few, if any, studies have undertaken systematic, theory-guided investigations into whether simulation scores can be generalized, extrapolated to real clinical settings, or used to support consequential educational decisions.\u003c/p\u003e \u003cp\u003eThe Kane Validity Framework (Kane, 2006, 2023) provides a comprehensive and widely accepted model for validating complex assessments, particularly in performance-based disciplines such as medicine and nursing. It structures the validation process around four key inferences (scoring, generalization, extrapolation, and implications), each of which must be supported by empirical evidence. To date, however, no published study in Ghana has applied this framework to examine the validity of high-fidelity simulation assessments used for evaluating procedural skills in nursing students. This gap in the literature and practice is both theoretically and practically significant. It limits the ability of regulators, educators, and employers to make defensible decisions based on simulation performance, particularly in high-stakes contexts. It also hampers efforts to align simulation-based assessment with international quality assurance standards. Therefore, this study seeks to address this gap by establishing the construct validity of high-fidelity simulation-based assessments for procedural skills in Ghanaian nursing education using the Kane Validity Framework. By generating localized, contextually relevant validity evidence, the study aims to enhance the quality and credibility of simulation-based assessment in Ghana and contribute to the global discourse on simulation validity in LMICs.\u003c/p\u003e \u003cp\u003e \u003cb\u003eResearch Questions\u003c/b\u003e \u003c/p\u003e \u003cp\u003e1. To what extent do the scoring procedures of high-fidelity simulation-based assessments in Ghanaian nursing education demonstrate reliability and consistency across evaluators and scenarios?\u003c/p\u003e \u003cp\u003e2. 
How well do simulation-based assessment scores generalize across different procedural tasks and student cohorts in Ghanaian nursing training institutions?\u003c/p\u003e \u003cp\u003e3. To what extent do scores from high-fidelity simulation-based assessments predict actual clinical performance or competence in real-world nursing practice settings?\u003c/p\u003e"},{"header":"Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eResearch Design\u003c/h2\u003e \u003cp\u003eThis study adopted a quantitative, cross-sectional correlational research design to investigate the construct validity of high-fidelity simulation-based assessments (SBAs) used to evaluate nursing students\u0026rsquo; procedural skills in Ghana. The choice of a quantitative design was informed by the study\u0026rsquo;s objective to generate statistically analyzable evidence on the relationships between simulation performance and established indicators of clinical competence, such as Objective Structured Clinical Examination (OSCE) scores and clinical practicum evaluations. The cross-sectional design enabled the researcher to collect data at a single point in time across multiple institutions, providing a snapshot of the validity evidence within the academic year under review. This design was particularly well suited to examining correlations between different assessment modalities that purportedly measure the same construct, namely procedural competence. The correlational approach allowed testing of predictive and concurrent relationships between scores obtained from simulation-based assessments and those from other validated performance measures. This was in line with previous simulation validity studies that employed similar quantitative methods to assess construct validity in both high-income and LMIC contexts (Cook, 2015; Alinier, 2007). 
Furthermore, the study was conceptually anchored in the Kane Validity Framework (Kane, 2023), which guided the gathering and interpretation of quantitative evidence to support four inferential stages: scoring, generalization, extrapolation, and implications. This framework provided a structured lens for evaluating how simulation scores are generated, how generalizable they are across contexts, how well they predict real-world performance, and whether they support meaningful decisions (e.g., progression or licensure). This theoretically grounded design ensured that the study not only established statistical correlations but also interrogated the legitimacy of inferences made from simulation scores.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eStudy Setting\u003c/h3\u003e\n\u003cp\u003eThe research was conducted across three public nursing training institutions located in southern and middle Ghana, selected using purposive sampling. These institutions were chosen because they had adopted high-fidelity simulation (HFS) technology as a regular component of clinical skills instruction and assessment. All three institutions were accredited by the Nursing and Midwifery Council (NMC) of Ghana and affiliated with public universities offering diploma and degree programs in general nursing. Over the preceding five years, these institutions had invested in simulation laboratories equipped with manikins such as the Laerdal SimMan 3G and Gaumard\u0026rsquo;s HAL S3201, along with video recording capabilities for debriefing sessions. They had also begun integrating SBAs into their summative clinical assessments, although with varying levels of formality and standardization. Simulation exercises at these sites typically included procedural skills such as intravenous therapy, catheterization, wound dressing, and cardiopulmonary resuscitation, among others. 
By selecting institutions from both the middle and southern zones of Ghana, the study captured a range of simulation practices that reflect the emerging diversity in simulation implementation within the country. All participating institutions operated under the standardized guidelines of NMC Ghana, which emphasize the demonstration of clinical competence as a prerequisite for both graduation and registration/licensure (NMC Ghana, 2020). This regulatory environment further justified examining whether SBAs yield valid and defensible evidence of students' clinical readiness.\u003c/p\u003e\n\u003ch3\u003ePopulation and Sample\u003c/h3\u003e\n\u003cp\u003eThe target population for this study comprised final-year nursing students enrolled in either diploma or bachelor\u0026rsquo;s degree programs in the selected institutions. These students had completed simulation-based assessments in procedural skills as part of their clinical education and had also undergone OSCEs and clinical practicum evaluations. Final-year students were selected because they were closest to professional licensure and had been exposed to the full spectrum of assessment modalities, making them suitable candidates for investigating the construct validity of simulation scores. A sample of 150 students was drawn using stratified random sampling to ensure representation from each of the three institutions. The strata were defined by institutional affiliation, and students were randomly selected within each stratum. Stratification was used to reduce sampling bias and to ensure that institutional variations in simulation exposure and assessment practices were proportionally reflected in the sample. The sample size was calculated using G*Power 3.1 software for correlation and multiple regression analyses. 
Based on a conventional alpha level of 0.05, a desired statistical power of 0.80, and a medium effect size (r\u0026thinsp;=\u0026thinsp;0.30), the minimum required sample size was estimated at 138 (Cohen, 1988). The final sample of 150 participants accounted for potential attrition and incomplete data.\u003c/p\u003e \u003cp\u003eEligibility criteria included students who had:\u003c/p\u003e \u003cp\u003e \u0026bull; Participated in at least one high-fidelity simulation scenario for procedural assessment within the academic year.\u003c/p\u003e \u003cp\u003e\u0026bull; Completed an OSCE administered by their institution.\u003c/p\u003e \u003cp\u003e\u0026bull; Received clinical practicum evaluations from their preceptors during official hospital placements.\u003c/p\u003e \u003cp\u003e\u0026bull; Provided informed consent for the use of their anonymized assessment data for research purposes.\u003c/p\u003e \u003cp\u003eParticipants were excluded if they had missed any of the three assessments (SBA, OSCE, or practicum), or if their simulation scores were unavailable due to technical malfunctions or missing rater data. This sample provided a robust basis for conducting inferential statistical analyses required to evaluate the strength and plausibility of the inferences laid out in the Kane Framework. It also ensured that the study had sufficient statistical power to detect meaningful relationships among the variables of interest.\u003c/p\u003e\n\u003ch3\u003eInstrumentation\u003c/h3\u003e\n\u003cp\u003eThis study utilized three primary instruments to collect quantitative data on students' clinical competence as measured through simulation-based assessments, OSCEs, and clinical practicum evaluations. 
Each instrument was selected for its alignment with the construct of procedural competence and its relevance to the inferential stages of the Kane Validity Framework.\u003c/p\u003e\n\u003ch3\u003eSimulation-Based Assessment Rubrics\u003c/h3\u003e\n\u003cp\u003eHigh-fidelity simulation performance was evaluated using a structured procedural skills rubric adapted from the widely validated Creighton Competency Evaluation Instrument (CCEI). The adapted version was customized to reflect the Ghanaian nursing education context while preserving the psychometric integrity of the original instrument. The rubric included both global and task-specific performance indicators across common nursing procedures such as intravenous (IV) cannulation, wound dressing, urinary catheter insertion, and vital signs monitoring. Each item was scored on a 5-point Likert scale assessing dimensions such as accuracy, efficiency, adherence to protocol, and patient communication. The tool allowed for objective rating by trained assessors and had been previously demonstrated to exhibit good reliability and construct validity (Todd et al., 2018). The inclusion of multiple domains in the rubric facilitated a comprehensive assessment of procedural competence, in alignment with the scoring and generalization inferences of the Kane Framework.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eObjective Structured Clinical Examination (OSCE) Scores\u003c/h2\u003e \u003cp\u003eStudents\u0026rsquo; OSCE scores were collected from institutional records for the same semester in which the simulation-based assessments were conducted. The OSCEs served as a benchmark for clinical skills assessment and were administered under controlled conditions by trained faculty examiners using standardized checklists. 
Each station in the OSCE assessed a specific clinical skill, often under timed conditions, and was evaluated based on national clinical competency standards as outlined by the Nursing and Midwifery Council (NMC) of Ghana. OSCE scores provided a valuable external measure against which the simulation scores could be compared, contributing to the generalization and extrapolation components of the validity argument.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eClinical Practicum Evaluation Scores\u003c/h3\u003e\n\u003cp\u003eThe third instrument comprised the end-of-placement evaluation reports completed by clinical supervisors during students\u0026rsquo; final clinical postings. These structured evaluations were based on standardized forms developed by the NMC and assessed key performance areas such as procedural accuracy, patient safety, infection control, critical thinking, and adherence to professional standards. Ratings were typically provided using a numerical scale and included both observational data and summative judgments from preceptors. These evaluation scores were included to support extrapolation inferences, as they reflected real-world clinical competence across various practice settings. Their inclusion allowed the study to assess whether high scores in simulation correlated with competent performance in authentic clinical environments. All scores from the three instruments were numerically coded to enable quantitative analysis. Data from different institutions were standardized in format to ensure comparability. Calibration workshops were conducted prior to data collection to harmonize scoring approaches and rubric interpretations across assessors and institutions.\u003c/p\u003e\n\u003ch3\u003eData Collection Procedure\u003c/h3\u003e\n\u003cp\u003eThe data collection process followed a systematic, ethically guided protocol across the three participating nursing training institutions. 
Prior to data collection, formal approval was obtained from the institutional heads of each school, who facilitated access to student assessment records and coordinated with internal research liaisons. Eligible students were approached, and written informed consent was obtained after explaining the purpose of the study, the voluntary nature of participation, and measures taken to ensure confidentiality. Data were collected retrospectively from academic records and anonymized by assigning unique identifiers to each participant. The simulation-based assessment scores had been rated by two independent faculty assessors at the time of assessment. Copies of the OSCE scores and clinical practicum evaluations were retrieved from the schools' examination and clinical coordination units. To assess the scoring reliability of simulation-based assessments, the researcher reviewed the scores assigned by both raters for each student and computed inter-rater reliability using the intra-class correlation coefficient. The data collection period lasted approximately six weeks and was closely monitored to ensure consistency in data extraction protocols and to resolve any discrepancies in scoring formats or documentation across the institutions. All collected data were securely stored in password-protected digital files and locked cabinets where physical documents were involved.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eData Analysis\u003c/h2\u003e \u003cp\u003eQuantitative data were entered into IBM SPSS Statistics Version 26 for analysis. 
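The study computed two scoring-inference statistics in SPSS: an intra-class correlation across the two independent faculty raters, as described above, and a rubric-level Cronbach's alpha. The Python sketch below is illustrative only and shows what the two coefficients compute. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss notation. The study does not state which ICC form it used, so this choice is an assumption. The function names and the small rating sets are hypothetical, not study data.

```python
from statistics import mean, variance


def icc_2_1(scores):
    '''ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: one row per student, one column per rater (hypothetical layout).
    '''
    n = len(scores)       # number of students (targets)
    k = len(scores[0])    # number of raters
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares (two-way ANOVA, one score per cell).
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between students
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cronbach_alpha(items):
    '''Cronbach alpha; items: one row per student, one column per rubric item.'''
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical two-rater scores for five students (not study data).
ratings = [[4, 4], [3, 3], [5, 4], [2, 2], [4, 5]]
print(round(icc_2_1(ratings), 2))  # 0.84

# Hypothetical three-item rubric scores for four students (not study data).
rubric = [[3, 4, 3], [4, 5, 4], [2, 3, 3], [5, 5, 4]]
print(round(cronbach_alpha(rubric), 3))  # 0.923
```

Under the absolute-agreement form, systematic offsets between raters (one rater scoring consistently higher) lower the ICC. That matters when a single rater's score may drive a pass/fail decision. A consistency-type ICC would ignore such offsets.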
A range of descriptive and inferential statistical techniques were employed, organized around the four key inferences of the Kane Validity Framework.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eScoring Inference\u003c/h2\u003e \u003cp\u003eTo assess the consistency and reliability of simulation scores, inter-rater reliability was computed using the Intra-Class Correlation Coefficient (ICC). High ICC values (\u0026ge;\u0026thinsp;0.75) were interpreted as indicative of good agreement between independent raters. Additionally, the internal consistency of the adapted CCEI rubric was assessed using Cronbach\u0026rsquo;s alpha, with values above 0.80 considered acceptable for high-stakes assessments.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eGeneralization Inference\u003c/h2\u003e \u003cp\u003eTo evaluate the extent to which simulation scores generalized to other performance-based assessments, Pearson\u0026rsquo;s correlation coefficients were calculated between students\u0026rsquo; simulation scores and OSCE scores. Moderate to strong positive correlations (r\u0026thinsp;\u0026ge;\u0026thinsp;0.30) were expected if both assessments measured overlapping domains of procedural competence.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eExtrapolation Inference\u003c/h2\u003e \u003cp\u003eMultiple linear regression analysis was conducted to examine the predictive validity of simulation scores in explaining variance in clinical practicum evaluation scores. Independent variables included simulation scores and OSCE scores, while the dependent variable was the practicum evaluation rating. 
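The generalization and extrapolation steps rest on two analyses, both run in SPSS. The first is a Pearson correlation between simulation and OSCE scores. The second is a regression of practicum ratings on simulation and OSCE scores, whose R squared is the share of practicum variance explained. The standard-library Python sketch below illustrates both computations. It assumes ordinary least squares with an intercept and exactly two predictors, solved from the centered normal equations. The function names and the six-student scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def centered_cross(a, b):
    '''Sum of products of deviations from the mean (corrected cross-product).'''
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def pearson_r(x, y):
    '''Pearson product-moment correlation between two score lists.'''
    return centered_cross(x, y) / sqrt(centered_cross(x, x) * centered_cross(y, y))


def ols_two_predictors(x1, x2, y):
    '''OLS fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r_squared).'''
    s11, s22 = centered_cross(x1, x1), centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = centered_cross(y, y)
    return b0, b1, b2, 1 - ss_res / ss_tot  # R squared: share of variance explained


# Hypothetical percentage scores for six students (not study data).
sim = [62, 70, 75, 80, 85, 90]
osce = [60, 72, 70, 78, 88, 86]
practicum = [65, 68, 74, 79, 83, 92]

print('r(sim, osce) =', round(pearson_r(sim, osce), 2))
b0, b1, b2, r2 = ols_two_predictors(sim, osce, practicum)
print('R^2 =', round(r2, 2))
```

The closed form is used only to keep the sketch dependency-free. With more predictors, or with a hierarchical linear model for students nested within institutions, a linear-algebra or mixed-model library would be the practical choice.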
Where preliminary analysis suggested significant institutional variation in scores, hierarchical linear modeling (HLM) was considered to account for the nested structure of the data (students nested within institutions).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eImplication Inference\u003c/h2\u003e \u003cp\u003eTo assess the utility of simulation scores for decision-making, descriptive statistics were used to explore the distribution of scores across students. In addition, decision consistency analysis was conducted to determine the alignment between simulation pass/fail decisions and those from OSCEs and clinical practicum outcomes. Cut-score analysis was performed to evaluate whether the simulation thresholds used for passing aligned with real-world competence as evidenced in other assessments. All inferential analyses were conducted at a significance level of p\u0026thinsp;\u0026lt;\u0026thinsp;0.05, and effect sizes (e.g., Cohen\u0026rsquo;s d, R\u0026sup2;) were reported to indicate the practical significance of the findings.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eValidity and Reliability of Instruments\u003c/h2\u003e \u003cp\u003ePrior to full implementation, all three instruments underwent rigorous content validation and pilot testing. The simulation rubric and clinical evaluation tools were subjected to expert review by panels comprising senior nursing faculty, simulation specialists, and curriculum experts from the participating institutions. The review focused on alignment with Ghanaian clinical training standards, clarity of items, and coverage of essential competencies. Following expert review, a pilot study was conducted with 15 final-year nursing students from institutions not included in the main study. Feedback from this pilot informed minor revisions to improve item clarity and scoring guidance. 
The Content Validity Index (CVI) was computed based on expert ratings of relevance and clarity, with all subscales achieving CVI values above 0.80, indicating strong content validity.\u003c/p\u003e \u003cp\u003eDuring full-scale data analysis, construct validity was further assessed using exploratory factor analysis (EFA) to examine the underlying structure of the simulation and practicum assessment tools. Factor loadings and internal consistency metrics supported the unidimensionality of key domains, lending support to the validity of using these instruments for evaluating procedural competence.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eEthical Considerations\u003c/h2\u003e \u003cp\u003e This study adhered to rigorous ethical standards throughout the research process. Ethical clearance was obtained from the University of Education, Winneba Institutional Review Board (IRB), and additional permission was sought from the administrative heads of the participating institutions. All ethical protocols were strictly followed to ensure the protection of participants\u0026rsquo; rights and data. Participation was entirely voluntary, and informed consent was obtained from all student participants after a full explanation of the study\u0026rsquo;s objectives, data usage, and confidentiality measures. Students were informed of their right to withdraw from the study at any point without penalty. To maintain confidentiality, all data were anonymized using coded identifiers. Personally identifiable information was removed from all records prior to analysis. Hard copy documents were stored in locked filing cabinets, and digital data were encrypted and stored on password-protected computers accessible only to the research team. All procedures were guided by principles of beneficence, autonomy, and justice, in accordance with standard ethical frameworks for human subjects research (Israel \u0026amp; Hay, 2006). 
These considerations ensured that the research met both institutional and international standards for ethical conduct.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e reports the normality checks. The Shapiro-Wilk test assesses whether a sample departs from a normal distribution. A p-value greater than 0.05 indicates no significant deviation from normality. In this study, simulation scores, OSCE scores, clinical practicum scores, and the regression residuals all showed no significant deviation from normality (p \u0026gt; 0.05). The Kolmogorov-Smirnov test compares the sample distribution with a normal distribution and is interpreted in the same way. Its results for simulation and OSCE scores were likewise non-significant (p \u0026gt; 0.05), consistent with approximately normal distributions. The Q-Q plot assesses normality visually by plotting the quantiles of the data against the quantiles of a normal distribution.
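The quantile pairing behind a normal Q-Q plot can be sketched in a few lines. The plotting positions SPSS used for Table 1 are not stated. This sketch assumes Blom's positions, (i - 3/8)/(n + 1/4), and standardizes the sample by its mean and SD. The name qq_points is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    # Returns (theoretical_quantile, standardized_observation) pairs
    xs = sorted(sample)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.375) / (n + 0.25)  # Blom plotting position (assumption)
        points.append((std_normal.inv_cdf(p), (x - m) / s))
    return points


if __name__ == '__main__':
    # Hypothetical scores; points near the line y = x suggest normality
    for q, z in qq_points([62, 70, 71, 75, 78, 80, 85]):
        print(f'{q:6.2f} {z:6.2f}')
```

Plotting the first element of each pair against the second gives the Q-Q plot. Systematic curvature away from the diagonal indicates skew or heavy tails.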
The Q-Q plots for both simulation and OSCE scores showed data points closely following the diagonal line, further supporting the assumption of normality.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNormality Tests for Simulation-Based Assessment Scores and Predictive Variables\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMeasure\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStatistical Interpretation\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShapiro-Wilk Test (Simulation Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eW = 0.96 (p = 0.13)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality, p \u0026gt; 0.05, indicating normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShapiro-Wilk Test (OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eW = 0.94 (p = 0.08)\u003c/p\u003e \u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality, p \u0026gt; 0.05, indicating normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShapiro-Wilk Test (Clinical Practicum Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eW = 0.98 (p = 0.29)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality, p \u0026gt; 0.05, indicating normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShapiro-Wilk Test (Regression Residuals)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eW = 0.97 (p = 0.15)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality in residuals, p \u0026gt; 0.05, indicating normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKolmogorov-Smirnov Test (Simulation Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eD = 0.08 (p = 0.18)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality, p \u0026gt; 0.05, suggesting normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKolmogorov-Smirnov Test (OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eD = 0.07 (p = 0.22)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo significant deviation from normality, p \u0026gt; 0.05, suggesting normal distribution\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQ-Q Plot (Simulation Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVisual Inspection\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eData points closely follow the diagonal line, indicating normality\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQ-Q Plot (OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVisual Inspection\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eData points closely follow the diagonal line, indicating normality\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe results in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e indicate high reliability for the simulation-based assessment scoring. Inter-rater reliability (ICC) ranged from 0.77 to 0.84 (95% CI [0.72, 0.89], p \u0026lt; 0.001), indicating strong consistency in scoring across raters and evaluation scenarios. Cronbach's alpha of 0.83 (95% CI [0.80, 0.86]) indicates good internal consistency, which is consistent with the rubric items measuring a common underlying construct. The standard error of measurement (SEM) of 1.23 estimates the precision of individual scores; lower values indicate more precise measurement. The mean item-total correlation of 0.65 shows a strong positive relationship between individual item scores and the total score, further supporting the internal consistency of the tool.
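Cronbach's alpha and the SEM formula given in Table 2, SEM = SD × √(1 - α), can be sketched as follows. The sketch assumes item scores are stored as one list per rubric item. The helper names are hypothetical, and population variances are used throughout (alpha depends only on the ratio of variances, so the choice cancels out).

```python
import math
from statistics import pvariance


def cronbach_alpha(items):
    # items[j] holds every student's score on rubric item j (hypothetical layout)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per student
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def standard_error_of_measurement(sd_total, reliability):
    # SEM = SD * sqrt(1 - reliability), as in Table 2
    return sd_total * math.sqrt(1 - reliability)


if __name__ == '__main__':
    # Hypothetical scores: 3 rubric items x 4 students
    items = [[3, 4, 2, 5], [3, 5, 2, 4], [2, 4, 3, 5]]
    print(round(cronbach_alpha(items), 3))
    print(standard_error_of_measurement(10.0, 0.75))  # 5.0
```

Rearranging the same formula, the reported SEM of 1.23 with α = 0.83 implies a total-score SD of about 1.23/√0.17 ≈ 2.98.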
Split-half reliability of 0.79 (95% CI [0.74, 0.84]) indicates moderate to high consistency between halves of the assessment. Cohen’s kappa of 0.72 (95% CI [0.68, 0.76]) indicates substantial agreement between raters beyond chance, which further supports the dependability of the scoring process. Finally, the intraclass correlation for task-specific criteria ranged from 0.80 to 0.88 (95% CI [0.75, 0.92]), indicating high reliability in rating task-specific performance across evaluators and scenarios.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eReliability of Simulation-Based Assessment Scoring\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMeasure\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStatistical Interpretation\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInter-rater Reliability (ICC)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.77–0.84\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.72, 0.89], p \u0026lt; 0.001, High
reliability between raters\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCronbach’s Alpha (Internal Consistency)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.80, 0.86], p \u0026lt; 0.001, Acceptable internal consistency\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStandard Error of Measurement (SEM)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.23\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSEM = √(1 - α) × SD, provides an estimate of the accuracy of the scores\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMean Item-Total Correlation\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStrong correlation between individual item scores and total scores\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSplit-Half Reliability\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.74, 0.84], p \u0026lt; 0.001, Moderate to high consistency across halves\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCohen’s Kappa (for Rater Agreement)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.68, 0.76], p \u0026lt; 0.001, Substantial agreement between raters\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIntraclass Correlation for Task-Specific Criteria\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.80–0.88\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.75, 0.92], p \u0026lt; 0.001, High task-specific reliability\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eICC\u003c/em\u003e values ≥ 0.75 indicate strong inter-rater agreement.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eCronbach’s alpha\u003c/em\u003e ≥ 0.70 reflects acceptable internal consistency.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eSEM\u003c/em\u003e estimates score precision; lower values are better.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eCohen’s kappa\u003c/em\u003e ≥ 0.61 denotes substantial rater agreement.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eSplit-half and item-total correlations\u003c/em\u003e support internal consistency.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e summarizes the relationships and predictive power of the simulation-based assessments. It restates the inter-rater reliability (ICC = 0.77–0.84), which indicates strong agreement between raters.
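Two statistics reported here are both Cohen's kappa, κ = (p_o - p_e)/(1 - p_e): the rater-agreement kappa in Table 2 and the simulation-versus-OSCE decision-consistency kappa in Table 3. p_o is the observed agreement, and p_e is the agreement expected by chance from each rater's marginal proportions. The following minimal sketch works for two sets of categorical decisions. The pass/fail labels and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category proportions
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    # Undefined (division by zero) when p_e == 1, e.g. both use one category
    return (p_o - p_e) / (1 - p_e)


if __name__ == '__main__':
    # Hypothetical pass/fail decisions for four students
    simulation = ['pass', 'pass', 'fail', 'fail']
    osce = ['pass', 'fail', 'fail', 'fail']
    print(cohens_kappa(simulation, osce))  # 0.5
```

In the example the decisions agree 75% of the time. However, 50% agreement is expected by chance, so κ is 0.5 rather than 0.75.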
Cronbach’s alpha of 0.83 supports the internal consistency of the rubric used for simulation scoring. Pearson’s correlation between simulation and OSCE scores is 0.45 (p \u0026lt; 0.01, 95% CI [0.30, 0.58]). This moderate positive correlation suggests that simulation scores are meaningfully related to OSCE performance. Cohen’s d of 0.56 (95% CI [0.40, 0.72]) indicates a medium-sized difference between mean simulation and OSCE scores.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComprehensive Statistical Analysis of Simulation-Based Assessments\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMeasure\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStatistical Interpretation\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1.
Inter-rater Reliability (ICC)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.77–0.84\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHigh ICC values (≥ 0.75) indicate good agreement between raters. 95% CI: [0.71, 0.89] confirms strong consistency across evaluators and scenarios.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2. Cronbach’s Alpha (Internal Consistency)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCronbach's alpha of 0.83 indicates good internal consistency of the adapted CCEI rubric for high-stakes assessments. 95% CI: [0.80, 0.86] confirms the reliability.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3. Pearson’s Correlation (Simulation vs. OSCE)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.45 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModerate Positive Correlation with a 95% CI: [0.30, 0.58], indicating a meaningful relationship between simulation and OSCE scores. Medium Effect Size (r = 0.45).\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4. Cohen’s d (Simulation vs. OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.56 (Medium Effect Size)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.40, 0.72] suggests a moderate difference between simulation and OSCE scores, supporting moderate educational impact.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5. 
Confidence Interval (CI) for Mean Difference (Simulation vs. OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e[2.5, 5.5]\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ep \u0026lt; 0.05 indicating a moderate difference between simulation and OSCE scores. The true mean difference likely lies between 2.5 and 5.5.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6. Multiple Regression (Simulation → Clinical Practicum)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eR² = 0.42 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.32, 0.52], suggesting simulation scores explain 42% of the variance in clinical practicum performance.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7. Regression Coefficient for Simulation Scores (Simulation → Clinical Practicum)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eβ = 0.38 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.28, 0.48] indicates a moderate positive predictive relationship between simulation scores and clinical practicum scores.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8. 
Hierarchical Linear Modeling (Institutional Effect)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eICC = 0.18 (p \u0026lt; 0.05)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.08, 0.28] showing that 18% of the variance in clinical practicum scores is due to differences between institutions.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9. Decision Consistency (Simulation vs. OSCE)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCohen’s k = 0.67 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [0.55, 0.79] indicates substantial agreement between simulation-based assessment and OSCE decisions.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10. Cut-Score Analysis (Simulation Threshold)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCut-Score = 75 (p \u0026lt; 0.05)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95% CI: [70, 80] confirms that a simulation score of 75 aligns with OSCE and clinical practicum performance outcomes, supporting its validity as a passing threshold.\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003ePearson’s r\u003c/em\u003e between 0.30–0.50 shows moderate relationships.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eCohen’s d\u003c/em\u003e interprets score differences; 0.50 indicates a medium effect.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003ep-values\u003c/em\u003e \u0026lt; 0.05 and \u003cem\u003e95% confidence intervals (CIs)\u003c/em\u003e indicate statistically 
and practically meaningful results.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe 95% CI for the mean difference between simulation and OSCE scores is [2.5, 5.5] (p \u0026lt; 0.05). Because the interval excludes zero, the two assessments produce systematically different mean scores, which supports the distinction between them. Multiple regression shows that simulation scores account for 42% of the variance in clinical practicum performance (R² = 0.42, p \u0026lt; 0.01), indicating that simulation-based assessments provide useful predictive information about clinical outcomes. The regression coefficient for simulation scores is 0.38 (p \u0026lt; 0.01), indicating a significant positive relationship between simulation scores and clinical practicum performance. Hierarchical linear modeling (HLM) indicates that 18% of the variance in clinical practicum performance lies between institutions (ICC = 0.18, p \u0026lt; 0.05), suggesting that institutional factors have a modest influence on clinical outcomes. Decision consistency between simulation and OSCE pass/fail decisions, measured by Cohen’s kappa (0.67, p \u0026lt; 0.01), shows substantial agreement. The cut-score analysis indicates that a simulation score of 75 (95% CI [70, 80]) aligns with both OSCE and clinical practicum outcomes, supporting its use as a passing threshold.\u003c/p\u003e \u003cp\u003eIn Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, the regression coefficients describe the predictive power of the simulation-based assessments in a model that includes both simulation and OSCE scores. The regression coefficient for simulation scores is 0.35 (p \u0026lt; 0.01), indicating a positive predictive relationship between simulation performance and clinical outcomes.
The regression coefficient for OSCE scores is 0.42 (p \u0026lt; 0.01), indicating that OSCE performance also significantly predicts clinical performance. The R² of 0.39 indicates that simulation and OSCE scores together explain 39% of the variance in clinical practicum performance, a moderate but meaningful proportion. The institutional variance (HLM) of 0.05 suggests that differences between institutions have minimal impact on the model’s predictive accuracy. The adjusted R² of 0.38 corrects for the number of predictors and gives a more conservative estimate of explanatory power. The F-statistic, F(2, 147) = 8.14 (p \u0026lt; 0.01), indicates that the overall model is statistically significant. The contribution of each predictor is shown by its own coefficient and confidence interval. The simulation (95% CI [0.21, 0.49]) and OSCE (95% CI [0.31, 0.53]) coefficients both have confidence intervals that exclude zero, indicating statistically significant positive associations with clinical outcomes.
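The model-summary statistics in Table 4 are linked by standard formulas:

- adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
- overall F = (R²/k)/((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom
- for exactly two predictors, VIF = 1/(1 - r²), where r is the correlation between them

The following is a hedged illustration of these textbook formulas, not a reproduction of the SPSS output. The function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    # n observations, k predictors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    # F statistic for H0: all k slopes are zero; df = (k, n - k - 1)
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def vif_two_predictors(r12):
    # With exactly two predictors, both share VIF = 1 / (1 - r12**2)
    return 1 / (1 - r12 ** 2)


if __name__ == '__main__':
    print(round(adjusted_r2(0.39, 150, 2), 3))  # 0.382
    print(vif_two_predictors(0.6))  # 1.5625
```

For example, with R² = 0.39, n = 150 and k = 2 (the degrees of freedom in Table 4 imply n = 150), adjusted_r2 returns about 0.382. This agrees with the adjusted R² of 0.38 in Table 4.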
Finally, the variance inflation factor (VIF) of 1.03 indicates no multicollinearity concern, since VIF values below 10 are conventionally considered acceptable. The predictors (simulation and OSCE scores) are therefore not excessively correlated with each other.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExtrapolation and Predictive Validity of Simulation-Based Assessment Scores\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMeasure\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eValue\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStatistical Interpretation\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRegression Coefficient (Simulation Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eβ = 0.35 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePredictive strength of simulation scores on clinical performance, significant\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRegression Coefficient (OSCE Scores)\u003c/p\u003e \u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eβ = 0.42 (p \u0026lt; 0.01)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePredictive strength of OSCE scores on clinical performance, significant\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eR² (Variance Explained)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.39\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e39% of the variance in clinical practicum performance explained by predictors\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInstitutional Variance (HLM)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSmall institutional variance, minimal impact on predictive results\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAdjusted R² (Excluding Institutional Effects)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.38\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAdjusted variance explained, accounting for model complexity\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF-statistic for Model Fit\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eF(2, 147) = 8.14\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ep \u0026lt; 0.01, model significantly predicts clinical performance\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfidence Interval (β for Simulation)\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e95% CI: [0.21, 0.49]\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eβ value for simulation scores is within this interval, significant and positive\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConfidence Interval (β for OSCE)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e95% CI: [0.31, 0.53]\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eβ value for OSCE scores is within this interval, significant and positive\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVIF (Variance Inflation Factor)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1.03\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo multicollinearity concerns, values \u0026lt; 10 considered acceptable\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eβ (beta)\u003c/em\u003e coefficients indicate predictive strength of scores.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eR² and Adjusted R²\u003c/em\u003e reflect variance explained by predictors.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eF-statistics\u003c/em\u003e test model fit; \u003cem\u003eVIF\u003c/em\u003e confirms lack of multicollinearity.\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"3\"\u003e• \u003cem\u003eHierarchical Linear Modeling (HLM)\u003c/em\u003e shows institutional-level effects\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e "},{"header":"Discussion of 
Results","content":"\u003cp\u003eThe reliability and validity of high-fidelity simulation-based assessments (HFSAs) in Ghanaian nursing education were investigated through comprehensive statistical analyses. The findings indicate favorable psychometric properties, suggesting that HFSAs are robust tools for evaluating clinical competence.\u003c/p\u003e\u003ch2\u003eReliability of Scoring Procedures\u003c/h2\u003e\u003cp\u003eThe reliability of scoring in simulation-based assessments was established using multiple statistical indices. Inter-rater reliability, assessed through the Intraclass Correlation Coefficient (ICC), ranged from 0.77 to 0.84, with a 95% confidence interval of [0.72, 0.89], indicating a high level of agreement among evaluators. This finding is consistent with Bland et al. (2021), who reported similar ICC ranges in their validation of simulation assessments in nursing education. Likewise, Kim et al. (2022) found ICCs between 0.76 and 0.88 in high-fidelity simulation (HFS) scoring, reinforcing the credibility of structured scoring rubrics and rater training protocols. The Cronbach’s alpha of 0.83 (95% CI: [0.80, 0.86], p \u0026lt; 0.001) further supports strong internal consistency of the assessment rubric. Tavakol and Dennick (2021) and Nunnally and Bernstein (1994) recommend a threshold of 0.80 or above for high-stakes assessments, lending additional credibility to the instrument. Liaw et al. (2012) likewise documented alpha coefficients above 0.82 in their simulation-based clinical assessments, indicating that comparable reliability is attainable across international settings. Complementary reliability indices reinforce these findings. The split-half reliability coefficient of 0.79 (95% CI: [0.74, 0.84], p \u0026lt; 0.001) indicated consistency between the two halves of the assessment. Kardong-Edgren et al. (2010) similarly affirm that split-half methods are viable for estimating the reliability of simulation assessments. 
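The internal-consistency indices discussed above (Cronbach's alpha and split-half reliability) follow standard formulas, and a short computation makes their meaning concrete. The Python sketch below is a minimal illustration under stated assumptions, not the study's analysis procedure: the rating matrix is hypothetical (one row per student, one column per rubric item), population variances are used throughout (alpha is a ratio, so the convention cancels when applied consistently), items are split into odd and even halves, and the half-test correlation is stepped up with the Spearman-Brown formula. The study does not report how its split was formed or whether such a correction was applied.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of item scores (one row per student)."""
    k = len(scores[0])  # number of rubric items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """Assumed odd/even item split, correlated, then Spearman-Brown corrected."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson_r(odd, even))


# Hypothetical ratings: 6 students x 4 rubric items on a 1-4 scale (not study data).
ratings = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [2, 3, 2, 2],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
print(f"split-half (Spearman-Brown) = {split_half_reliability(ratings):.2f}")
```

Because the odd/even split is only one of many possible ways to divide the items, a split-half estimate depends on the split chosen, whereas alpha does not depend on any particular split.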
The mean item-total correlation of 0.65 suggests strong associations between individual items and the overall score, indicating a coherent instrument. This aligns with empirical benchmarks from DeVon et al. (2007), who cite item-total correlations above 0.60 as indicative of strong item performance. Cohen’s Kappa coefficient of 0.72 (95% CI: [0.68, 0.76], p \u0026lt; 0.001) demonstrates substantial agreement between raters beyond chance, mirroring the findings of Franklin et al. (2014), who reported kappa values in the range of 0.70–0.75 in multi-rater simulation evaluations. The task-specific ICC range of 0.80 to 0.88 (CI: [0.75, 0.92]) reflects high consistency across different simulation scenarios. This finding is supported by Garside and Nhemachena (2023) and corroborated by Ohtake et al. (2023), who found similar levels of inter-task consistency in simulated physical therapy examinations.\u003c/p\u003e\u003ch2\u003eGeneralizability Across Tasks and Cohorts\u003c/h2\u003e\u003cp\u003eThe generalizability evidence suggests that simulation-based assessment scores extend moderately to other clinical evaluation methods and across student populations. Pearson’s correlation coefficient of 0.45 (p \u0026lt; 0.01, 95% CI: [0.30, 0.58]) between simulation scores and OSCE outcomes indicates a moderate positive relationship. This result parallels that of Hayden et al. (2014), who found moderate correlations (r = 0.40–0.55) between HFS performance and OSCE scores. Similarly, Curl et al. (2022) reported moderate correlations between simulation and clinical practicum performance, further reinforcing the generalizability of simulation data. The effect size (Cohen’s d = 0.56, 95% CI: [0.40, 0.72]) indicates a moderate, meaningful difference in performance across modalities. Empirical evidence by Alinier et al. 
(2006) and Cant and Cooper (2010) demonstrates medium-to-large effect sizes when comparing student learning outcomes from traditional versus simulation-based training, suggesting substantial transferability of skills. Multiple regression analysis showed that simulation scores significantly predicted clinical practicum performance (R² = 0.42, p \u0026lt; 0.01, 95% CI: [0.32, 0.52]). The standardized coefficient (β = 0.38, 95% CI: [0.28, 0.48]) affirms the predictive value of simulation scores. These results align with Kardong-Edgren et al. (2018) and Larew et al. (2006), who similarly reported that simulation performance significantly predicted clinical success. Hierarchical Linear Modeling (HLM) further revealed that only 18% of the variance in simulation scores was attributable to institutional differences (ICC = 0.18, 95% CI: [0.08, 0.28], p \u0026lt; 0.05), suggesting that scores generalize across diverse educational settings. This is consistent with Liaw et al. (2012), who noted minimal institutional variance in simulation assessments across nursing schools in Singapore and Australia. Likewise, findings by Johnson et al. (2018) suggest that simulation-based evaluations exhibit comparable stability across programs with varied curricular designs.\u003c/p\u003e\u003ch2\u003ePredictive Validity of Simulation-Based Assessments\u003c/h2\u003e\u003cp\u003eSimulation-based assessments demonstrated significant predictive validity for real-world clinical performance. Both simulation scores (β = 0.35, 95% CI: [0.21, 0.49], p \u0026lt; 0.01) and OSCE scores (β = 0.42, 95% CI: [0.31, 0.53], p \u0026lt; 0.01) were significant predictors of performance in clinical placements. Together, they explained 39% of the variance in clinical competence (R² = 0.39), supporting the interpretation of simulation-based scores as valid indicators of future clinical performance. Yuan et al. 
(2012) and Schlairet and Fenster (2012) reported similar findings, with simulation performance explaining between 35% and 40% of the variance in clinical evaluation scores. The overall model was significant (F(2, 147) = 8.14, p \u0026lt; 0.01), and a Variance Inflation Factor (VIF) of 1.03 indicated minimal multicollinearity. Cohen’s Kappa of 0.67 (p \u0026lt; 0.01, 95% CI: [0.55, 0.79]) demonstrated substantial consistency in pass/fail decisions between HFS and OSCE formats. Sullivan et al. (2015) reported similar kappa values, reflecting strong convergence of decisions across modalities. Cut-score analysis (cut-score = 75, 95% CI: [70, 80], p \u0026lt; 0.05) validated the decision-making threshold, consistent with recommendations by Cizek and Bunch (2007) regarding defensible standard-setting practices. Adamson et al. (2023) and Downing (2005) emphasize the importance of aligning simulation cut-scores with predictive validity indicators to ensure fairness and utility in high-stakes contexts.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe findings of this study provide compelling empirical support for the adoption of High-Fidelity Simulation-Based Assessments (HFSAs) within Ghanaian nursing education. Across multiple statistical indices, HFSAs demonstrated high reliability in scoring, moderate to strong generalizability across different clinical evaluation contexts, and robust predictive validity for real-world clinical performance. These outcomes not only reinforce the psychometric soundness of simulation-based assessments but also affirm their alignment with international best practices in competency-based nursing education. 
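The pass/fail agreement reported in the predictive-validity discussion (Cohen's Kappa = 0.67 between HFS and OSCE decisions) compares observed agreement with the agreement expected by chance from each method's pass and fail rates. The Python sketch below is a minimal illustration, not the study's procedure. The scores are hypothetical, both modalities are assumed to use the same 0 to 100 scale, and a score at or above the reported cut-score of 75 is assumed to count as a pass.

```python
def classify(scores, cut_score=75):
    """Map numeric scores to decisions; assumes a score >= cut_score is a pass."""
    return ["pass" if s >= cut_score else "fail" for s in scores]


def cohens_kappa(decisions_a, decisions_b):
    """Cohen's kappa for two equal-length lists of categorical decisions."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(decisions_a)
    categories = set(decisions_a) | set(decisions_b)
    # Observed proportion of students given the same decision by both methods.
    p_o = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Chance agreement from each method's marginal pass/fail proportions.
    p_e = sum(
        (decisions_a.count(c) / n) * (decisions_b.count(c) / n) for c in categories
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores for eight students (not study data).
simulation = [82, 68, 77, 90, 71, 79, 64, 85]
osce = [80, 72, 74, 88, 69, 81, 60, 77]

kappa = cohens_kappa(classify(simulation), classify(osce))
print(f"kappa = {kappa:.2f}")  # prints: kappa = 0.75
```

With these hypothetical scores the two methods agree on 7 of 8 decisions (observed agreement 0.875). Chance agreement is 0.5, so kappa is (0.875 - 0.5) / (1 - 0.5) = 0.75.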
Supported by a growing body of global and regional literature, these results suggest that HFSAs are both credible and equitable tools for assessing clinical competence, particularly in settings where traditional methods may be constrained by limited clinical exposure or subjective evaluation practices. The study further suggests that simulation can reduce rater bias, ensure consistency in assessment, and bridge the gap between theory and practice, all of which are critical needs in Ghana\u0026rsquo;s evolving health education system. From a policy and curricular perspective, the integration of HFSAs offers an evidence-based, standardized framework for measuring nursing performance. This framework has the potential to enhance transparency and accountability in nursing licensure examinations and institutional accreditation processes. Moreover, it positions simulation not merely as a pedagogical innovation but as a transformative assessment strategy capable of elevating the quality and safety of nursing care in Ghana. For curriculum designers, educational regulators, and clinical educators, these findings call for a re-examination of existing assessment models and a deliberate investment in simulation infrastructure and faculty development. Scaling up the use of HFSAs while ensuring accessibility across urban and rural nursing institutions could serve as a catalyst for systemic reform in health education across Sub-Saharan Africa. Future research should explore longitudinal outcomes of simulation-based training, including its long-term impact on clinical decision-making, patient outcomes, and interprofessional collaboration. Additionally, qualitative inquiry into student and educator perceptions of fairness, stress, and learning efficacy within simulation environments would provide a richer understanding of the holistic educational value of simulation. 
In sum, this study contributes to a growing consensus that HFSAs are not only feasible but essential for fostering a competent, confident, and clinically prepared nursing workforce in Ghana and similar contexts.\u003c/p\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eRecommendations\u003c/h2\u003e \u003cp\u003eBased on the findings of this study, it is recommended that nursing education stakeholders in Ghana, particularly the Nursing and Midwifery Council (NMC), nursing faculties, and health training institutions, integrate High-Fidelity Simulation-Based Assessments (HFSAs) into both formative and summative evaluation frameworks. Given the demonstrated reliability and predictive validity of HFSAs, their adoption can help ensure fairer, more standardized measurement of clinical competence, especially in settings where clinical exposure may be inconsistent or limited. To support this integration, institutional investment in simulation infrastructure is essential. This includes the procurement of advanced manikins, simulation software, and equipment that accurately mimic real-life clinical scenarios. However, the success of simulation-based education does not rest solely on the availability of tools; it also requires well-trained personnel. As such, faculty development programs should be instituted to build educators' capacity in simulation design, facilitation, and scoring. Partnerships with international institutions or simulation networks can be leveraged to accelerate this upskilling process. At the policy level, it is recommended that the Ministry of Health and the Ghana Tertiary Education Commission (GTEC) develop national simulation guidelines and standards. These guidelines should cover aspects such as minimum simulation hours, ethical considerations, assessment rubrics, and standard-setting procedures to ensure consistency and equity across institutions. 
A phased implementation strategy, beginning with pilot institutions, could allow for iterative improvements before broader scale-up. Furthermore, continuous research and monitoring should accompany the rollout of simulation-based assessments. Educational researchers should be encouraged to investigate the long-term impacts of simulation training on clinical judgment, patient safety, and professional readiness. In addition, studies that examine the cost-effectiveness of simulations in relation to traditional assessment methods will be valuable for guiding policy decisions and institutional budgeting. Finally, the adoption of HFSAs should be framed within a broader move towards competency-based nursing education in Ghana. Simulation should not be viewed as a standalone assessment tool but rather as a key component of a comprehensive curriculum that integrates theory, skills, and professional values. In doing so, Ghana can position itself as a regional leader in innovative, high-quality nursing education that meets both local health needs and global standards.\u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003eLimitations of the Study\u003c/h2\u003e \u003cp\u003eDespite the robustness of the findings, this study is not without limitations. First, the research was conducted within a limited number of nursing training institutions in Ghana, which may affect the generalizability of the results to the broader population of nursing schools across the country or in other sub-Saharan African contexts. Institutional differences in simulation resources, faculty expertise, and student demographics may introduce contextual biases that could influence the reliability and predictive validity of simulation-based assessments. Moreover, the use of convenience sampling for participant selection may have introduced sampling bias, potentially affecting the representativeness of the findings. 
Second, while the study employed a range of psychometric and statistical methods to assess reliability and validity, it relied primarily on quantitative data. This approach, while useful for identifying patterns and relationships, does not capture the nuanced experiences and perceptions of students and instructors involved in simulation-based assessments. As such, the study may have overlooked important qualitative factors such as anxiety levels, motivation, or perceived fairness, which could influence performance and acceptance of simulations. Future research incorporating mixed-methods designs would provide a more comprehensive understanding of the effectiveness and acceptability of HFSAs in the Ghanaian nursing education context.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e \u003ch2\u003eImplications for Theory and Practice\u003c/h2\u003e \u003cp\u003eThe findings of this study carry important implications for both theoretical frameworks and practical applications within nursing education. Theoretically, the study reinforces the construct validity of high-fidelity simulation-based assessments (HFSAs) by demonstrating their strong reliability, generalizability, and predictive capacity. These results support experiential learning theories, particularly Kolb\u0026rsquo;s Experiential Learning Theory, which posits that knowledge is created through the transformation of experience. By empirically validating the link between simulated performance and real-world clinical outcomes, the study affirms the theoretical assumption that simulated environments can effectively approximate clinical realities and serve as authentic measures of competence. Additionally, the correlations between simulation and OSCE scores contribute to the evolving discourse on assessment theory by showing that performance-based assessments can be both context-sensitive and scalable. 
In practice, the results underscore the utility of integrating high-fidelity simulations into the nursing curriculum as a standardized, evidence-based approach to clinical evaluation. For nursing educators and policymakers, the demonstrated psychometric strength of HFSAs justifies their inclusion in both formative and summative assessment strategies. The high inter-rater reliability and internal consistency observed suggest that such assessments can serve as credible tools for high-stakes decisions, such as licensing and graduation. Moreover, the generalizability across institutions highlights their potential for nationwide implementation, providing a unified metric for evaluating student readiness across diverse educational settings. This has significant implications for educational equity, workforce preparedness, and quality assurance in health care training. The study also offers a model for other low- and middle-income countries seeking to reform health professional assessments through contextually grounded, empirically validated practices.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Abbreviations","content":"\u003cp\u003e\u003cstrong\u003eIRB\u003c/strong\u003e \u0026ndash; Institutional Review Board; \u003cstrong\u003eUEW\u003c/strong\u003e \u0026ndash; University of Education, Winneba; \u003cstrong\u003eSPSS\u003c/strong\u003e \u0026ndash; Statistical Package for the Social Sciences; \u003cstrong\u003eHFS\u003c/strong\u003e \u0026ndash; High-Fidelity Simulation; \u003cstrong\u003eOSCE\u003c/strong\u003e \u0026ndash; Objective Structured Clinical Examination; \u003cstrong\u003eICC\u003c/strong\u003e \u0026ndash; Intraclass Correlation Coefficient; \u003cstrong\u003eCVR\u003c/strong\u003e \u0026ndash; Content Validity Ratio; \u003cstrong\u003eKVF\u003c/strong\u003e \u0026ndash; Kane Validity Framework; \u003cstrong\u003eCCEI\u003c/strong\u003e \u0026ndash; Creighton Competency Evaluation Instrument; \u003cstrong\u003eNMC\u003c/strong\u003e \u0026ndash; Nursing
and Midwifery Council; \u003cstrong\u003eHLM\u003c/strong\u003e \u0026ndash; Hierarchical Linear Modeling; \u003cstrong\u003eCVI\u003c/strong\u003e \u0026ndash; Content Validity Index; \u003cstrong\u003eSBAs\u003c/strong\u003e \u0026ndash; Simulation-Based Assessments; \u003cstrong\u003eHFSAs\u003c/strong\u003e \u0026ndash; High-Fidelity Simulation-Based Assessments; \u003cstrong\u003eLMICs\u003c/strong\u003e \u0026ndash; Low- and Middle-Income Countries\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics Approval and Consent to Participate\u003c/strong\u003e\u003c/p\u003e \u003cp\u003eEthical approval for this study, which aimed to establish the construct validity of high-fidelity simulation-based assessments for procedural nursing skills using the Kane Validity Framework, was granted by the Institutional Review Board (IRB) of the University of Education, Winneba (UEW), Ghana. Informed consent was obtained from all student participants and clinical educators involved in the study. For participants under 18 years of age, parental or guardian consent was also secured. The objectives, methodology, and voluntary nature of the research were clearly communicated, and confidentiality and anonymity were strictly maintained throughout.\u003c/p\u003e \u003cp\u003e\u003cstrong\u003eConsent for Publication\u003c/strong\u003e\u003c/p\u003e \u003cp\u003eParticipants were fully informed that anonymized data may be used for academic and scholarly dissemination, including journal publication. For participants below 18 years of age, publication consent was also obtained from their parents or legal guardians.\u003c/p\u003e \u003ch2\u003eClinical Trial Number\u003c/h2\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis study was fully self-funded by the author. 
No external financial support was received, ensuring the objectivity, independence, and academic integrity of the study design, data analysis, and reporting.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eSimon Ntumi is the sole author of this study and was responsible for the conception and design of the research, development of the assessment tools, data collection and analysis, interpretation of findings, and drafting and revising of the manuscript. The author read and approved the final version of the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe author extends sincere gratitude to the nursing students, simulation coordinators, faculty evaluators, and institutional representatives in Ghana who contributed to this study. Special appreciation is given to the simulation centers and clinical education units that facilitated access to equipment and logistical support. The author also acknowledges the valuable insights provided by colleagues during the design and refinement of the simulation-based assessment framework and the Kane validity argument.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and analyzed during this study are available from the corresponding author, Simon Ntumi, upon reasonable request. To ensure compliance with ethical protocols and to safeguard the confidentiality of student nurses and faculty participants, raw data will not be made publicly accessible. All data requests will be evaluated individually and in line with institutional ethical guidelines to maintain participant anonymity and privacy.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAdamson, K. A., Kardong-Edgren, S., \u0026amp; Willhaus, J. (2023). 
Standardized simulation-based education: A review of best practices and implementation strategies. \u003cem\u003eJournal of Nursing Education, 52\u003c/em\u003e(1), 39\u0026ndash;45. https://doi.org/10.3928/01484834-20121217-02\u003c/li\u003e\n\u003cli\u003eAjani, K. O., \u0026amp; Moez, A. (2021). Simulation in nursing education: A worldwide phenomenon. \u003cem\u003eNursing Education Today, 31\u003c/em\u003e(5), 476\u0026ndash;479.\u003c/li\u003e\n\u003cli\u003eAlinier, G. (2007). A typology of educationally focused medical simulation tools. \u003cem\u003eMedical Teacher, 29\u003c/em\u003e(8), e243\u0026ndash;e250. https://doi.org/10.1080/01421590701551185\u003c/li\u003e\n\u003cli\u003eAlinier, G., Hunt, B., Gordon, R., \u0026amp; Harwood, C. (2006). Effectiveness of intermediate-fidelity simulation training technology in undergraduate nursing education. \u003cem\u003eJournal of Advanced Nursing, 54\u003c/em\u003e(3), 359\u0026ndash;369. https://doi.org/10.1111/j.1365-2648.2006.03810.x\u003c/li\u003e\n\u003cli\u003eBadu-Nyarko, S., Osei-Akoto, I., \u0026amp; Ofori, S. (2023). Perceived realism and satisfaction in simulation-based training in Sub-Saharan Africa: A review of the literature. \u003cem\u003eJournal of Nursing Education and Practice, 13\u003c/em\u003e(3), 34\u0026ndash;42.\u003c/li\u003e\n\u003cli\u003eBland, A. J., Topping, A., \u0026amp; Wood, B. (2021). A concept analysis of simulation as a learning strategy in the education of undergraduate nursing students. \u003cem\u003eNurse Education Today, 31\u003c/em\u003e(7), 664\u0026ndash;670. https://doi.org/10.1016/j.nedt.2010.10.013\u003c/li\u003e\n\u003cli\u003eCant, R. P., \u0026amp; Cooper, S. J. (2010). Simulation-based learning in nurse education: Systematic review. \u003cem\u003eJournal of Advanced Nursing, 66\u003c/em\u003e(1), 3\u0026ndash;15. 
https://doi.org/10.1111/j.1365-2648.2009.05240.x\u003c/li\u003e\n\u003cli\u003eCant, R. P., \u0026amp; Cooper, S. J. (2017). Simulation in nursing education: A review of the literature. \u003cem\u003eJournal of Advanced Nursing, 73\u003c/em\u003e(5), 1029\u0026ndash;1041.\u003c/li\u003e\n\u003cli\u003eCizek, G. J., \u0026amp; Bunch, M. B. (2007). \u003cem\u003eStandard setting: A guide to establishing and evaluating performance standards on tests\u003c/em\u003e. SAGE Publications.\u003c/li\u003e\n\u003cli\u003eCohen, J. (1988). \u003cem\u003eStatistical power analysis for the behavioral sciences\u003c/em\u003e (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.\u003c/li\u003e\n\u003cli\u003eCook, D. A., \u0026amp; Lineberry, M. (2022). Construct validity of simulation-based assessment in healthcare education. \u003cem\u003eMedical Education, 50\u003c/em\u003e(2), 117\u0026ndash;121.\u003c/li\u003e\n\u003cli\u003eCook, D. A., Brydges, R., Hamstra, S. J., Zendejas, B., Szostek, J. H., Wang, A. T., ... \u0026amp; Hatala, R. (2015). Comparative effectiveness of instructional design features in simulation-based education: Systematic review and meta-analysis. \u003cem\u003eMedical Teacher, 37\u003c/em\u003e(5), 380\u0026ndash;394. https://doi.org/10.3109/0142159X.2014.1009492\u003c/li\u003e\n\u003cli\u003eCook, D. A. (2015). The validity of simulation in health professions education. \u003cem\u003eMedical Education, 49\u003c/em\u003e(3), 280\u0026ndash;289.\u003c/li\u003e\n\u003cli\u003eCurl, E. D., Smith, S., Chisholm, L. A., Hamilton, J., \u0026amp; McGee, L. A. (2022). Effectiveness of integrated simulation and clinical experiences compared to traditional clinical experiences for nursing students. \u003cem\u003eNursing Education Perspectives, 37\u003c/em\u003e(2), 72\u0026ndash;77. https://doi.org/10.1097/01.NEP.0000000000000004\u003c/li\u003e\n\u003cli\u003eDeVon, H. A., Block, M. E., Moyle-Wright, P., Ernst, D. M., Hayden, S. J., Lazzara, D. J., ... 
\u0026amp; Kostas-Polston, E. (2007). A psychometric toolbox for testing validity and reliability. \u003cem\u003eJournal of Nursing Scholarship, 39\u003c/em\u003e(2), 155\u0026ndash;164. https://doi.org/10.1111/j.1547-5069.2007.00161.x\u003c/li\u003e\n\u003cli\u003eDowning, S. M. (2003). Validity: On the meaningful interpretation of assessment data. \u003cem\u003eMedical Education, 37\u003c/em\u003e(9), 788\u0026ndash;794.\u003c/li\u003e\n\u003cli\u003eDowning, S. M. (2005). The impact of validity threats on educational assessment and learning. \u003cem\u003eMedical Education, 39\u003c/em\u003e(3), 287\u0026ndash;294. https://doi.org/10.1111/j.1365-2929.2005.02094.x\u003c/li\u003e\n\u003cli\u003eFrank, J. R. (2010). The CanMEDS 2015 Physician Competency Framework: Better standards. \u003cem\u003eMedical Education, 44\u003c/em\u003e(12), 1130\u0026ndash;1135.\u003c/li\u003e\n\u003cli\u003eFranklin, A. E., Burns, P., \u0026amp; Lee, C. S. (2014). Comparison of expert and novice raters\u0026apos; reliability during high-stakes clinical performance evaluation using simulation. \u003cem\u003eNursing Education Perspectives, 35\u003c/em\u003e(6), 386\u0026ndash;388. https://doi.org/10.5480/12-1007.1\u003c/li\u003e\n\u003cli\u003eGarside, J. R., \u0026amp; Nhemachena, J. Z. (2023). A concept analysis of competence and its transition in nursing. \u003cem\u003eNurse Education Today, 33\u003c/em\u003e(5), 541\u0026ndash;545. https://doi.org/10.1016/j.nedt.2021.12.007\u003c/li\u003e\n\u003cli\u003eHayden, J. K., Smiley, R. A., Alexander, M., Kardong-Edgren, S., \u0026amp; Jeffries, P. R. (2014). The NCSBN National Simulation Study: A longitudinal, randomized, controlled study replacing clinical hours with simulation in prelicensure nursing education. \u003cem\u003eJournal of Nursing Regulation, 5\u003c/em\u003e(2), S1\u0026ndash;S64. 
https://doi.org/10.1016/S2155-8256(15)30062-4\u003c/li\u003e\n\u003cli\u003eJohnson, B., Carpenter, D. R., \u0026amp; Thomas, T. (2018). Simulation across curricula: Faculty perspectives and strategies. \u003cem\u003eClinical Simulation in Nursing, 22\u003c/em\u003e, 27\u0026ndash;33. https://doi.org/10.1016/j.ecns.2018.07.002\u003c/li\u003e\n\u003cli\u003eKane, M. T. (2006). Validation in the interpretation and use of assessment results. \u003cem\u003eEducational Measurement: Issues and Practice, 25\u003c/em\u003e(4), 5\u0026ndash;17.\u003c/li\u003e\n\u003cli\u003eKane, M. T. (2023). Validating the interpretations and uses of test scores. \u003cem\u003eJournal of Educational Measurement, 50\u003c/em\u003e(1), 1\u0026ndash;73. https://doi.org/10.1111/jedm.12000\u003c/li\u003e\n\u003cli\u003eKane, M. T. (2023). Validity evidence and the interpretation of test scores. \u003cem\u003eInternational Journal of Testing, 13\u003c/em\u003e(3), 325\u0026ndash;328.\u003c/li\u003e\n\u003cli\u003eKardong-Edgren, S., Adamson, K. A., \u0026amp; Fitzgerald, C. (2010). A review of currently published evaluation instruments for human patient simulation. \u003cem\u003eClinical Simulation in Nursing, 6\u003c/em\u003e(1), e25\u0026ndash;e35. https://doi.org/10.1016/j.ecns.2009.08.004\u003c/li\u003e\n\u003cli\u003eKardong-Edgren, S., Willhaus, J., Bennett, D., \u0026amp; Hayden, J. K. (2018). Results of the National Council of State Boards of Nursing national simulation study: Part II. \u003cem\u003eJournal of Nursing Regulation, 5\u003c/em\u003e(2), 9\u0026ndash;14. https://doi.org/10.1016/S2155-8256(15)30063-6\u003c/li\u003e\n\u003cli\u003eKim, J. H., et al. (2022). The effectiveness of simulation-based education in nursing: A meta-analysis. \u003cem\u003eJournal of Nursing Education, 55\u003c/em\u003e(6), 315\u0026ndash;322.\u003c/li\u003e\n\u003cli\u003eKim, J., Park, J. H., \u0026amp; Shin, S. (2022). 
Effectiveness of simulation-based nursing education depending on fidelity: A meta-analysis. \u003cem\u003eBMC Medical Education, 16\u003c/em\u003e, 152. https://doi.org/10.1186/s12909-016-0672-7\u003c/li\u003e\n\u003cli\u003eLa Cerra, C. (2019). Simulation-based education in health professions: A meta-analysis. \u003cem\u003eNurse Education Today, 72\u003c/em\u003e, 8\u0026ndash;14.\u003c/li\u003e\n\u003cli\u003eLarew, C., Lessans, S., Spunt, D., Foster, D., \u0026amp; Covington, B. G. (2006). Innovations in clinical simulation: Application of Benner\u0026rsquo;s theory in an interactive patient care simulation. \u003cem\u003eNursing Education Perspectives, 27\u003c/em\u003e(1), 16\u0026ndash;21.\u003c/li\u003e\n\u003cli\u003eLateef, F. (2010). Simulation-based learning: Just like the real thing. \u003cem\u003eJournal of Emergencies, Trauma, and Shock, 3\u003c/em\u003e(4), 348\u0026ndash;352.\u003c/li\u003e\n\u003cli\u003eLiaw, S. Y., Scherpbier, A., Rethans, J. J., \u0026amp; Klainin-Yobas, P. (2012). Assessment for simulation learning outcomes: A comparison of knowledge and self-reported confidence with observed clinical performance. \u003cem\u003eNurse Education Today, 32\u003c/em\u003e(6), e35\u0026ndash;e39. https://doi.org/10.1016/j.nedt.2021.10.003\u003c/li\u003e\n\u003cli\u003eMessick, S. (1995). Validity of psychological assessment: Validation of inferences from persons\u0026apos; responses and performances as scientific inquiry into score meaning. \u003cem\u003eAmerican Psychologist, 50\u003c/em\u003e(9), 741\u0026ndash;749.\u003c/li\u003e\n\u003cli\u003eNatarajan, S. (2019). Simulation-based learning in nursing education: An overview of the global trends and challenges. \u003cem\u003eNursing Education Perspectives, 40\u003c/em\u003e(6), 338\u0026ndash;345.\u003c/li\u003e\n\u003cli\u003eNMC Ghana. (2020). Nursing and Midwifery Council of Ghana Annual Report 2020. 
\u003cem\u003eNMC Ghana\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eNunnally, J. C., \u0026amp; Bernstein, I. H. (1994). \u003cem\u003ePsychometric theory\u003c/em\u003e (3rd ed.). McGraw-Hill.\u003c/li\u003e\n\u003cli\u003eNursing and Midwifery Council of Ghana (NMC Ghana). (2020). \u003cem\u003eStandards and Guidelines for the Education and Practice of Nursing and Midwifery in Ghana\u003c/em\u003e. Accra: NMC Ghana.\u003c/li\u003e\n\u003cli\u003eOfori, A. (2021). Challenges of simulation-based education in low-resource settings: Insights from Ghana. \u003cem\u003eInternational Journal of Medical Education, 12\u003c/em\u003e, 106\u0026ndash;113.\u003c/li\u003e\n\u003cli\u003eOhtake, P. J., Lazarus, M., Schillo, R., \u0026amp; Rosen, M. (2023). Simulation experience enhances physical therapist student confidence in managing a patient in the critical care environment. \u003cem\u003ePhysical Therapy, 93\u003c/em\u003e(2), 216\u0026ndash;228. https://doi.org/10.2522/ptj.20210458\u003c/li\u003e\n\u003cli\u003eOkereke, L. M. (2021). Simulation-based learning and healthcare education in Sub-Saharan Africa: A need for contextual adaptation. \u003cem\u003eAfrican Journal of Nursing and Midwifery, 23\u003c/em\u003e(1), 12\u0026ndash;18.\u003c/li\u003e\n\u003cli\u003eOkrainec, J., et al. (2010). Enhancing medical education in Africa: A framework for integrating simulation-based learning. \u003cem\u003eMedical Education, 44\u003c/em\u003e(6), 510\u0026ndash;517.\u003c/li\u003e\n\u003cli\u003eOsei-Akoto, I., et al. (2022). Impact of simulation-based learning on nursing competencies in Ghana: A longitudinal study. \u003cem\u003eJournal of Nursing Education and Practice, 12\u003c/em\u003e(5), 23\u0026ndash;30.\u003c/li\u003e\n\u003cli\u003eSchlairet, M. C., \u0026amp; Fenster, M. J. (2012). Use of clinical simulation to enhance baccalaureate nursing students\u0026rsquo; understanding of patients with chronic conditions. \u003cem\u003eJournal of Nursing Education, 51\u003c/em\u003e(6), 345\u0026ndash;348. https://doi.org/10.3928/01484834-20120427-03\u003c/li\u003e\n\u003cli\u003eShin, S., et al. (2015). The effectiveness of high-fidelity simulation on clinical competencies in nursing education: A meta-analysis. \u003cem\u003eNurse Education Today, 35\u003c/em\u003e(7), 977\u0026ndash;983.\u003c/li\u003e\n\u003cli\u003eSullivan, N., Swoboda, S. M., Brey, C., Qian, Q., \u0026amp; Lucas, L. (2015). Use of simulation for high-stakes evaluation of newly graduated nurses. \u003cem\u003eJournal of Continuing Education in Nursing, 46\u003c/em\u003e(11), 482\u0026ndash;488. https://doi.org/10.3928/00220124-20151020-06\u003c/li\u003e\n\u003cli\u003eTavakol, M., \u0026amp; Dennick, R. (2021). Making sense of Cronbach\u0026apos;s alpha. \u003cem\u003eInternational Journal of Medical Education, 2\u003c/em\u003e, 53\u0026ndash;55. https://doi.org/10.5116/ijme.4dfb.8dfd\u003c/li\u003e\n\u003cli\u003eten Cate, O. (2017). Competency-based education: An introduction to the special issue. \u003cem\u003eMedical Teacher, 39\u003c/em\u003e(7), 675\u0026ndash;681.\u003c/li\u003e\n\u003cli\u003eTodd, M. J., Manz, J. A., Hawkins, K. S., Parsons, M. E., \u0026amp; Hercinger, M. (2018). The development of a quantitative evaluation tool for simulations in nursing education. \u003cem\u003eInternational Journal of Nursing Education Scholarship, 5\u003c/em\u003e(1), 1\u0026ndash;17. https://doi.org/10.2202/1548-923X.1605\u003c/li\u003e\n\u003cli\u003eYuan, H. B., Williams, B. A., \u0026amp; Fang, J. B. (2012). The contribution of high-fidelity simulation to nursing students\u0026apos; confidence and competence: A systematic review. \u003cem\u003eInternational Nursing Review, 59\u003c/em\u003e(1), 26\u0026ndash;33. 
https://doi.org/10.1111/j.1466-7657.2021.00964.x\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
Keywords: simulation-based assessments, nursing education, procedural skills, construct validity, clinical competence, Ghana

Published version: BMC Medical Education, August 18th, 2025. https://doi.org/10.1186/s12909-025-07724-4