Multimodal Physiological Assessment for Clinical Competency Classification in Simulation-Based Medical Education: A Machine Learning Approach | Research Square
Research Article
Solomon Prince Teye-Lartey, Jacob Schmieder, Umesh Yadav, Shaza Aouthmany, and 8 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8842924/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 9
Abstract
Background
Medical errors remain a leading cause of preventable harm, yet current competency assessments often rely on subjective evaluations that overlook critical performance indicators, particularly learners' responses to clinical stress.
Although physiological stress markers have been linked to performance outcomes, no widely adopted or scalable framework has integrated these biomarkers with performance data to identify learners requiring additional training before real-world practice.
Methods
This prospective observational study developed machine learning models to classify clinical competency using multimodal data from healthcare learners. Data were collected from 152 learners (74 Emergency Medicine residents, 70 Anesthesiology residents, 8 Emergency Medical Services students) across 470 high-fidelity simulation scenarios. A multimodal assessment platform synchronized physiological signals (electrodermal activity, heart rate, skin temperature) from Empatica E4 wristbands with expert evaluations. A genetic algorithm was employed for feature selection, and neural network models were evaluated using multiple leave-N-out strategies to assess generalizability across learners and scenarios.
Results
The neural network achieved 84–85% balanced accuracy across thresholds 0.45–0.70, with sensitivity of 93.3–95.4% and specificity of 72.9–76.2%. Despite class imbalance (80.6% competent, 19.4% novice), performance remained robust, with Matthews correlation coefficients of 0.687–0.706 and precision–recall area-under-the-curve (PR-AUC) values of 0.969–0.970 across thresholds.
Conclusions
This study demonstrates that integrating physiological metrics with machine learning supports objective, data-driven competency assessment. By capturing stress-performance relationships that traditional evaluations often overlook, this framework may provide an early warning system to identify learners who may require additional training and lay the foundation for more precise, data-informed medical education.
Keywords: simulation-based medical education, competency assessment, physiological monitoring, machine learning, wearable sensors, medical error prevention
INTRODUCTION
Assessing clinical competence among healthcare professionals is one of the most critical challenges in medical education, with direct implications for patient safety. Medical errors are recognized as the third leading cause of death in the United States, resulting in over 250,000 deaths annually (Makary and Daniel 2016). Many of these preventable adverse events can be traced to inadequate training or insufficient assessment of healthcare professionals before they enter independent practice (Makary and Daniel 2016; Shoja et al. 2025). Patient care has become increasingly complex, demanding higher standards of practice, yet traditional evaluation methods have revealed significant limitations. Current assessment paradigms rely predominantly on checklists and binary pass/fail determinations (Berendonk, Stalmeijer, and Schuwirth 2013), which fail to capture the nuanced competencies required for safe clinical practice (McKinley et al. 2008). This assessment gap creates a critical disconnect between formal certification and actual operational readiness.
The Accreditation Council for Graduate Medical Education (ACGME) has established six interdependent competencies that residents must achieve (Batalden et al. 2002), each mapping to distinct domains within Bloom's taxonomy (Yanofsky and Nyquist 2010). Meeting these multifaceted standards requires assessment tools capable of evaluating complex, integrated performance under realistic conditions; yet existing methodologies remain fundamentally inadequate for identifying learners who may struggle under clinical stress. To address these challenges, medical educators have explored various innovative approaches to training and evaluation.
Simulation-based medical education (SBME) has emerged as a promising approach, offering controlled environments in which trainees can develop skills without risk to patients (Elendu et al. 2024; Komasawa and Yokohira n.d.). Consider a typical emergency simulation: a resident managing a deteriorating patient must simultaneously process vital signs, communicate with team members, and make rapid decisions while their stress levels fluctuate dramatically. Current assessment approaches, including global rating scales, Objective Structured Clinical Examinations (OSCEs), and Objective Structured Assessment of Technical Skills (OSATS) (Elabd et al. n.d.; Zoller et al. 2021), have improved standardization but continue to rely on human judgment and lack objectivity.
Recent advances in wearable sensor technology have opened new possibilities for capturing objective correlates of clinical performance (Virgillito, Catalfo, and Ledda 2025). Physiological indicators of stress and cognitive workload (Boffet et al. 2025; Howie et al. 2024; Weenk et al. 2018) have been shown to predict performance outcomes; however, this relationship is complex and far from uniform. In simulation-based medical education and clinical performance research, physiological arousal has been linked to both performance enhancement and impairment, depending on contextual factors such as task complexity, learner experience, and time pressure. Some studies report that elevated stress responses, as measured by heart rate variability, electrodermal activity, or cortisol, are associated with poorer technical or decision-making performance (Gellisch et al. 2024; Peek, Moore, and Arnold 2023; Vage et al. 2024), whereas others suggest that moderate activation or task-specific stress may facilitate engagement and situational awareness (Joseph et al. 2022; Kim et al. 2018; Nakayama et al. 2018; Solhjoo et al. 2019).
Still others find no consistent association, indicating that stress responses and performance are not linearly coupled but are dynamically modulated by individual and situational variables (Brooks, Crone, and Spangler 2021; McDaniel et al. 2025; Pappada et al. 2022). This growing body of divergent evidence underscores the need for data-driven, multimodal approaches that empirically map how different physiological states correspond to clinical competency during simulated patient care.
Beyond performance outcomes, clinicians' emotional states can directly influence procedural precision and decision quality. Prior research has demonstrated that heart rate variability (HRV) is a reliable index of both stress and cognitive load, reflecting autonomic regulation during complex task performance (Boffet et al. 2025; Joseph et al. 2022; Kim et al. 2018; Nakayama et al. 2018; Solhjoo et al. 2019). Moreover, neuroimaging evidence suggests that the ventromedial prefrontal cortex, a region central to risk appraisal and emotional regulation, serves as a neural substrate linking HRV to adaptive decision-making under pressure (Thayer et al. 2012). Collectively, these findings support the inclusion of HRV-derived features in the proposed modeling framework as physiologically meaningful indicators of learners’ stress responses and regulatory capacity.
By leveraging continuous multimodal physiological data and machine learning, the present study moves beyond previous theoretical assumptions to empirically test how patterns of physiological activation correspond to observable clinical competence across multiple simulation scenarios and diverse learner populations. This work advances prior efforts by integrating synchronized physiological responses with expert performance evaluations using a neural network–based analytic framework.
The resulting model identifies subtle, nonlinear relationships between stress responses and competency levels, offering an objective approach to classifying learner performance. By analyzing data from emergency medicine residents, anesthesiology residents, and EMS students, we demonstrate that multimodal physiological features can distinguish competent practitioners from those requiring additional support. Although this study was conducted in simulated environments, the framework establishes a foundation for scalable, data-driven competency assessment that can ultimately enhance training precision and improve patient safety.
METHODS
Study Design and Data Collection
This prospective observational study was conducted at an academic medical simulation center between 2020 and 2025 to develop and validate a machine-learning-based framework supporting objective competency classification. The study received Institutional Review Board approval, and written informed consent was obtained from all participants. Subjects were informed that performance data would be used exclusively for research purposes and would not impact their academic standing. Figure 1 illustrates the sequential methodology encompassing data collection, processing, and model development and validation.
The study population comprised 152 healthcare learners: 74 Emergency Medicine residents (PGY-1–3), 70 Anesthesiology residents (PGY-1–4), and 8 Emergency Medical Services (EMS) students. Participants were 69.5% male and 30.5% female. Most were aged 26–30 years (64.6%), followed by 22.0% aged 31 and older, 9.8% aged 21–25, and 3.7% aged 18–20. The cohort was predominantly White/Caucasian (76.8%), with 13.4% Asian, 6.1% Black/African American, and 3.7% reporting other ethnicities.
All participants completed high-fidelity simulation scenarios (15–20 minutes each) conducted in a standardized, temperature-controlled simulation environment (20–22°C, with consistent lighting).
Data collection leveraged a custom-developed platform, PREPARE (PREdiction of Healthcare Provider Skill Acquisition and Future Training REquirements) (Pappada et al. 2022), which synchronized multimodal data streams. The platform utilizes hierarchical measurement mapping to assess performance across cognitive, psychomotor, and behavioral domains. These broad domains are further aligned with specific competencies, including clinical decision-making, medical knowledge, task efficiency, communication, and judgment (Pappada et al. 2022). PREPARE's assessment engine centers on defining "learning events", which are preprogrammed, scenario-specific critical moments representing essential competency demonstrations. Expert instructors evaluated each event in real time, assigning both categorical classifications (novice/competent/expert) and continuous performance scores (0–100). For model training purposes, competent and expert categories were merged into a binary label: competent (1) versus novice (0). Because of operational constraints and clinical scheduling demands, duplicate assessments for formal inter-rater reliability were not feasible. To minimize variability, all faculty underwent standardized PREPARE training and applied consensus-based definitions for competency.
Learner physiological data were collected via Empatica E4 wristbands (E4 wristband | Real-time physiological signals | Wearable PPG, EDA, Temperature, Motion sensors n.d.), which captured electrodermal activity at 4 Hz, blood volume pulse at 64 Hz, skin temperature at 4 Hz, and accelerometry at 32 Hz. Custom signal-processing algorithms derived physiological measures reflecting autonomic nervous system activity associated with stress responses.
A complete list of the derived measures in the final model feature set is provided in the supplementary materials accompanying this manuscript (Supplementary Material Table A1).
Physiological data were temporally aligned with instructor-rated events to classify learner competency. Instructors were trained to record assessments immediately after each observed task, ensuring synchronization between physiological variability and corresponding performance actions. Retrospective analysis revealed occasional deviations from protocol, including temporal inconsistencies in rating entry. The operational demands of real-time simulation assessment impose challenges: instructors must monitor multiple learners, manage scenario flow, operate simulation equipment, and provide feedback concurrently. These simultaneous responsibilities occasionally led to delayed rating entries, thereby decoupling physiological signals from corresponding performance timing. Because genuine clinical learning events rarely occur in rapid succession, temporally clustered ratings were interpreted as retrospective batch entries rather than real-time assessments.
Rigorous quality-control procedures were implemented to ensure data integrity. The initial dataset comprised 2,584 instructor assessments; after screening, 730 entries were excluded based on two predefined criteria. First, 291 late assessment events were identified through temporal clustering, in which all ratings occurred within 7 seconds of one another, most frequently near the end of scenarios. Second, 439 events were removed due to physiologically implausible signal ranges and values. These removed values were defined as electrodermal activity outside the range of 0.01–100 µS, heart rate outside the range of 40–200 bpm, or skin temperature outside the range of 28–38°C, values suggestive of motion artifacts, sensor detachment, or loss of contact.
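The two exclusion criteria above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `RatedEvent` record, the function names, and the rule that a rating is "clustered" when it falls within 7 seconds of a neighbouring rating are assumptions made for illustration, as is the use of one summary value per signal per event. Only the 7-second gap and the three plausibility ranges come from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# Limits quoted in the text; everything else in this sketch is hypothetical.
EDA_RANGE_US = (0.01, 100.0)   # electrodermal activity, microsiemens
HR_RANGE_BPM = (40.0, 200.0)   # heart rate, beats per minute
TEMP_RANGE_C = (28.0, 38.0)    # skin temperature, degrees Celsius
CLUSTER_GAP_S = 7.0            # ratings this close together count as batch entries


@dataclass
class RatedEvent:
    t: float        # rating timestamp, seconds from scenario start
    eda_us: float   # assumed per-event summary values
    hr_bpm: float
    temp_c: float


def in_range(value: float, limits: Tuple[float, float]) -> bool:
    low, high = limits
    return low <= value <= high


def clustered_indices(times: Sequence[float], gap: float = CLUSTER_GAP_S) -> Set[int]:
    """Indices of ratings entered within `gap` seconds of a neighbouring rating."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    flagged: Set[int] = set()
    for a, b in zip(order, order[1:]):
        if times[b] - times[a] <= gap:
            flagged.update((a, b))
    return flagged


def screen(events: List[RatedEvent]) -> List[RatedEvent]:
    """Drop suspected batch entries and events with implausible signal values."""
    late = clustered_indices([e.t for e in events])
    kept = []
    for i, e in enumerate(events):
        plausible = (
            in_range(e.eda_us, EDA_RANGE_US)
            and in_range(e.hr_bpm, HR_RANGE_BPM)
            and in_range(e.temp_c, TEMP_RANGE_C)
        )
        if i not in late and plausible:
            kept.append(e)
    return kept
```

Applying the clustering rule before the range check mirrors the order in which the two criteria are listed; the text does not say whether the order mattered in practice.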
Signal Processing and Feature Selection
Physiological signals underwent preprocessing using bandpass filtering at 0.5–5 Hz for blood volume pulse and 0.05–1 Hz for electrodermal activity to isolate relevant frequencies. Individual baseline states were identified using a sliding-window algorithm that detected stable physiological periods lasting at least 15 seconds, and deviation from baseline metrics was computed by standardizing values relative to these individualized resting states.
We employed a suite of custom-built algorithms to extract features from raw physiological data within event-centered windows corresponding to instructor-rated clinical performance moments. This windowing approach captured temporal fluctuations in physiological activity that reflect acute stress and cognitive workload dynamics influencing performance quality. To accommodate the differing temporal characteristics of each biosignal, we randomly sampled event-centered windows of 5 to 180 seconds (in 5-second increments; n = 1,296 configurations), testing 250 configurations per signal type. Electrodermal responses demonstrated rapid phasic changes associated with cognitive and emotional stress (Benedek and Kaernbach 2010; Posada-Quintero et al. 2018; Rahma et al. 2022), whereas cardiovascular indices, such as heart rate variability, evolved more gradually through autonomic regulation (Kasahara et al. 2021; Mendelowitz 1999; Robinson et al. 1966). Peripheral temperature fluctuations occurred over longer timescales, necessitating signal-specific optimization of window parameters for feature extraction. Unlike previous studies that measured stress at single time points (Alasmari et al. 2025; Campanella et al. 2023), our analysis across extended windows revealed three phases distinguishing expertise levels: anticipatory arousal before events, acute stress responses during events, and recovery patterns after completion.
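As a rough illustration of event-centered windowing with absolute and baseline-relative statistics, the following Python sketch may help. It is an assumption-laden simplification, not the custom algorithms used in the study: the function names are hypothetical, the baseline mean and standard deviation are taken as given rather than detected, and only a subset of the summary statistics is computed.

```python
import statistics
from typing import Dict, List, Sequence


def window_slice(signal: Sequence[float], fs_hz: float, event_s: float,
                 pre_s: float, post_s: float) -> List[float]:
    """Samples from `pre_s` seconds before to `post_s` seconds after the event."""
    start = max(0, int(round((event_s - pre_s) * fs_hz)))
    stop = min(len(signal), int(round((event_s + post_s) * fs_hz)) + 1)
    return list(signal[start:stop])


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """A subset of the window statistics (mean, SD, min, max, range)."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


def window_features(signal: Sequence[float], fs_hz: float, event_s: float,
                    pre_s: float, post_s: float,
                    baseline_mean: float, baseline_std: float) -> Dict[str, float]:
    """Absolute and baseline-standardized statistics for one event window."""
    if baseline_std <= 0:
        raise ValueError("baseline_std must be positive")
    raw = window_slice(signal, fs_hz, event_s, pre_s, post_s)
    if not raw:
        raise ValueError("window contains no samples")
    # Standardize each sample against the learner's own resting state.
    relative = [(v - baseline_mean) / baseline_std for v in raw]
    features = {f"abs_{k}": v for k, v in summarize(raw).items()}
    features.update({f"rel_{k}": v for k, v in summarize(relative).items()})
    return features
```

Searching over window lengths then amounts to calling `window_features` with different `pre_s` and `post_s` values drawn from the 5-second grid and comparing downstream model performance.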
Heart rate optimally discriminated competency using 125-second pre-event and 70-second post-event windows, whereas electrodermal activity required more extended periods (80 seconds before and 180 seconds after) due to slower response dynamics. This temporal approach revealed competency differences not only in stress magnitude but also across entire physiological timelines.
Statistical features computed within these windows included mean, standard deviation, minimum, maximum, range, skewness, kurtosis, and cumulative response for both absolute and baseline-relative values, generating 73 candidate features. To identify the most informative biomarkers while avoiding overfitting, we employed a genetic algorithm operating over 50 generations with 100 candidate feature subsets per generation. A crossover rate of 0.7 and a mutation rate of 0.1 balanced exploration and exploitation. We retained the top 50 models from independent runs, stratifying features by selection frequency: high-confidence (≥ 75% of models), moderate-confidence (50–74%), and exploratory (20–49%). The algorithm converged on 65 features (an 11% dimensionality reduction with no loss of performance), comprising 28 heart rate measures (43%), 22 electrodermal activity indicators (34%), and 15 temperature parameters (23%). This distribution reflects the primary role of cardiovascular responses in competency discrimination, consistent with the literature establishing heart rate variability as a key indicator of stress regulation and cognitive load. The selected model features are listed in the Appendix (Supplementary Materials) of this manuscript.
Binary Classification Model Development
Analysis of our dataset revealed a significant class imbalance (80.6% competent and 19.4% novice ratings), indicating instructors overwhelmingly classified learners as competent.
Developing an objective, data-driven method to identify novice learners lacking operational readiness is essential for ensuring patient safety. To address class imbalance, we implemented inverse-frequency class weighting, which assigns higher importance to minority class examples during training. This approach ensures that the model pays equal attention to both competent and struggling learners without creating artificial data points that could distort genuine physiological stress-response patterns.
We developed a feedforward neural network model to capture the complex, nonlinear relationships between physiological responses and performance outcomes. Neural networks are particularly suited for this application as they can identify subtle patterns in multivariate physiological data that may not be apparent through traditional statistical methods. To prevent overfitting, we applied dropout regularization (randomly deactivating 30% of neurons during training) and L2 weight regularization (λ = 0.01) to constrain model complexity. Model training used the Adam optimizer with an initial learning rate of 0.001 and exponential decay to allow adaptive convergence. Early stopping halted training after 20 epochs without improvement in validation loss. The output layer used a sigmoid transfer function, producing a model output between 0 and 1, where values closer to 1 represent learner competency/operational readiness. Models were developed using Python 3.9, TensorFlow 2.4, NumPy 1.19, and scikit-learn 0.24.
Model Validation and Statistical Analysis
To evaluate model performance on completely unseen subjects, we employed a 20-fold leave-N-out validation strategy, where each fold excluded a unique subset of subjects. For each fold, five independent trials were conducted, yielding a total of 100 model instances for comprehensive performance assessment.
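A subject-wise split of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: the text does not state how subjects were allocated to folds, so the shuffled round-robin assignment, the function names, and the `seed` parameter are assumptions. The property it demonstrates is the one that matters here: no learner contributes events to both the training and the held-out side of a fold.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def subject_folds(subject_ids: Sequence[Hashable], n_folds: int = 20,
                  seed: int = 0) -> Dict[Hashable, int]:
    """Map every unique subject to exactly one fold (assumed round-robin)."""
    unique = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return {subject: i % n_folds for i, subject in enumerate(unique)}


def split_indices(subject_ids: Sequence[Hashable], fold_of: Dict[Hashable, int],
                  test_fold: int) -> Tuple[List[int], List[int]]:
    """Event indices for training and for the held-out subjects of one fold."""
    train, test = [], []
    for idx, subject in enumerate(subject_ids):
        (test if fold_of[subject] == test_fold else train).append(idx)
    return train, test


if __name__ == "__main__":
    # Toy data: 1,854 events spread over 152 hypothetical subject labels.
    ids = [f"s{i % 152}" for i in range(1854)]
    folds = subject_folds(ids)
    train_idx, test_idx = split_indices(ids, folds, test_fold=0)
    print(len(train_idx), len(test_idx))
```

With 152 subjects and 20 folds this assignment holds out 7 or 8 learners per fold; repeating training five times per fold with different random initializations would give the 100 model instances mentioned above.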
This approach tests the model's ability to classify entirely new individuals rather than just new events from known learners, with the held-out test set providing final performance verification.
Performance metrics included sensitivity (for identifying competent learners), specificity (for detecting novices who require additional support), Matthews Correlation Coefficient (MCC) to balance performance in imbalanced datasets, and both ROC-AUC and PR-AUC to assess discrimination across varying classification thresholds. In this educational context, true negatives (novice trainees correctly identified as not operationally ready) are particularly critical, as such classifications directly prevent underprepared learners from advancing to independent clinical practice, thereby safeguarding patient safety. Specificity is thus emphasized as an essential performance metric for this effort, ensuring that learners needing further training are accurately flagged. The MCC was chosen as the primary optimization metric because it reflects all four classification outcomes with equal weighting and is robust in imbalanced scenarios, such as this dataset, where competent learners substantially outnumber novices (80.6% versus 19.4%). This balanced metric helps prioritize the accurate identification of struggling learners rather than merely maximizing accuracy by favoring the majority class. Metric confidence intervals were computed via bootstrap resampling (10,000 iterations) to enable robust statistical inference.
To determine the optimal classification threshold for model output, we evaluated 21 unique threshold values (see Appendix/Supplementary Materials) and selected the one that maximized MCC while maintaining balanced sensitivity and specificity. Statistical significance was assessed using Friedman’s analysis of variance by ranks, with post hoc Nemenyi tests (α = 0.05), to evaluate the effects of the threshold on model performance metrics.
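For readers who want the metric definitions in concrete form, the Python sketch below computes them from confusion-matrix counts and picks the MCC-maximizing threshold from a candidate list. It is a generic illustration with hypothetical function names, not the study's evaluation code, and it treats "competent" (label 1) as the positive class, as the text does. It selects on MCC alone and does not model the additional sensitivity/specificity balance criterion.

```python
import math
from typing import Dict, Sequence, Tuple


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics; positive class = competent."""
    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _safe_div(tp, tp + fp),
        "npv": _safe_div(tn, tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": _safe_div(tp * tn - fp * fn, denom),
    }


def confusion_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int, int, int]:
    """(tp, fp, tn, fn) when scores >= threshold are called competent."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and not label:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   thresholds: Sequence[float]) -> float:
    """Candidate threshold with the highest MCC (first one wins on ties)."""
    return max(thresholds,
               key=lambda t: binary_metrics(*confusion_at(scores, labels, t))["mcc"])
```

As a check on the definitions, `binary_metrics(3369, 209, 591, 171)`, using the pooled counts reported for threshold 0.50 in Table 1, gives a sensitivity of about 0.952, a specificity of about 0.739, and an MCC of about 0.70. Pooled counts need not reproduce the fold-averaged values in the table exactly.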
RESULTS
Dataset Characteristics
The final dataset comprised 1,854 high-fidelity learning events from 152 healthcare learners across 470 simulation scenarios, representing a 71.8% data retention rate following quality-control exclusions. Class distribution showed 360 novice (19.4%) and 1,494 competent (80.6%) classifications, with variation across specialties: Emergency Medicine residents demonstrated 87.0% competency rates, Anesthesiology residents showed 75.3% competency, and EMS students exhibited 46.3% competency, potentially reflecting their respective training stages and experience levels. Although the dataset was imbalanced, as is typical in competency-based assessments, a post-hoc power analysis confirmed > 0.999 statistical power (Cohen's h = 1.32, α = 0.05) to detect meaningful competency differences.
Model Performance
Model results are summarized in Table 1. Classification performance remained stable across the optimal classification threshold range of 0.45–0.70 (see Table 1), achieving balanced accuracy of 84.1–84.8% and MCC of 0.687–0.706 despite significant class imbalance. Higher thresholds captured more true novices at the expense of additional false negatives. Increasing the threshold from 0.45 to 0.70 improved specificity from 72.9% to 76.2%, while sensitivity decreased modestly from 95.4% to 93.3%. A table of model performance across all model classification thresholds is included in the Supplementary Materials accompanying this manuscript (Table A2). The Precision-Recall AUC (0.969–0.970) confirmed robust discrimination across thresholds. Positive predictive values remained stable at 94.0–94.6%, ensuring that learners classified as competent overwhelmingly demonstrate true competency at any threshold within this range. False-positive counts declined from 217 to 190 as thresholds increased, reducing unnecessary remediation for competent learners, whereas true-competent identifications remained stable (3,376 to 3,304).
This performance stability enables programs to calibrate their assessment systems according to institutional priorities, whether emphasizing the maximum detection of novices who need support, minimizing disruption for competent learners, or striking a balance between these objectives based on available training resources and risk tolerance.
Physiological Biomarker Analysis
Representative heart-rate deviation (from baseline) profiles (Fig. 2) illustrate distinct physiological patterns that distinguish competent from novice performers during simulation-based training. The competent learner (left panel) demonstrated adaptive autonomic regulation, characterized by an initial anticipatory suppression of approximately 20% below baseline, followed by a stable and modest elevation of 2–5% with minimal oscillation throughout the task. In contrast, the novice learner (right panel) exhibited physiological dysregulation: a 22% surge above baseline, a rapid drop to 7% below baseline, and persistent fluctuations between +17% and −5%. These distinct patterns, stable adaptation in contrast to dysregulated oscillation, suggest that heart rate changes and variability relative to baseline could serve as objective biomarkers of competency and clinical readiness, supporting their inclusion as model input features for automated performance assessment.
Table 1 Model performance metrics across classification thresholds for competency assessment. Values represent mean ± standard deviation across 20 leave-N-out validation folds. The optimal threshold (0.50) maximizes Matthews Correlation Coefficient (MCC) while balancing sensitivity and specificity. Confusion matrix components (n_TP, n_FP, n_TN, n_FN) represent total counts aggregated across all validation folds.
MCC = Matthews Correlation Coefficient; PPV = Positive Predictive Value (Precision); NPV = Negative Predictive Value; TP = True Positives; FP = False Positives; TN = True Negatives; FN = False Negatives. (Values of n are cumulative across all validation folds.)

| Threshold | Balanced Accuracy | MCC | Sensitivity | Specificity | PPV | NPV | n_TP | n_FP | n_TN | n_FN |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.30 | 0.833 ± 0.036 | 0.700 ± 0.057 | 0.959 ± 0.018 | 0.708 ± 0.076 | 0.936 ± 0.015 | 0.803 ± 0.068 | 3395 | 234 | 566 | 145 |
| 0.35 | 0.837 ± 0.038 | 0.702 ± 0.056 | 0.956 ± 0.020 | 0.719 ± 0.085 | 0.938 ± 0.017 | 0.796 ± 0.068 | 3385 | 225 | 575 | 155 |
| 0.40 | 0.837 ± 0.036 | 0.702 ± 0.057 | 0.957 ± 0.020 | 0.716 ± 0.078 | 0.938 ± 0.016 | 0.797 ± 0.068 | 3388 | 227 | 573 | 152 |
| 0.45 | 0.841 ± 0.041 | 0.704 ± 0.059 | 0.954 ± 0.022 | 0.729 ± 0.090 | 0.940 ± 0.018 | 0.788 ± 0.067 | 3376 | 217 | 583 | 164 |
| 0.50 | 0.845 ± 0.042 | 0.706 ± 0.061 | 0.952 ± 0.023 | 0.739 ± 0.093 | 0.942 ± 0.019 | 0.784 ± 0.067 | 3369 | 209 | 591 | 171 |
| 0.55 | 0.845 ± 0.039 | 0.696 ± 0.061 | 0.944 ± 0.025 | 0.746 ± 0.086 | 0.943 ± 0.018 | 0.760 ± 0.069 | 3343 | 203 | 597 | 197 |
| 0.60 | 0.847 ± 0.040 | 0.695 ± 0.063 | 0.943 ± 0.025 | 0.751 ± 0.086 | 0.944 ± 0.018 | 0.755 ± 0.069 | 3337 | 199 | 601 | 203 |
| 0.65 | 0.847 ± 0.043 | 0.690 ± 0.064 | 0.935 ± 0.039 | 0.760 ± 0.104 | 0.946 ± 0.021 | 0.744 ± 0.098 | 3309 | 192 | 608 | 231 |
| 0.70 | 0.848 ± 0.041 | 0.687 ± 0.066 | 0.933 ± 0.036 | 0.762 ± 0.096 | 0.946 ± 0.020 | 0.737 ± 0.093 | 3304 | 190 | 610 | 236 |
| 0.75 | 0.847 ± 0.043 | 0.682 ± 0.070 | 0.934 ± 0.025 | 0.760 ± 0.090 | 0.946 ± 0.019 | 0.728 ± 0.071 | 3305 | 192 | 608 | 235 |

Cross-Domain Performance Analysis
The model demonstrated robust classification accuracy across medical disciplines and skill domains (Table 2). Emergency Medicine assessments had the highest share of true positives (87.0% of assessments) with balanced sensitivity (95.4%) and specificity (73.8%). Anesthesiology showed the highest sensitivity (97.0%) despite a lower share of true positives (69.8%), while EMS exhibited the most balanced class distribution with comparable specificity (77.8%).
Table 2 Classification performance metrics across medical disciplines at model classification threshold of 0.50.
True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) are shown as counts with percentages of total assessments. Emergency Medicine had the highest share of true positives despite class imbalance, while EMS showed a more balanced class distribution. Sensitivity remained above 83% across all disciplines, with Anesthesiology achieving the highest value (97.0%). Balanced accuracy ranged from 80.5% to 85.0%, indicating consistent model performance across diverse clinical training contexts.

| Discipline | n | TP (%) | FP (%) | TN (%) | FN (%) | Sensitivity | Specificity | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|
| Emergency Medicine | 2,573 | 2,239 (87.0%) | 59 (2.3%) | 166 (6.5%) | 109 (4.2%) | 95.4% | 73.8% | 84.6% |
| Anesthesiology | 1,170 | 817 (69.8%) | 89 (7.6%) | 239 (20.4%) | 25 (2.1%) | 97.0% | 72.9% | 85.0% |
| EMS | 380 | 144 (37.9%) | 46 (12.1%) | 161 (42.4%) | 29 (7.6%) | 83.2% | 77.8% | 80.5% |

DISCUSSION
This study demonstrates that physiological biomarkers derived solely from a wrist-worn wearable device can contribute to scalable and reliable estimation of clinical competency in simulation-based medical education. Through comprehensive leave-N-out validation across 100 independent model instances, the findings provide strong evidence of generalization across multiple learner populations and clinical domains. Unlike earlier frameworks that inferred performance from theoretical models of stress or arousal, this work establishes empirical, data-driven mappings between physiological activation and demonstrated clinical competence. The results align with and extend previous research, which has shown that physiological indicators such as heart-rate variability and electrodermal activity reflect stress regulation and cognitive workload (Joseph et al. 2022; Kim et al. 2018; Nakayama et al. 2018; Solhjoo et al. 2019). This research also addresses the mixed and sometimes contradictory findings regarding whether heightened stress impairs or facilitates performance (Cavaleri et al. 2023; Vage et al. 2024).
By integrating multimodal, temporally resolved physiological data with machine-learning-based classification, the present study moves beyond correlational approaches to provide predictive and generalizable models of competency.
Simulation-based medical education (SBME) has long been recognized as a crucial tool for enhancing clinical performance and improving patient safety. Foundational reviews and meta-analyses have consistently demonstrated that technology-enhanced simulation combined with deliberate practice improves clinical skills acquisition relative to traditional clinical education (McGaghie et al. 2011). Additional evidence from mastery-learning frameworks further suggests that simulation can help equalize performance gaps among learners, although assessment methods often still rely on subjective expert judgment and checklists (Cook et al. 2013). These limitations highlight a persistent need for objective, continuous, and scalable assessment techniques that supplement human raters. The current study addresses this gap by developing a physiological signal–based competency classifier that integrates smoothly with expert evaluation frameworks. This integration can support evidence of validity, reproducibility, and scalability in performance assessment, which are key challenges highlighted in previous simulation research (Okuda et al. 2009).
The classification framework achieved consistent performance (balanced accuracy 80.5–85.0%) across specialties using physiological data alone, independent of discipline or scenario type. This finding supports the potential validity of physiological stress responses as candidate biomarkers of clinical competency. The model maintained robust accuracy despite the likely variability in instructor ratings, reinforcing its value as a complementary measure that enhances, rather than replaces, expert judgment.
Together, these findings support the generalizability of psychophysiological assessment frameworks across diverse medical training contexts, extending prior work that was typically limited to discipline-specific or single-institution samples (Nakayama et al. 2018; Solhjoo et al. 2019).

The challenges inherent in real-time instructor assessment during simulation underscore the value of multimodal assessment platforms such as PREPARE. Instructors frequently manage multiple concurrent tasks, such as scenario flow, learner safety, equipment operation, and feedback delivery, while simultaneously evaluating learner performance. Automated physiological monitoring can capture objective data even when an instructor's attention is divided, ensuring that key learning events are documented with corresponding physiological signatures. This augmentation of expert judgment through continuous, sensor-derived data contributes to more complete and accurate learner evaluations, reducing both missed observations and rater fatigue.

Despite these promising findings, several limitations should be acknowledged. First, the absence of a formal interrater reliability assessment introduces uncertainty regarding the precision of instructor-based labels used for model training. Although the faculty received standardized calibration, agreement between raters was not quantified. Additionally, a tendency toward lenient instructor evaluations may have inflated the number of learners labeled competent, meaning that some "false negatives" (learners rated competent by instructors but classified as novice by the model) may, in fact, reflect accurate detection of struggling learners. Future work should incorporate multi-rater adjudication and video-based secondary reviews to strengthen label validity. The class imbalance, though mitigated through inverse-frequency weighting, may still limit the diversity of novice patterns captured during training.
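Inverse-frequency weighting is named above but not specified. A minimal Python sketch of one common variant follows, in which each class weight is the total sample count divided by the number of classes times that class's count, so that both classes contribute equally to a weighted loss. The normalization, the function name, and the 1,000-assessment toy sample are assumptions. Only the 80.6% competent / 19.4% novice split comes from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def inverse_frequency_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return per-class weights w_c = total / (n_classes * count_c)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    # Rarer classes receive proportionally larger weights.
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical sample of 1,000 assessments with the reported class proportions.
labels = ["competent"] * 806 + ["novice"] * 194
weights = inverse_frequency_weights(labels)

for label, weight in weights.items():
    print(f"{label}: weight = {weight:.3f}")
```

With these weights, the summed weight of the novice class equals that of the competent class, which is what prevents the majority class from dominating training. As the text notes, reweighting does not add new novice examples, so the variety of novice patterns the model sees is unchanged.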
Similarly, the smaller sample of EMS trainees compared to graduate medical residents limits generalizability to early-stage learners. While the use of simulation rather than clinical data limits ecological validity, this controlled design enabled rigorous signal processing and systematic model validation. Multi-institutional and clinical replications will be essential to confirm the robustness of the findings across different contexts.

The immediate application of this framework lies in its ability to inform adaptive, learner-centered training pathways. Novice classifications can be mapped to standardized hierarchical competencies within PREPARE, ranging from general cognitive and psychomotor skills to scenario-specific clinical judgments. Such mappings could guide instructors in prescribing targeted remediation and measuring progress over time. Expanding from binary (novice/competent) to multi-level proficiency classification (novice/competent/expert) will further support precision feedback and developmental tracking.

Ongoing work focuses on enhancing system intelligence and data fusion within the PREPARE platform by integrating audio, video, and text-based features. Advanced analytics, including natural language processing (Paudel, Pappada, and Cheng 2023) and large language models, are now being incorporated to automate event detection and improve multimodal data synchronization. These extensions aim to strengthen the interpretive depth of physiological signals, linking them with behavioral and contextual indicators of performance. The ultimate goal is to develop a robust, multi-source competency model that supports individualized learning and operational readiness assessment.

In conclusion, this work advances simulation-based medical education by demonstrating that objective physiological markers can augment traditional assessments.
By bridging gaps between stress physiology, educational data science, and clinical competency evaluation, this framework establishes a foundation for scalable, data-driven, and individualized assessment systems. With continued validation and refinement, such models may help identify and support at-risk learners before they enter practice, ultimately contributing to improved patient safety and reduced medical errors.

CONCLUSION

This study provides empirical evidence that multimodal physiological data collected during simulated clinical scenarios can support more objective and reliable assessment of clinical competency and operational readiness. Using physiological features synchronized with expert evaluations, the developed model achieved strong generalization across diverse learners, maintaining a balanced accuracy of 84.5% and a sensitivity of 95.2% despite class imbalance. By integrating psychophysiological measures with machine learning analytics, this work advances competency assessment beyond checklist-based evaluations toward adaptive, precision education frameworks that tailor feedback and remediation to individual needs. Machine-learning-based intelligence embedded within the PREPARE platform could leverage physiological markers to augment expert judgment, with the potential to enhance both fairness and fidelity in simulation-based evaluation. Future research should focus on multi-institutional validation and the integration of additional behavioral and contextual data streams to refine, personalize, and scale this approach for broader application in health professions training.

Declarations

Competing Interests: The authors have no competing interests to declare that are relevant to the content of this article.
Funding: The models developed during this research are being integrated into the PREPARE platform for Operational Readiness Assessment as part of an effort funded by the US Department of Defense, Congressionally Directed Medical Research Program (Contract #: HT942524C0073).

Ethics Statement: This study received Institutional Review Board approval from the University of Toledo. All procedures involving human participants were performed in accordance with the ethical standards of the institutional research committee and with the 1964 Declaration of Helsinki and its later amendments. Written informed consent was obtained from all participants. Participants were informed that performance data would be used exclusively for research purposes and would not impact their academic standing.

Consent to Participate: Written informed consent was obtained from all individual participants included in the study.

Consent to Publish: Not applicable. This manuscript does not contain identifiable individual participant data.

Author Contributions:
ST performed data analysis and supported data acquisition, model development, and writing of the manuscript; the project was part of ST's doctoral research.
JS performed data analysis, guided model development, and supported writing and editing of the manuscript.
UY supported software development of the platform required for the study.
SA supported simulation curriculum development and delivery, provided expert faculty assessment of learners, and edited the manuscript.
CA supported simulation curriculum development and delivery and ensured allocation and dedication of simulation center staff and resources to study execution.
KJ supported simulation curriculum development and delivery and provided expert faculty assessment of learners.
TP supported simulation curriculum development and delivery, contributed to writing and editing of the manuscript, and provided expert faculty assessment of learners.
SToy supported writing and editing of the manuscript, literature review, and data analysis efforts.
KB supported simulation curriculum development and delivery and ensured allocation and dedication of simulation center staff and resources to study execution.
AB supported simulation curriculum development and delivery and provided expert faculty assessment of learners.
BA supported simulation curriculum development and provided review, editing, and feedback on the manuscript.
SP guided model development and data analysis efforts, contributed to writing and editing of the manuscript, led and provided oversight for all study aspects, and served as PI of the overall study and as primary advisor for ST's doctoral research project.

Data Availability: The datasets generated and/or analyzed during the current study are not publicly available due to their inclusion of learner assessments, which raise privacy considerations for participants. Deidentified data may be made available by the corresponding author upon reasonable request, subject to institutional review board approval, data use agreements, and compliance with ethical and legal restrictions.

References

Alasmari, S., AlGhamdi, R., Tejani, G. G., Sharma, S. K., & Mousavirad, S. J. (2025). Federated Learning-Based Multimodal Approach for Early Detection and Personalized Care in Cardiac Disease. Frontiers in Physiology, 16. 10.3389/fphys.2025.1563185

Batalden, P., Leach, D., Swing, S., Dreyfus, H., & Dreyfus, S. (2002). General Competencies and Accreditation in Graduate Medical Education. Health Affairs (Project Hope), 21(5), 103–111. 10.1377/hlthaff.21.5.103

Benedek, M., & Kaernbach, C. (2010). A Continuous Measure of Phasic Electrodermal Activity. Journal of Neuroscience Methods, 190(1), 80–91. 10.1016/j.jneumeth.2010.04.028

Berendonk, C., Stalmeijer, R. E., & Schuwirth, L. W. T. (2013). Expertise in Performance Assessment: Assessors' Perspectives. Advances in Health Sciences Education: Theory and Practice, 18(4), 559–571. 10.1007/s10459-012-9392-x

Boffet, A., Arsac, L. M., Ibanez, V., Sauvet, F., & Deschodt-Arsac, V. (2025). Detection of Cognitive Load Modulation by EDA and HRV. Sensors (Basel, Switzerland), 25(8), 2343. 10.3390/s25082343

Brooks, J., Crone, J. C., & Spangler, D. P. (2021). A Physiological and Dynamical Systems Model of Stress. International Journal of Psychophysiology, 166, 83–91.

Campanella, S., Altaleb, A., Belli, A., Pierleoni, P., & Palma, L. (2023). A Method for Stress Detection Using Empatica E4 Bracelet and Machine-Learning Techniques. Sensors (Basel, Switzerland), 23(7), 3565. 10.3390/s23073565

Cavaleri, R., Withington, A., Chalmers, K. J., & Blackstock, F. (2023). The Influence of Stress on Student Performance during Simulation-Based Learning: A Pilot Randomized Trial. ATS Scholar, 4(4), 474–489. 10.34197/ats-scholar.2022-0042OC

Cook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., & Hatala, R. (2013). Mastery Learning for Health Professionals Using Technology-Enhanced Simulation: A Systematic Review and Meta-Analysis. Academic Medicine: Journal of the Association of American Medical Colleges, 88(8), 1178–1186. 10.1097/ACM.0b013e31829a365d

E4 Wristband | Real-Time Physiological Signals | Wearable PPG, EDA, Temperature, Motion Sensors (January 6, 2026). Empatica. https://www.empatica.com/research/e4

Elabd, K., Abdul-Kadir, H., Alkhenizan, A., & Alkhalifa, M. K. (n.d.). A Comparison of the Checklist Scoring Systems, Global Rating Systems, and Borderline Regression Method for an Objective Structured Clinical Examination for a Small Cohort in a Saudi Medical School. Cureus, 15(6), e39968. 10.7759/cureus.39968

Elendu, C., Amaechi, D. C., Okatta, A. U., Amaechi, E. C., Elendu, T. C., Ezeh, C. P., & Elendu, I. D. (2024). The Impact of Simulation-Based Training in Medical Education: A Review. Medicine, 103(27), e38813. 10.1097/MD.0000000000038813

Gellisch, M., Bablok, M., Brand-Saberi, B., & Schäfer, T. (2024). Neurobiological Stress Markers in Educational Research: A Systematic Review of Physiological Insights in Health Science Education. Trends in Neuroscience and Education, 37, 100242. 10.1016/j.tine.2024.100242

Howie, E. E., Harari, R., Dias, R. D., Wigmore, S. J., Skipworth, R. J. E., & Yule, S. (2024). Feasibility of Wearable Sensors to Assess Cognitive Load During Clinical Performance: Lessons Learned and Blueprint for Success. The Journal of Surgical Research, 302, 222–231. 10.1016/j.jss.2024.07.009

Joseph, M., Ray, J. M., Chang, J., Cramer, L. D., Bonz, J. W., Yang, T. J., Wong, A. H., Auerbach, M. A., & Evans, L. V. (2022). All Clinical Stressors Are Not Created Equal: Differential Task Stress in a Simulated Clinical Environment. AEM Education and Training, 6(2), e10726. 10.1002/aet2.10726

Kasahara, Y., Yoshida, C., Saito, M., & Kimura, Y. (2021). Assessments of Heart Rate and Sympathetic and Parasympathetic Nervous Activities of Normal Mouse Fetuses at Different Stages of Fetal Development Using Fetal Electrocardiography. Frontiers in Physiology, 12, 652828. 10.3389/fphys.2021.652828

Kim, H. G., Cheon, E. J., Bai, D. S., Lee, Y. H., & Koo, B. H. (2018). Stress and Heart Rate Variability: A Meta-Analysis and Review of the Literature. Psychiatry Investigation, 15(3), 235–245. 10.30773/pi.2017.08.17

Komasawa, N., & Yokohira, M. (n.d.). Simulation-Based Education in the Artificial Intelligence Era. Cureus, 15(6), e40940. 10.7759/cureus.40940

Makary, M. A., & Daniel, M. (2016). Medical Error-the Third Leading Cause of Death in the US. BMJ (Clinical Research Ed.), 353, i2139. 10.1136/bmj.i2139

McDaniel, G. H., Pappada, S., Alyosif, Z., Teye-Lartey, S., & Moussa, M. (2025). The Impact of Stress and Distraction on Bag-Valve-Mask Ventilation Performance. Cureus, 17(5), e84542. 10.7759/cureus.84542

McGaghie, W. C., Issenberg, S. B., Cohen, E. R., Barsuk, J. H., & Wayne, D. B. (2011). Does Simulation-Based Medical Education with Deliberate Practice Yield Better Results than Traditional Clinical Education? A Meta-Analytic Comparative Review of the Evidence. Academic Medicine: Journal of the Association of American Medical Colleges, 86(6), 706–711. 10.1097/ACM.0b013e318217e119

McKinley, R. K., Strand, J., Ward, L., Gray, T., Alun-Jones, T., & Miller, H. (2008). Checklists for Assessment and Certification of Clinical Procedural Skills Omit Essential Competencies: A Systematic Review. Medical Education, 42(4), 338–349. 10.1111/j.1365-2923.2007.02970.x

Mendelowitz, D. (1999). Advances in Parasympathetic Control of Heart Rate and Cardiac Function. Physiology, 14(4), 155–161.

Nakayama, N., Arakawa, N., Ejiri, H., Matsuda, R., & Makino, T. (2018). Heart Rate Variability Can Clarify Students' Level of Stress during Nursing Simulation. PloS One, 13(4), e0195280. 10.1371/journal.pone.0195280

Okuda, Y., Bryson, E. O., DeMaria, S., Jacobson, L., Quinones, J., Shen, B., & Levine, A. I. (2009). The Utility of Simulation in Medical Education: What Is the Evidence? The Mount Sinai Journal of Medicine, New York, 76(4), 330–343. 10.1002/msj.20127

Pappada, S., Owais, M., Aouthmany, S., Rega, P., Schneiderman, J., Toy, S., Schiavi, A., et al. (2022). Personalizing Simulation-Based Medical Education: The Case for Novel Learning Management Systems. International Journal of Healthcare Simulation. 10.54531/mngy8113

Paudel, P., Pappada, S., & Cheng, L. (2023). Automated Multimodal Performance Evaluation in Simulation-Based Medical Education Using Natural Language Processing. In Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023), ICCPS '23, New York, NY, USA: Association for Computing Machinery, 258–259. 10.1145/3576841.3589624

Peek, R., Moore, L., & Arnold, R. (2023). Psychophysiological Fidelity: A Comparative Study of Stress Responses to Real and Simulated Clinical Emergencies. Medical Education, 57(12), 1248–1256. 10.1111/medu.15155

Posada-Quintero, H. F., Florian, J. P., Orjuela-Cañón, A. D., & Chon, K. H. (2018). Electrodermal Activity Is Sensitive to Cognitive Stress under Water. Frontiers in Physiology, 8. 10.3389/fphys.2017.01128

Rahma, O. N., Putra, A. P., Rahmatillah, A., Putri, Y. S. K. A., Fajriaty, N. D., Ain, K., & Chai, R. (2022). Electrodermal Activity for Measuring Cognitive and Emotional Stress Level. Journal of Medical Signals and Sensors, 12(2), 155–162. 10.4103/jmss.JMSS_78_20

Robinson, B. F., Epstein, S. E., Beiser, G. D., & Braunwald, E. (1966). Control of Heart Rate by the Autonomic Nervous System. Studies in Man on the Interrelation between Baroreceptor Mechanisms and Exercise. Circulation Research, 19(2), 400–411. 10.1161/01.res.19.2.400

Shoja, M. M., Ventura Rodriguez, D. N., Avilova, O., & Rajput, V. (2025). Error Reduction in Healthcare Through Team Training and Cultural Transformation. Cureus, 17(8), e91243. 10.7759/cureus.91243

Solhjoo, S., Haigney, M. C., McBee, E., van Merrienboer, J. J. G., Schuwirth, L., Artino, A. R., Battista, A., et al. (2019). Heart Rate and Heart Rate Variability Correlate with Clinical Reasoning Performance and Self-Reported Measures of Cognitive Load. Scientific Reports, 9(1), 14668. 10.1038/s41598-019-50280-3

Thayer, J. F., Ahs, F., Fredrikson, M., Sollers, J. J., & Wager, T. D. (2012). A Meta-Analysis of Heart Rate Variability and Neuroimaging Studies: Implications for Heart Rate Variability as a Marker of Stress and Health. Neuroscience and Biobehavioral Reviews, 36(2), 747–756. 10.1016/j.neubiorev.2011.11.009

Vage, A., Spence, A. D., McKeown, G., Gormley, G. J., & Hamilton, P. K. (2024). Simulate to Stimulate? A Systematic Review of Stress, Learning, and Performance in Healthcare Simulation. The Ulster Medical Journal, 93(3), 119–126.

Virgillito, D., Catalfo, P., & Ledda, C. (2025). Wearables in Healthcare Organizations: Implications for Occupational Health, Organizational Performance, and Economic Outcomes. Healthcare (Basel, Switzerland), 13(18), 2289. 10.3390/healthcare13182289

Weenk, M., Alken, A. P. B., Engelen, L. J. L. P. G., Bredie, S. J. H., van de Belt, T. H., & van Goor, H. (2018). Stress Measurement in Surgeons and Residents Using a Smart Patch. American Journal of Surgery, 216(2), 361–368. 10.1016/j.amjsurg.2017.05.015

Yanofsky, S. D., & Nyquist, J. G. (2010). Using the Affective Domain to Enhance Teaching of the ACGME Competencies in Anesthesiology Training. The Journal of Education in Perioperative Medicine: JEPM, 12(1), E055.

Zoller, A., Hölle, T., Wepler, M., Radermacher, P., & Nussbaum, B. L. (2021). Development of a Novel Global Rating Scale for Objective Structured Assessment of Technical Skills in an Emergency Medical Simulation Training. BMC Medical Education, 21(1), 184. 10.1186/s12909-021-02580-4

Additional Declarations: No competing interests reported.

Supplementary Files: SupplementaryInformationBCMUToledo.docx

Status: Under Review. Version 1 posted.
Reviews received at journal: 04 May, 2026
Reviewers agreed at journal: 09 Apr, 2026
Reviewers agreed at journal: 07 Apr, 2026
Reviews received at journal: 06 Apr, 2026
Reviewers agreed at journal: 02 Mar, 2026
Reviewers invited by journal: 17 Feb, 2026
Editor assigned by journal: 12 Feb, 2026
Submission checks completed at journal: 12 Feb, 2026
First submitted to journal: 10 Feb, 2026
Research Square, ISSN 2693-5015 (online)

Authors and affiliations:
Solomon Prince Teye-Lartey (University of Toledo)
Jacob Schmieder (University of Toledo)
Umesh Yadav (University of Toledo)
Shaza Aouthmany (University of Toledo)
Cristina Alvarado (University of Toledo)
Kimberly Jenkins (University of Toledo)
Thomas J. Papadimos (University of Toledo)
Serkan Toy (Virginia Tech Carilion School of Medicine)
Kris Brickman (University of Toledo)
Anthony Braida (University of Toledo)
Brent Altenhof (University of Toledo)
Scott M. Pappada (University of Toledo; corresponding author)

Figure 1. Comprehensive workflow of the multimodal physiological assessment approach for clinical competency classification in simulation-based medical education.

Figure 2. Representative heart rate variability patterns during simulation-based medical training. (a) Competent performer (Play ID 1897, left) demonstrating adaptive physiological control with gradual rise from below baseline to stable plateau around 0.02–0.04, maintaining minimal volatility throughout the task. (b) Novice performer (Play ID 1176, right) exhibiting maladaptive stress response characterized by excessive initial reactivity (peak 0.2), followed by dramatic collapse to -0.1 below baseline and subsequent erratic fluctuations, indicating poor physiological regulation under pressure.
implications for patient safety. Medical errors are recognized as the third leading cause of death in the United States, resulting in over 250,000 deaths annually (Makary and Daniel 2016). Many of these preventable adverse events can be traced to inadequate training or insufficient assessment of healthcare professionals before they enter independent practice (Makary and Daniel 2016; Shoja et al. 2025). Patient care has become increasingly complex, demanding higher standards of practice, yet traditional evaluation methods have revealed significant limitations. Current assessment paradigms rely predominantly on checklists and binary pass/fail determinations (Berendonk, Stalmeijer, and Schuwirth 2013), which fail to capture the nuanced competencies required for safe clinical practice(McKinley et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). This assessment gap creates a critical disconnect between formal certification and actual operational readiness. The Accreditation Council for Graduate Medical Education (ACGME) has established six interdependent competencies that residents must achieve (Batalden et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2002\u003c/span\u003e), each mapping to distinct domains within Bloom's taxonomy (Yanofsky and Nyquist 2010). Meeting these multifaceted standards requires assessment tools capable of evaluating complex, integrated performance under realistic conditions; yet existing methodologies remain fundamentally inadequate for identifying learners who may struggle under clinical stress.\u003c/p\u003e \u003cp\u003eTo address these challenges, medical educators have explored various innovative approaches to training and evaluation. Simulation-based medical education (SBME) has emerged as a promising approach, offering controlled environments in which trainees can develop skills without risk to patients (Elendu et al. 
\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Komasawa and Yokohira n.d.). Consider a typical emergency simulation: a resident managing a deteriorating patient must simultaneously process vital signs, communicate with team members, and make rapid decisions while their stress levels fluctuate dramatically. Current assessment approaches, including global rating scales, Objective Structured Clinical Examinations (OSCEs), and Objective Structured Assessment of Technical Skills (OSATS) (Elabd et al. n.d.; Zoller et al. \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), have improved standardization but continue to rely on human judgment and lack objectivity.\u003c/p\u003e \u003cp\u003eRecent advances in wearable sensor technology have opened new possibilities for capturing objective correlates of clinical performance(Virgillito, Catalfo, and Ledda 2025). Physiological indicators of stress and cognitive workload (Boffet et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Howie et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Weenk et al. \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) have been shown to predict performance outcomes; however, this relationship is complex and far from uniform. In simulation-based medical education and clinical performance research, physiological arousal has been linked to both performance enhancement and impairment, depending on contextual factors such as task complexity, learner experience, and time pressure. Some studies report that elevated stress responses, as measured by heart rate variability, electrodermal activity, or cortisol, are associated with poorer technical or decision-making performance (Gellisch et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Peek, Moore, and Arnold 2023; Vage et al. 
\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), whereas others suggest that moderate activation or task-specific stress may facilitate engagement and situational awareness (Joseph et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Nakayama et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Solhjoo et al. \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Still others find no consistent association, underscoring that stress responses and performance are not linearly coupled but are dynamically modulated by individual and situational variables (Brooks, J., J.C. Crone, and D.P. Spangler, 2021; McDaniel et al. \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Pappada et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). This growing body of divergent evidence underscores the need for data-driven, multimodal approaches that empirically map how different physiological states correspond to clinical competency during simulated patient care.\u003c/p\u003e \u003cp\u003eBeyond performance outcomes, clinicians' emotional states can directly influence procedural precision and decision quality. Prior research has demonstrated that heart rate variability (HRV) is a reliable index of both stress and cognitive load, reflecting autonomic regulation during complex task performance (Boffet et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Joseph et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Nakayama et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Solhjoo et al. 
\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Moreover, neuroimaging evidence suggests that the ventromedial prefrontal cortex, a region central to risk appraisal and emotional regulation, serves as a neural substrate linking HRV to adaptive decision-making under pressure (Thayer et al. \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). Collectively, these findings support the inclusion of HRV-derived features in the proposed modeling framework as physiologically meaningful indicators of learners\u0026rsquo; stress responses and regulatory capacity.\u003c/p\u003e \u003cp\u003eBy leveraging continuous multimodal physiological data and machine learning, the present study moves beyond previous theoretical assumptions to empirically test how patterns of physiological activation correspond to observable clinical competence across multiple simulation scenarios and diverse learner populations. This work represents a significant advancement over prior efforts by integrating synchronized physiological responses with expert performance evaluations using a neural network\u0026ndash;based analytic framework. The resulting model identifies subtle, nonlinear relationships between stress responses and competency levels, offering an objective approach to classifying learner performance. By analyzing data from emergency medicine residents, anesthesiology residents, and EMS students, we demonstrate that multimodal physiological features can distinguish competent practitioners from those requiring additional support. 
Although this study was conducted in simulated environments, the framework establishes a foundation for scalable, data-driven competency assessment that can ultimately enhance training precision and improve patient safety.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy Design and Data Collection\u003c/h2\u003e \u003cp\u003eThis prospective observational study was conducted at an academic medical simulation center between 2020 and 2025 to develop and validate a machine-learning-based framework supporting objective competency classification. The study received Institutional Review Board approval, and written informed consent was obtained from all participants. Participants were informed that performance data would be used exclusively for research purposes and would not impact their academic standing. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates the sequential methodology encompassing data collection, processing, and model development and validation.\u003c/p\u003e \u003cp\u003eThe study population comprised 152 healthcare learners: 74 Emergency Medicine residents (PGY-1\u0026ndash;3), 70 Anesthesiology residents (PGY-1\u0026ndash;4), and 8 Emergency Medical Services (EMS) students. Participants were 69.5% male and 30.5% female. Most were aged 26\u0026ndash;30 years (64.6%), followed by 22.0% aged 31 and older, 9.8% aged 21\u0026ndash;25, and 3.7% aged 18\u0026ndash;20. The cohort was predominantly White/Caucasian (76.8%), with 13.4% Asian, 6.1% Black/African American, and 3.7% reporting other ethnicities. 
All participants completed high-fidelity simulation scenarios (15\u0026ndash;20 minutes each) conducted in a standardized, temperature-controlled simulation environment (20\u0026ndash;22\u0026deg;C, with consistent lighting).\u003c/p\u003e \u003cp\u003eData collection leveraged a custom-developed platform, PREPARE (PREdiction of Healthcare Provider Skill Acquisition and Future Training REquirements) (Pappada et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), which synchronized multimodal data streams. The platform utilizes hierarchical measurement mapping to assess performance across cognitive, psychomotor, and behavioral domains. These broad domains are further aligned with specific competencies, including clinical decision-making, medical knowledge, task efficiency, communication, and judgment (Pappada et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003ePREPARE\u0026apos;s assessment engine centers on defining \u0026quot;learning events\u0026quot;, which are preprogrammed, scenario-specific critical moments representing essential competency demonstrations. Expert instructors evaluated each event in real-time, assigning both categorical classifications (novice/competent/expert) and continuous performance scores (0\u0026ndash;100). For model training purposes, competent and expert categories were merged into a binary label: competent (1) versus novice (0). Because of operational constraints and clinical scheduling demands, duplicate assessments for formal inter-rater reliability were not feasible. 
To minimize variability, all faculty underwent standardized PREPARE training and applied consensus-based definitions for competency.\u003c/p\u003e\n\u003cp\u003eLearner physiological data were collected via Empatica E4 wristbands (E4 wristband | Real-time physiological signals | Wearable PPG, EDA, Temperature, Motion sensors n.d.), which captured electrodermal activity at 4 Hz, blood volume pulse at 64 Hz, skin temperature at 4 Hz, and accelerometry at 32 Hz. Custom signal-processing algorithms derived physiological measures reflecting autonomic nervous system activity associated with stress responses. A complete list of derived measures comprising the final model feature set is provided in the Supplementary Materials accompanying this manuscript (Table A1).\u003c/p\u003e \u003cp\u003ePhysiological data were temporally aligned with instructor-rated events to classify learner competency. Instructors were trained to record assessments immediately after each observed task, ensuring synchronization between physiological variability and corresponding performance actions. Retrospective analysis revealed occasional deviations from protocol, including temporal inconsistencies in rating entry. The operational demands of real-time simulation assessment impose challenges: instructors must monitor multiple learners, manage scenario flow, operate simulation equipment, and provide feedback concurrently. These simultaneous responsibilities occasionally led to delayed rating entries, thereby decoupling physiological signals from corresponding performance timing. Because genuine clinical learning events rarely occur in rapid succession, temporally clustered ratings were interpreted as retrospective batch entries rather than real-time assessments.\u003c/p\u003e \u003cp\u003eRigorous quality-control procedures were implemented to ensure data integrity. 
The initial dataset comprised 2,584 instructor assessments; after screening, 730 entries were excluded based on two predefined criteria. First, 291 late assessment events were identified through temporal clustering, in which all ratings occurred within 7 seconds of one another, most frequently near the end of scenarios. Second, 439 events were removed due to physiologically implausible signal values. Implausible values were defined as electrodermal activity outside the range of 0.01\u0026ndash;100 \u0026micro;S, heart rate outside the range of 40\u0026ndash;200 bpm, or skin temperature outside the range of 28\u0026ndash;38\u0026deg;C; such values are suggestive of motion artifacts, sensor detachment, or loss of contact.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSignal Processing and Feature Selection\u003c/h3\u003e\n\u003cp\u003ePhysiological signals underwent preprocessing using bandpass filtering at 0.5\u0026ndash;5 Hz for blood volume pulse and 0.05\u0026ndash;1 Hz for electrodermal activity to isolate relevant frequencies. Individual baseline states were identified using a sliding-window algorithm that detected stable physiological periods lasting at least 15 seconds, and deviation-from-baseline metrics were computed by standardizing values relative to these individualized resting states.\u003c/p\u003e \u003cp\u003eWe employed a suite of custom-built algorithms to extract features from raw physiological data within event-centered windows corresponding to instructor-rated clinical performance moments. This windowing approach captured temporal fluctuations in physiological activity that reflect acute stress and cognitive workload dynamics influencing performance quality. To accommodate the differing temporal characteristics of each biosignal, we randomly sampled event-centered windows of 5 to 180 seconds (in 5-second increments; \u003cem\u003en\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1,296 configurations), testing 250 configurations per signal type. 
Electrodermal responses demonstrated rapid phasic changes associated with cognitive and emotional stress (Benedek and Kaernbach \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Posada-Quintero et al. 2018; Rahma et al. \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), whereas cardiovascular indices, such as heart rate variability, evolved more gradually through autonomic regulation (Kasahara et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Mendelowitz, D. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e1999\u003c/span\u003e; Robinson et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e1966\u003c/span\u003e). Peripheral temperature fluctuations occurred over longer timescales, necessitating signal-specific optimization of window parameters for feature extraction.\u003c/p\u003e \u003cp\u003eUnlike previous studies that measured stress at single time points (Alasmari et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Campanella et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), our analysis across extended windows revealed three phases distinguishing expertise levels: anticipatory arousal before events, acute stress responses during events, and recovery patterns after completion. Heart rate optimally discriminated competency using 125-second pre-event and 70-second post-event windows, whereas electrodermal activity required more extended periods (80 seconds before and 180 seconds after) due to slower response dynamics. This temporal approach revealed competency differences not only in stress magnitude but also across entire physiological timelines. 
Statistical features computed within these windows included mean, standard deviation, minimum, maximum, range, skewness, kurtosis, and cumulative response for both absolute and baseline-relative values, generating 73 candidate features.\u003c/p\u003e \u003cp\u003eTo identify the most informative biomarkers while avoiding overfitting, we employed a genetic algorithm operating over 50 generations with 100 candidate feature subsets per generation. A crossover rate of 0.7 and a mutation rate of 0.1 balanced exploration and exploitation. We retained the top 50 models from independent runs, stratifying features by selection frequency: high-confidence (\u0026ge;\u0026thinsp;75% of models), moderate-confidence (50\u0026ndash;74%), and exploratory (20\u0026ndash;49%). The algorithm converged on 65 features (an 11% dimensionality reduction from the 73 candidates while maintaining performance), comprising 28 heart rate measures (43%), 22 electrodermal activity indicators (34%), and 15 temperature parameters (23%). This distribution reflects the primary role of cardiovascular responses in competency discrimination, consistent with the literature establishing heart rate variability as a key indicator of stress regulation and cognitive load. The selected model features are listed in the Appendix (Supplementary Materials) of this manuscript.\u003c/p\u003e\n\u003ch3\u003eBinary Classification Model Development\u003c/h3\u003e\n\u003cp\u003eAnalysis of our dataset revealed a significant class imbalance (80.6% competent and 19.4% novice ratings), indicating instructors overwhelmingly classified learners as competent. Developing an objective, data-driven method to identify novice learners lacking operational readiness is essential for ensuring patient safety. To address class imbalance, we implemented inverse-frequency class weighting, which assigns higher importance to minority class examples during training. 
This approach ensures that the model pays equal attention to both competent and struggling learners without creating artificial data points that could distort genuine physiological stress-response patterns.\u003c/p\u003e \u003cp\u003eWe developed a feedforward neural network model to capture the complex, nonlinear relationships between physiological responses and performance outcomes. Neural networks are particularly suited for this application as they can identify subtle patterns in multivariate physiological data that may not be apparent through traditional statistical methods. To prevent overfitting, we applied dropout regularization (randomly deactivating 30% of neurons during training) and L2 weight regularization (λ\u0026thinsp;=\u0026thinsp;0.01) to constrain model complexity.\u003c/p\u003e \u003cp\u003eModel training used the Adam optimizer with an initial learning rate of 0.001 and exponential decay to allow adaptive convergence. Early stopping was implemented to halt training after 20 epochs without improvement in validation loss. The model used a sigmoid transfer function in its output layer, producing an output ranging from 0 to 1, where values closer to 1 represent learner competency/operational readiness. Models were developed using Python 3.9, TensorFlow 2.4, NumPy 1.19, and scikit-learn 0.24.\u003c/p\u003e\n\u003ch3\u003eModel Validation and Statistical Analysis\u003c/h3\u003e\n\u003cp\u003eTo evaluate model performance on completely unseen subjects, we employed a 20-fold leave-N-out validation strategy, where each fold excluded a unique subset of subjects. For each fold, five independent trials were conducted, yielding a total of 100 model instances for comprehensive performance assessment. 
This approach tests the model's ability to classify entirely new individuals rather than just new events from known learners, with the held-out test set providing final performance verification.\u003c/p\u003e \u003cp\u003ePerformance metrics included sensitivity (for identifying competent learners), specificity (for detecting novices who require additional support), Matthews Correlation Coefficient (MCC) to balance performance in imbalanced datasets, and both ROC-AUC and PR-AUC to assess discrimination across varying classification thresholds. In this educational context, true negatives (which represent novice trainees correctly identified as not operationally ready) are particularly critical, as such classifications directly prevent underprepared learners from advancing to independent clinical practice, thereby safeguarding patient safety. Specificity is thus emphasized as an essential performance metric for this effort, ensuring that learners needing further training are accurately flagged. The MCC was chosen as the primary optimization metric because it reflects all four classification outcomes with equal weighting and is robust in imbalanced scenarios, such as this dataset, where competent learners substantially outnumber novices (80.6% versus 19.4%). This balanced metric helps prioritize the accurate identification of struggling learners rather than merely maximizing accuracy by favoring the majority class.\u003c/p\u003e \u003cp\u003eMetric confidence intervals were computed via bootstrap resampling (10,000 iterations) to enable robust statistical inference. To determine the optimal classification threshold for model output, we evaluated 21 unique threshold values (see Appendix/Supplementary Materials) and selected the one that maximized MCC while maintaining balanced sensitivity and specificity. 
Statistical significance was assessed using Friedman\u0026rsquo;s analysis of variance by ranks, with post hoc Nemenyi tests (α\u0026thinsp;=\u0026thinsp;0.05), to evaluate the effects of the threshold on model performance metrics.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003eDataset Characteristics\u003c/h2\u003e\n \u003cp\u003eThe final dataset comprised 1,854 high-fidelity learning events from 152 healthcare learners across 470 simulation scenarios, representing a 71.8% data retention rate following quality-control exclusions. Class distribution showed 360 novice (19.4%) and 1,494 competent (80.6%) classifications, with variation across specialties: Emergency Medicine residents demonstrated 87.0% competency rates, Anesthesiology residents showed 75.3% competency, and EMS students exhibited 46.3% competency, potentially reflecting their respective training stages and experience levels. Although the dataset was imbalanced, as is typical in competency-based assessments, a post-hoc power analysis confirmed \u0026gt; 0.999 statistical power (Cohen's h = 1.32, α = 0.05) to detect meaningful competency differences.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eModel Performance\u003c/h3\u003e\n\u003cp\u003eModel results are summarized in Table\u0026nbsp;1. Classification performance remained stable across the optimal classification threshold range of 0.45–0.70, achieving balanced accuracy of 84.1–84.8% and MCC of 0.687–0.706 despite significant class imbalance. Higher thresholds captured more true novices at the expense of additional false negatives (competent learners classified as novice). Increasing the threshold from 0.45 to 0.70 improved specificity from 72.9% to 76.2%, while sensitivity decreased modestly from 95.4% to 93.3%. 
A table of model performance across all model classification thresholds is included in the Supplementary Materials accompanying this manuscript (Table A2).\u003c/p\u003e\n\u003cdiv\u003e\n\u003c/div\u003e\n\u003cp\u003eThe Precision-Recall AUC (0.969–0.970) confirmed robust discrimination across thresholds. Positive predictive values remained stable at 94.0–94.6%, indicating that learners classified as competent overwhelmingly demonstrate true competency at any threshold within this range. False-positive counts (novices classified as competent) declined from 217 to 190 as thresholds increased, reducing the number of novices who would go unflagged, whereas true-competent identifications decreased only modestly (3,376 to 3,304). This performance stability enables programs to calibrate their assessment systems according to institutional priorities, whether emphasizing the maximum detection of novices who need support, minimizing disruption for competent learners, or striking a balance between these objectives based on available training resources and risk tolerance.\u003c/p\u003e\n\u003ch3\u003ePhysiological Biomarker Analysis\u003c/h3\u003e\n\u003cp\u003eRepresentative heart-rate deviation (from baseline) profiles (Fig.\u0026nbsp;2) illustrate distinct physiological patterns that distinguish competent from novice performers during simulation-based training. The competent learner (left panel) demonstrated adaptive autonomic regulation, characterized by an initial anticipatory suppression of approximately 20% below baseline, followed by a stable and modest elevation of 2–5% with minimal oscillation throughout the task. In contrast, the novice learner (right panel) exhibited physiological dysregulation patterns: a 22% surge above baseline, a rapid 7% drop below baseline, and persistent fluctuations between +17% and −5%. 
These distinct patterns, stable adaptation in contrast to dysregulated oscillation, suggest that heart rate changes and variability relative to baseline could serve as objective biomarkers of competency and clinical readiness, supporting their inclusion as model input features for automated performance assessment.\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 1\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eModel performance metrics across classification thresholds for competency assessment. Values represent mean ± standard deviation across 20 leave-N-out validation folds. Rows shown in bold mark the optimal threshold range (0.45–0.70); within this range, a threshold of 0.50 maximizes the Matthews Correlation Coefficient (MCC) while balancing sensitivity and specificity. Confusion matrix components (n_TP, n_FP, n_TN, n_FN) represent total counts aggregated across all validation folds. PPV = Positive Predictive Value (Precision); NPV = Negative Predictive Value; TP = True Positives; FP = False Positives; TN = True Negatives; FN = False Negatives.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"11\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eThreshold\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBalanced Accuracy\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMCC\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ePPV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNPV\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003en_TP\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003en_FP\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003en_TN\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003en_FN\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.833 ± 0.036\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.700 ± 0.057\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.959 ± 0.018\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.708 ± 0.076\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.936 ± 0.015\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.803 ± 0.068\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e3395\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e234\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e566\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e145\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.837 ± 0.038\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.702 ± 0.056\u003c/em\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.956 ± 0.020\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.719 ± 0.085\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.938 ± 0.017\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.796 ± 0.068\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e3385\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e225\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e575\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e155\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.837 ± 0.036\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.702 ± 0.057\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.957 ± 0.020\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.716 ± 0.078\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.938 ± 0.016\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.797 ± 0.068\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e3388\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e\u003cem\u003e227\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e573\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e152\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.45\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.841 ± 0.041\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.704 ± 0.059\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.954 ± 0.022\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.729 ± 0.090\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.940 ± 0.018\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.788 ± 0.067\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3376\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e217\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e583\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e164\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.50\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.845 ± 0.042\u003c/strong\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.706 ± 0.061\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.952 ± 0.023\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.739 ± 0.093\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.942 ± 0.019\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.784 ± 0.067\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3369\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e209\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e591\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e171\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.55\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.845 ± 0.039\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.696 ± 0.061\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.944 ± 0.025\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.746 ± 0.086\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.943 ± 0.018\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e0.760 ± 0.069\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3343\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e203\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e597\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e197\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.60\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.847 ± 0.040\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.695 ± 0.063\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.943 ± 0.025\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.751 ± 0.086\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.944 ± 0.018\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.755 ± 0.069\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3337\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e199\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e601\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e203\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.65\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.847 ± 0.043\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.690 ± 0.064\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.935 ± 0.039\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.760 ± 0.104\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.946 ± 0.021\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.744 ± 0.098\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3309\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e192\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e608\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e231\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.70\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.848 ± 0.041\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.687 ± 0.066\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.933 ± 0.036\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e0.762 ± 0.096\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.946 ± 0.020\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.737 ± 0.093\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e3304\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e190\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e610\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cstrong\u003e236\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.847 ± 0.043\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.682 ± 0.070\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.934 ± 0.025\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.760 ± 0.090\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.946 ± 0.019\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e0.728 ± 0.071\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e3305\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e192\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e\u003cem\u003e608\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u003cem\u003e235\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\"\u003e\n \u003ch2\u003eCross-Domain Performance Analysis\u003c/h2\u003e\n \u003cp\u003eThe model demonstrated robust classification accuracy across medical disciplines and skill domains (Table\u0026nbsp;2). Emergency Medicine assessments had the highest proportion of true positives (87.0% of assessments), with a sensitivity of 95.4% and a specificity of 73.8%. Anesthesiology showed the highest sensitivity (97.0%) despite a lower proportion of true positives (69.8%), while EMS exhibited the most balanced class distribution with comparable specificity (77.8%).\u003c/p\u003e\n \u003cdiv\u003e\n \u003c/div\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv\u003eTable 2\u003c/div\u003e\n \u003cdiv\u003e\n \u003cp\u003eClassification performance metrics across medical disciplines at a model classification threshold of 0.50. True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) are shown as counts with percentages of total assessments. Emergency Medicine had the highest proportion of true positives despite class imbalance, while EMS showed a more balanced class distribution. Sensitivity remained above 83% across all disciplines, with Anesthesiology achieving the highest value (97.0%). 
Balanced accuracy ranged from 80.5% to 85.0%, indicating consistent model performance across diverse clinical training contexts.\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"9\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDiscipline\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003en\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTP (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFP (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTN (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFN (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBalanced Accuracy\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eEmergency Medicine\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2,573\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e2,239 (87.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e59 (2.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e166 (6.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e109 (4.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e95.4%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e73.8%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e84.6%\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAnesthesiology\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1,170\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e817 (69.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e89 (7.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e239 (20.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e25 (2.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e97.0%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e72.9%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e85.0%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eEMS\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e380\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e144 (37.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e46 (12.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e161 (42.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e29 (7.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e83.2%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e77.8%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e80.5%\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eThis study demonstrates that physiological biomarkers alone derived from a wrist-worn wearable device can contribute to scalable and 
reliable estimation of clinical competency in simulation-based medical education. Through comprehensive leave-N-out validation across 100 independent model instances, the findings provide strong evidence of generalization across multiple learner populations and clinical domains. Unlike earlier frameworks that inferred performance from theoretical models of stress or arousal, this work establishes empirical, data-driven mappings between physiological activation and demonstrated clinical competence. The results align with and extend previous research, which has shown that physiological indicators such as heart-rate variability and electrodermal activity reflect stress regulation and cognitive workload (Joseph et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Kim et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Nakayama et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Solhjoo et al. \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). This research also addresses the mixed and sometimes contradictory findings regarding whether heightened stress impairs or facilitates performance (Cavaleri et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Vage et al. \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). By integrating multimodal, temporally resolved physiological data with machine-learning-based classification, the present study moves beyond correlational approaches to provide predictive and generalizable models of competency.\u003c/p\u003e \u003cp\u003eSimulation-based medical education (SBME) has long been recognized as a crucial tool for enhancing clinical performance and improving patient safety. 
Foundational reviews and meta-analyses have consistently demonstrated that technology-enhanced simulation combined with deliberate practice improves clinical skills acquisition relative to traditional clinical education (McGaghie et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). Evidence from mastery-learning frameworks further suggests that simulation can help equalize performance gaps among learners, although assessment methods often still rely on subjective expert judgment and checklists (Cook et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). These limitations highlight a persistent need for objective, continuous, and scalable assessment techniques that supplement human raters. The current study addresses this gap by developing a physiological signal\u0026ndash;based competency classifier that integrates smoothly with expert evaluation frameworks. This integration can support evidence of validity, reproducibility, and scalability in performance assessment, which are key challenges highlighted in previous simulation research (Okuda et al. \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2009\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe classification framework achieved consistent performance (balanced accuracy 80.5\u0026ndash;85.0%) across specialties using physiological data alone, independent of discipline or scenario type. This finding supports the potential validity of physiological stress responses as candidate biomarkers of clinical competency. The model maintained robust accuracy despite the likely variability in instructor ratings, reinforcing its value as a complementary measure that enhances, rather than replaces, expert judgment. 
Together, these findings support the generalizability of psychophysiological assessment frameworks across diverse medical training contexts, extending prior work that was typically limited to discipline-specific or single-institution samples (Nakayama et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Solhjoo et al. \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe challenges inherent in real-time instructor assessment during simulation underscore the value of multimodal assessment platforms such as PREPARE. Instructors frequently manage multiple concurrent tasks, such as scenario flow, learner safety, equipment operation, and feedback delivery, while simultaneously evaluating learner performance. Automated physiological monitoring can capture objective data even when an instructor's attention is divided, ensuring that key learning events are documented with corresponding physiological signatures. This augmentation of expert judgment through continuous, sensor-derived data contributes to more complete and accurate learner evaluations, reducing both missed observations and rater fatigue.\u003c/p\u003e \u003cp\u003eDespite these promising findings, several limitations should be acknowledged. First, the absence of a formal interrater reliability assessment introduces uncertainty regarding the precision of instructor-based labels used for model training. Although the faculty received standardized calibration, agreement between raters was not quantified. Additionally, a tendency toward lenient instructor evaluations may have inflated positive classifications, meaning some \u0026ldquo;false negatives\u0026rdquo; identified by the model may, in fact, reflect accurate detection of struggling learners. Future work should incorporate multi-rater adjudication and video-based secondary reviews to strengthen label validity. 
The class imbalance, though mitigated through inverse-frequency weighting, may still limit the diversity of novice patterns captured during training. Similarly, the smaller sample of EMS trainees compared to graduate medical residents limits generalizability to early-stage learners. While the use of simulation rather than clinical data limits ecological validity, this controlled design enabled rigorous signal processing and systematic model validation. Multi-institutional and clinical replications will be essential to confirm the robustness of the findings across different contexts.\u003c/p\u003e \u003cp\u003eThe immediate application of this framework lies in its ability to inform adaptive, learner-centered training pathways. Novice classifications can be mapped to standardized hierarchical competencies within PREPARE, ranging from general cognitive and psychomotor skills to scenario-specific clinical judgments. Such mappings could guide instructors in prescribing targeted remediation and measuring progress over time. Expanding from binary (novice/competent) to multi-level proficiency classification (novice/competent/expert) will further support precision feedback and developmental tracking.\u003c/p\u003e \u003cp\u003eOngoing work focuses on enhancing system intelligence and data fusion within the PREPARE platform by integrating audio, video, and text-based features. Advanced analytics, including natural language processing (Paudel, Pappada, and Cheng 2023) and large language models, are being incorporated to automate event detection and improve multimodal data synchronization. These extensions aim to strengthen the interpretive depth of physiological signals, linking them with behavioral and contextual indicators of performance. 
The ultimate goal is to develop a robust, multi-source competency model that supports individualized learning and operational readiness assessment.\u003c/p\u003e \u003cp\u003eIn conclusion, this work advances simulation-based medical education by demonstrating that objective physiological markers can augment traditional assessments. By bridging gaps between stress physiology, educational data science, and clinical competency evaluation, this framework establishes a foundation for scalable, data-driven, and individualized assessment systems. With continued validation and refinement, such models may help identify and support at-risk learners before they enter practice, ultimately contributing to improved patient safety and reduced medical errors.\u003c/p\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eThis study provides empirical evidence that multimodal physiological data collected during simulated clinical scenarios can support more objective and reliable classification of competency and operational readiness. Using physiological features synchronized with expert evaluations, the developed model achieved strong generalization across diverse learners, maintaining a balanced accuracy of 84.5% and a sensitivity of 95.2% despite class imbalance. By integrating psychophysiological measures with machine learning analytics, this work advances competency assessment beyond checklist-based evaluations toward adaptive, precision education frameworks that tailor feedback and remediation to individual needs. Machine learning-based intelligence embedded within the PREPARE platform can leverage physiological markers to augment expert judgment, with the potential to enhance both fairness and fidelity in simulation-based evaluation. 
Future research should focus on multi-institutional validation and the integration of additional behavioral and contextual data streams to refine, personalize, and scale this approach for broader application in health professions training.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting Interests:\u003c/strong\u003e The authors have no competing interests to declare that are relevant to the content of this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e The models developed during this research are being integrated into the PREPARE platform for Operational Readiness Assessment as part of an effort funded by the US Department of Defense, Congressionally Directed Medical Research Program (Contract #: HT942524C0073).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics Statement:\u003c/strong\u003e This study received Institutional Review Board approval from the University of Toledo. All procedures involving human participants were performed in accordance with the ethical standards of the institutional research committee and with the 1964 Declaration of Helsinki and its later amendments. Written informed consent was obtained from all participants. Participants were informed that performance data would be used exclusively for research purposes and would not impact their academic standing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to Participate:\u003c/strong\u003e Written informed consent was obtained from all individual participants included in the study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to Publish:\u003c/strong\u003e Not applicable. 
This manuscript does not contain identifiable individual participant data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions:\u003c/strong\u003e ST performed data analysis and supported data acquisition, model development, and writing of the manuscript; the project was part of ST's doctoral research. JS performed data analysis, guided model development, and supported writing and editing of the manuscript. UY supported software development of the platform required for the study. SA supported simulation curriculum development/delivery, provided expert faculty assessment of learners, and edited the manuscript. CA supported simulation curriculum development/delivery and ensured allocation/dedication of simulation center staff and resources to study execution. KJ supported simulation curriculum development/delivery and provided expert faculty assessment of learners. TP supported simulation curriculum development/delivery and writing/editing of the manuscript, and provided expert faculty assessment of learners. SToy supported writing and editing of the manuscript, literature review, and data analysis efforts. KB supported simulation curriculum development/delivery and ensured allocation/dedication of simulation center staff and resources to study execution. AB supported simulation curriculum development/delivery and provided expert faculty assessment of learners. BA supported simulation curriculum development and provided review and editing/feedback on the manuscript. SP guided model development and data analysis efforts, contributed to writing/editing of the manuscript, led and provided oversight for all study aspects, served as PI of the overall study, and served as primary advisor of ST for the doctoral research project.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability:\u003c/strong\u003e The datasets generated and/or analyzed during the current study are not publicly available due to their inclusion of learner assessments, which raise privacy considerations for participants. 
Deidentified data may be made available by the corresponding author upon reasonable request, subject to institutional review board approval, data use agreements, and compliance with ethical and legal restrictions.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAlasmari, S., AlGhamdi, R., Tejani, G. G., Sharma, S. K., \u0026amp; Mousavirad, S. J. (2025). Federated Learning-Based Multimodal Approach for Early Detection and Personalized Care in Cardiac Disease. \u003cem\u003eFrontiers in Physiology\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fphys.2025.1563185\u003c/span\u003e\u003cspan address=\"10.3389/fphys.2025.1563185\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBatalden, P., Leach, D., Swing, S., Dreyfus, H., \u0026amp; Dreyfus, S. (2002). General Competencies and Accreditation in Graduate Medical Education. \u003cem\u003eHealth Affairs (Project Hope)\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e(5), 103\u0026ndash;111. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1377/hlthaff.21.5.103\u003c/span\u003e\u003cspan address=\"10.1377/hlthaff.21.5.103\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenedek, M., \u0026amp; Kaernbach, C. (2010). A Continuous Measure of Phasic Electrodermal Activity. \u003cem\u003eJournal of Neuroscience Methods\u003c/em\u003e, \u003cem\u003e190\u003c/em\u003e(1), 80\u0026ndash;91. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jneumeth.2010.04.028\u003c/span\u003e\u003cspan address=\"10.1016/j.jneumeth.2010.04.028\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBerendonk, C., Stalmeijer, R. E., \u0026amp; Schuwirth, L. W. T. (2013). Expertise in Performance Assessment: Assessors\u0026rsquo; Perspectives. \u003cem\u003eAdvances in Health Sciences Education: Theory and Practice\u003c/em\u003e, \u003cem\u003e18\u003c/em\u003e(4), 559\u0026ndash;571. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s10459-012-9392-x\u003c/span\u003e\u003cspan address=\"10.1007/s10459-012-9392-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoffet, A., Arsac, L. M., Ibanez, V., Sauvet, F., \u0026amp; Deschodt-Arsac, V. (2025). Detection of Cognitive Load Modulation by EDA and HRV. \u003cem\u003eSensors (Basel, Switzerland)\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(8), 2343. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/s25082343\u003c/span\u003e\u003cspan address=\"10.3390/s25082343\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrooks, J., Crone, J. C., \u0026amp; Spangler, D. P. (2021). A Physiological and Dynamical Systems Model of Stress. \u003cem\u003eInternational Journal of Psychophysiology\u003c/em\u003e, \u003cem\u003e166\u003c/em\u003e, 83\u0026ndash;91.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCampanella, S., Altaleb, A., Belli, A., Pierleoni, P., \u0026amp; Palma, L. (2023). A Method for Stress Detection Using Empatica E4 Bracelet and Machine-Learning Techniques. 
\u003cem\u003eSensors (Basel, Switzerland)\u003c/em\u003e, \u003cem\u003e23\u003c/em\u003e(7), 3565. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/s23073565\u003c/span\u003e\u003cspan address=\"10.3390/s23073565\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCavaleri, R., Withington, A., Chalmers, K. J., \u0026amp; Blackstock, F. (2023). The Influence of Stress on Student Performance during Simulation-Based Learning: A Pilot Randomized Trial. \u003cem\u003eATS Scholar\u003c/em\u003e, \u003cem\u003e4\u003c/em\u003e(4), 474\u0026ndash;489. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.34197/ats-scholar.2022-0042OC\u003c/span\u003e\u003cspan address=\"10.34197/ats-scholar.2022-0042OC\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., \u0026amp; Hatala, R. (2013). Mastery Learning for Health Professionals Using Technology-Enhanced Simulation: A Systematic Review and Meta-Analysis. \u003cem\u003eAcademic Medicine: Journal of the Association of American Medical Colleges\u003c/em\u003e, \u003cem\u003e88\u003c/em\u003e(8), 1178\u0026ndash;1186. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1097/ACM.0b013e31829a365d\u003c/span\u003e\u003cspan address=\"10.1097/ACM.0b013e31829a365d\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eE4 Wristband (January 6, 2026). | Real-Time Physiological Signals | Wearable PPG, EDA, Temperature, Motion Sensors. \u003cem\u003eEmpatica\u003c/em\u003e. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.empatica.com/research/e4\u003c/span\u003e\u003cspan address=\"https://www.empatica.com/research/e4\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElabd, K., Abdul-Kadir, H., Alkhenizan, A., Mohammed, K., \u0026amp; Alkhalifa (2023). A Comparison of the Checklist Scoring Systems, Global Rating Systems, and Borderline Regression Method for an Objective Structured Clinical Examination for a Small Cohort in a Saudi Medical School. \u003cem\u003eCureus\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(6), e39968. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7759/cureus.39968\u003c/span\u003e\u003cspan address=\"10.7759/cureus.39968\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eElendu, C., Amaechi, D. C., Okatta, A. U., Amaechi, E. C., Elendu, T. C., Ezeh, C. P., \u0026amp; Elendu, I. D. (2024). The Impact of Simulation-Based Training in Medical Education: A Review. \u003cem\u003eMedicine\u003c/em\u003e, \u003cem\u003e103\u003c/em\u003e(27), e38813. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1097/MD.0000000000038813\u003c/span\u003e\u003cspan address=\"10.1097/MD.0000000000038813\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGellisch, M., Bablok, M., Brand-Saberi, B., \u0026amp; Sch\u0026auml;fer, T. (2024). Neurobiological Stress Markers in Educational Research: A Systematic Review of Physiological Insights in Health Science Education. \u003cem\u003eTrends in Neuroscience and Education\u003c/em\u003e, \u003cem\u003e37\u003c/em\u003e, 100242. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.tine.2024.100242\u003c/span\u003e\u003cspan address=\"10.1016/j.tine.2024.100242\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHowie, E. E., Harari, R., Dias, R. D., Wigmore, S. J., Skipworth, R. J. E., \u0026amp; Yule, S. (2024). Feasibility of Wearable Sensors to Assess Cognitive Load During Clinical Performance: Lessons Learned and Blueprint for Success. \u003cem\u003eThe Journal of Surgical Research\u003c/em\u003e, \u003cem\u003e302\u003c/em\u003e, 222\u0026ndash;231. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jss.2024.07.009\u003c/span\u003e\u003cspan address=\"10.1016/j.jss.2024.07.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJoseph, M., Ray, J. M., Chang, J., Cramer, L. D., Bonz, J. W., Yang, T. J., Wong, A. H., Auerbach, M. A., \u0026amp; Evans, L. V. (2022). All Clinical Stressors Are Not Created Equal: Differential Task Stress in a Simulated Clinical Environment. \u003cem\u003eAEM Education and Training\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e(2), e10726. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/aet2.10726\u003c/span\u003e\u003cspan address=\"10.1002/aet2.10726\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKasahara, Y., Yoshida, C., Saito, M., \u0026amp; Kimura, Y. (2021). Assessments of Heart Rate and Sympathetic and Parasympathetic Nervous Activities of Normal Mouse Fetuses at Different Stages of Fetal Development Using Fetal Electrocardiography. \u003cem\u003eFrontiers in Physiology\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e, 652828. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fphys.2021.652828\u003c/span\u003e\u003cspan address=\"10.3389/fphys.2021.652828\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, H. G., Cheon, E. J., Bai, D. S., Lee, Y. H., \u0026amp; Koo, B. H. (2018). Stress and Heart Rate Variability: A Meta-Analysis and Review of the Literature. \u003cem\u003ePsychiatry Investigation\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(3), 235\u0026ndash;245. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.30773/pi.2017.08.17\u003c/span\u003e\u003cspan address=\"10.30773/pi.2017.08.17\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKomasawa, N., \u0026amp; Yokohira, M. (2023). Simulation-Based Education in the Artificial Intelligence Era. \u003cem\u003eCureus\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(6), e40940. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7759/cureus.40940\u003c/span\u003e\u003cspan address=\"10.7759/cureus.40940\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMakary, M. A., \u0026amp; Daniel, M. (2016). Medical Error-the Third Leading Cause of Death in the US. \u003cem\u003eBMJ (Clinical research ed)\u003c/em\u003e, \u003cem\u003e353\u003c/em\u003e, i2139. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1136/bmj.i2139\u003c/span\u003e\u003cspan address=\"10.1136/bmj.i2139\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcDaniel, G. H., Pappada, S., Alyosif, Z., Teye-Lartey, S., \u0026amp; Moussa, M. (2025). The Impact of Stress and Distraction on Bag-Valve-Mask Ventilation Performance. 
\u003cem\u003eCureus\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e(5), e84542. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7759/cureus.84542\u003c/span\u003e\u003cspan address=\"10.7759/cureus.84542\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcGaghie, W. C., Issenberg, S. B., Cohen, E. R., Barsuk, J. H., \u0026amp; Wayne, D. B. (2011). Does Simulation-Based Medical Education with Deliberate Practice Yield Better Results than Traditional Clinical Education? A Meta-Analytic Comparative Review of the Evidence. \u003cem\u003eAcademic Medicine: Journal of the Association of American Medical Colleges\u003c/em\u003e, \u003cem\u003e86\u003c/em\u003e(6), 706\u0026ndash;711. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1097/ACM.0b013e318217e119\u003c/span\u003e\u003cspan address=\"10.1097/ACM.0b013e318217e119\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcKinley, R. K., Strand, J., Ward, L., Gray, T., Alun-Jones, T., \u0026amp; Miller, H. (2008). Checklists for Assessment and Certification of Clinical Procedural Skills Omit Essential Competencies: A Systematic Review. \u003cem\u003eMedical Education\u003c/em\u003e, \u003cem\u003e42\u003c/em\u003e(4), 338\u0026ndash;349. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/j.1365-2923.2007.02970.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1365-2923.2007.02970.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMendelowitz, D. (1999). Advances in Parasympathetic Control of Heart Rate and Cardiac Function. 
\u003cem\u003ePhysiology\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(4), 155\u0026ndash;161.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNakayama, N., Arakawa, N., Ejiri, H., Matsuda, R., \u0026amp; Makino, T. (2018). Heart Rate Variability Can Clarify Students\u0026rsquo; Level of Stress during Nursing Simulation. \u003cem\u003ePLoS ONE\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(4), e0195280. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0195280\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0195280\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOkuda, Y., Bryson, E. O., DeMaria, S., Jacobson, L., Quinones, J., Shen, B., \u0026amp; Levine, A. I. (2009). The Utility of Simulation in Medical Education: What Is the Evidence? \u003cem\u003eThe Mount Sinai Journal of Medicine, New York\u003c/em\u003e, \u003cem\u003e76\u003c/em\u003e(4), 330\u0026ndash;343. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/msj.20127\u003c/span\u003e\u003cspan address=\"10.1002/msj.20127\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePappada, S., Owais, M., Aouthmany, S., Rega, P., Schneiderman, J., Toy, S., Schiavi, A., et al. (2022). Personalizing Simulation-Based Medical Education: The Case for Novel Learning Management Systems. \u003cem\u003eInternational Journal of Healthcare Simulation\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.54531/mngy8113\u003c/span\u003e\u003cspan address=\"10.54531/mngy8113\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaudel, P., Pappada, S., \u0026amp; Cheng, L. (2023). 
Automated Multimodal Performance Evaluation in Simulation-Based Medical Education Using Natural Language Processing. In \u003cem\u003eProceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023)\u003c/em\u003e, ICCPS \u0026rsquo;23, New York, NY, USA: Association for Computing Machinery, 258\u0026ndash;259. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1145/3576841.3589624\u003c/span\u003e\u003cspan address=\"10.1145/3576841.3589624\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeek, R., Moore, L., \u0026amp; Arnold, R. (2023). Psychophysiological Fidelity: A Comparative Study of Stress Responses to Real and Simulated Clinical Emergencies. \u003cem\u003eMedical Education\u003c/em\u003e, \u003cem\u003e57\u003c/em\u003e(12), 1248\u0026ndash;1256. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/medu.15155\u003c/span\u003e\u003cspan address=\"10.1111/medu.15155\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePosada-Quintero, H. F., Florian, J. P., Orjuela-Ca\u0026ntilde;\u0026oacute;n, A. D., \u0026amp; Chon, K. H. (2018). Electrodermal Activity Is Sensitive to Cognitive Stress under Water. \u003cem\u003eFrontiers in Physiology\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fphys.2017.01128\u003c/span\u003e\u003cspan address=\"10.3389/fphys.2017.01128\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRahma, O. N., Putra, A. P., Rahmatillah, A., Putri, Y. S. K. A., Fajriaty, N. D., Ain, K., \u0026amp; Chai, R. (2022). 
Electrodermal Activity for Measuring Cognitive and Emotional Stress Level. \u003cem\u003eJournal of Medical Signals and Sensors\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(2), 155\u0026ndash;162. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.4103/jmss.JMSS_78_20\u003c/span\u003e\u003cspan address=\"10.4103/jmss.JMSS_78_20\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobinson, B. F., Epstein, S. E., Beiser, G. D., \u0026amp; Braunwald, E. (1966). Control of Heart Rate by the Autonomic Nervous System. Studies in Man on the Interrelation between Baroreceptor Mechanisms and Exercise. \u003cem\u003eCirculation Research\u003c/em\u003e, \u003cem\u003e19\u003c/em\u003e(2), 400\u0026ndash;411. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1161/01.res.19.2.400\u003c/span\u003e\u003cspan address=\"10.1161/01.res.19.2.400\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShoja, M. M., Darisel, N., Ventura Rodriguez, O., Avilova, \u0026amp; Rajput, V. (2025). Error Reduction in Healthcare Through Team Training and Cultural Transformation. \u003cem\u003eCureus\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e(8), e91243. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7759/cureus.91243\u003c/span\u003e\u003cspan address=\"10.7759/cureus.91243\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSolhjoo, S., Haigney, M. C., McBee, E., van Merrienboer, J. J. G., Schuwirth, L., Artino, A. R., Battista, A., et al. (2019). Heart Rate and Heart Rate Variability Correlate with Clinical Reasoning Performance and Self-Reported Measures of Cognitive Load. \u003cem\u003eScientific Reports\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(1), 14668. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41598-019-50280-3\u003c/span\u003e\u003cspan address=\"10.1038/s41598-019-50280-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThayer, J. F., \u0026Aring;hs, F., Fredrikson, M., Sollers, J. J., \u0026amp; Wager, T. D. (2012). A Meta-Analysis of Heart Rate Variability and Neuroimaging Studies: Implications for Heart Rate Variability as a Marker of Stress and Health. \u003cem\u003eNeuroscience and Biobehavioral Reviews\u003c/em\u003e, \u003cem\u003e36\u003c/em\u003e(2), 747\u0026ndash;756. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.neubiorev.2011.11.009\u003c/span\u003e\u003cspan address=\"10.1016/j.neubiorev.2011.11.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVage, A., Spence, A. D., McKeown, G., Gormley, G. J., \u0026amp; Hamilton, P. K. (2024). Simulate to Stimulate? A Systematic Review of Stress, Learning, and Performance in Healthcare Simulation. \u003cem\u003eThe Ulster Medical Journal\u003c/em\u003e, \u003cem\u003e93\u003c/em\u003e(3), 119\u0026ndash;126.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVirgillito, D., Catalfo, P., \u0026amp; Ledda, C. (2025). Wearables in Healthcare Organizations: Implications for Occupational Health, Organizational Performance, and Economic Outcomes. \u003cem\u003eHealthcare (Basel, Switzerland)\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(18), 2289. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/healthcare13182289\u003c/span\u003e\u003cspan address=\"10.3390/healthcare13182289\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeenk, M., Alken, A. P. B., Engelen, L. J. L. P. G., 
Bredie, S. J. H., van de Belt, T. H., \u0026amp; van Goor, H. (2018). Stress Measurement in Surgeons and Residents Using a Smart Patch. \u003cem\u003eAmerican Journal of Surgery\u003c/em\u003e, \u003cem\u003e216\u003c/em\u003e(2), 361\u0026ndash;368. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.amjsurg.2017.05.015\u003c/span\u003e\u003cspan address=\"10.1016/j.amjsurg.2017.05.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYanofsky, S. D., \u0026amp; Nyquist, J. G. (2010). Using the Affective Domain to Enhance Teaching of the ACGME Competencies in Anesthesiology Training. \u003cem\u003eThe Journal of Education in Perioperative Medicine: JEPM\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(1), E055.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZoller, A., H\u0026ouml;lle, T., Wepler, M., Radermacher, P., \u0026amp; Nussbaum, B. L. (2021). Development of a Novel Global Rating Scale for Objective Structured Assessment of Technical Skills in an Emergency Medical Simulation Training. \u003cem\u003eBMC Medical Education\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e(1), 184. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12909-021-02580-4\u003c/span\u003e\u003cspan address=\"10.1186/s12909-021-02580-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"advances-in-health-sciences-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ahse","sideBox":"Learn more about [Advances in Health Sciences Education](http://link.springer.com/journal/10459)","snPcode":"10459","submissionUrl":"https://submission.nature.com/new-submission/10459/3","title":"Advances in Health Sciences Education","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"simulation-based medical education, competency assessment, physiological monitoring, machine learning, wearable sensors, medical error prevention","lastPublishedDoi":"10.21203/rs.3.rs-8842924/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8842924/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eMedical errors remain a leading cause of preventable harm, yet current competency assessments often rely on subjective evaluations that overlook critical performance indicators, particularly learners' responses to clinical stress. Although physiological stress markers have been linked to performance outcomes, no widely adopted or scalable framework has integrated these biomarkers with performance data to identify learners requiring additional training before real-world practice.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eThis prospective observational study developed machine learning models to classify clinical competency using multimodal data from healthcare learners. Data were collected from 152 learners (74 Emergency Medicine residents, 70 Anesthesiology residents, 8 Emergency Medical Services students) across 470 high-fidelity simulation scenarios. 
A multimodal assessment platform synchronized physiological signals (electrodermal activity, heart rate, skin temperature) from Empatica E4 wristbands with expert evaluations. A genetic algorithm was employed for feature selection, and neural network models were evaluated using multiple leave-N-out strategies to assess generalizability across learners and scenarios.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe neural network achieved 84\u0026ndash;85% balanced accuracy across thresholds 0.45\u0026ndash;0.70, with sensitivity 93.3\u0026ndash;95.4% and specificity 72.9\u0026ndash;76.2%. Despite class imbalance (80.6% competent, 19.4% novice), performance remained robust, with Matthew's correlation coefficients of 0.687\u0026ndash;0.706 and precision\u0026ndash;recall area-under-the-curve (PR-AUC) values of 0.969\u0026ndash;0.970 across thresholds.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eThis study demonstrates that integrating physiological metrics with machine learning supports objective, data-driven competency assessment. 
By capturing stress-performance relationships that traditional evaluations often overlook, this framework may provide an early warning system to identify learners who may require additional training and lay the foundation for more precise, data-informed medical education.\u003c/p\u003e","manuscriptTitle":"Multimodal Physiological Assessment for Clinical Competency Classification in Simulation-Based Medical Education: A Machine Learning Approach","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-22 12:56:08","doi":"10.21203/rs.3.rs-8842924/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-05-04T10:52:48+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"316848016722345974580187223727579565801","date":"2026-04-09T18:37:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"127495692764344617399013626816414525578","date":"2026-04-07T06:39:34+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-06T22:34:03+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"36998497467788585428725791005121855109","date":"2026-03-02T11:12:29+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-02-17T15:08:21+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-12T14:15:59+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-12T14:12:53+00:00","index":"","fulltext":""},{"type":"submitted","content":"Advances in Health Sciences Education","date":"2026-02-10T14:42:02+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"advances-in-health-sciences-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"ahse","sideBox":"Learn more about [Advances in Health Sciences Education](http://link.springer.com/journal/10459)","snPcode":"10459","submissionUrl":"https://submission.nature.com/new-submission/10459/3","title":"Advances in Health Sciences Education","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"9c02516f-6c93-4c7c-9921-8fe6d115121a","owner":[],"postedDate":"February 22nd, 2026","published":true,"recentEditorialEvents":[{"type":"editorInvitedReview","content":"","date":"2026-05-04T10:52:48+00:00","index":61,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-02-22T12:56:08+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-22 12:56:08","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8842924","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8842924","identity":"rs-8842924","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}