Competency Development in Early Nursing Training: A Cross-Sectional OSCE Study of Self-Assessment Versus Examiner Ratings

Research Article
Benjamin Roszipal, Gabriella Szelesi, Martin Ernst, Alexander Hoffelner, and 1 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7518013/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 24 Dec, 2025; the published version appears in BMC Medical Education.
Version 1 posted 10

Abstract

Background

Accurate self-assessment is a core function of self-regulated learning: learners monitor their performance and adjust strategies accordingly. In nursing education, however, students often misjudge performance, especially in interpersonal communication, indicating gaps in calibration accuracy, the alignment between self- and examiner ratings.
Although self-other discrepancies have been reported in medicine and allied health, domain-specific patterns early in nursing training remain underexplored. We therefore examined calibration accuracy in first-year students’ OSCEs across the professional knowledge/analytical, methodological/procedural, and social/communication domains, and tested whether age, gender, or prior healthcare training were associated with these discrepancies.

Methods

In this cross-sectional study, a complete cohort of 109 first-year nursing students undertook a standardized OSCE at the end of the second semester. The OSCE was conducted under summative assessment conditions and included stations assessing professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills. Each domain was rated by examiners during the OSCE and by the students themselves immediately afterwards. Of the 109 students, 102 provided complete data for analysis. Discrepancy scores (self minus examiner) were analyzed using repeated-measures ANOVA, paired t-tests, and linear regressions with demographic predictors.

Results

A significant interaction between rating type and competence domain was observed (η²ₚ = .42, p < .001). Calibration was domain-specific: students calibrated accurately in professional knowledge and methodological skills (both non-significant) but strongly overestimated their social/communication skills, with a large effect (mean self-minus-examiner difference 0.14, d = 0.99, p < .001). Age negatively predicted overestimation of social/communication skills (R² = .07, p = .006), whereas gender and prior healthcare training showed no associations.

Conclusion

First-year nursing students calibrated well in technical and analytical skills but overestimated their interpersonal communication. In self-regulated learning (SRL) terms, transparent checklist criteria support monitoring, whereas implicit interpersonal standards are harder to judge. Age, rather than gender or prior healthcare training, was associated with smaller social-domain gaps.
These findings align with prior work and point to interpersonal competence as a cross-disciplinary calibration challenge. Feasible, theory-aligned steps (facilitated video review against checklists, standardized vignette calibration with expert anchors, and pre- to post-feedback self-ratings) should be piloted and evaluated longitudinally before broader adoption.

Background

Competency-based nursing education has brought increased attention to the accurate assessment of learners’ skills and behaviors. In this framework, trainees are expected to demonstrate competence across multiple domains, from clinical knowledge and technical skills to communication and professionalism, while engaging in continuous self-reflection and improvement (1–4). Within this context, calibration accuracy, defined as the alignment between self-assessed competence and externally assessed performance, serves as a central metacognitive target that links self-evaluation to effective learning (5–8). Positioning calibration accuracy as a basic metacognitive skill underscores its foundational role in self-regulated learning (SRL) and its relevance for clinical practice. Accurate self-assessment is therefore crucial: nursing students must be able to gauge their own strengths and weaknesses to identify learning needs and achieve competence as independent practitioners. At the same time, evaluations by faculty, examiners, or peers remain a cornerstone of competency judgment in education (9,10). Objective structured clinical examinations (OSCEs), for example, are widely used to measure clinical skills and behaviors objectively in a standardized format (11,12). OSCEs not only test students across competency domains but also provide an opportunity to compare students’ self-evaluations with external evaluations in a structured setting.
Ensuring alignment between self- and examiner-based assessment is critical in nursing training, as significant mismatches could impair both learning and patient safety (8,13) and may also affect students’ professional identity formation and readiness for independent clinical practice (1,3). An overconfident student who overestimates their abilities may not seek or accept the support needed to improve, whereas an underconfident student might avoid tasks despite being competent (14,15). The concept of accurate self-assessment is embedded in the broader theoretical models of SRL and metacognition. SRL involves a cyclical process of planning, monitoring, and evaluating one’s own learning and performance, with metacognitive accuracy – knowing how well one has performed – being central to effective regulation (16,17). Within Zimmerman’s SRL framework (18), calibration accuracy operates as a core monitoring mechanism that links performance appraisal to adaptive strategy adjustment and subsequent reflection (19,20). Calibration accuracy, a specific expression of metacognitive accuracy, is a critical metacognitive skill (21). High calibration accuracy allows learners to direct study efforts efficiently, engage in deliberate practice, and seek targeted feedback. Conversely, miscalibration (either over- or underestimation) can lead to inappropriate confidence levels and suboptimal learning strategies. Recent theoretical work (6) also situates calibration within the concept of “feedback literacy”, the capacity to understand, internalize, and act upon feedback, which is essential for closing the gap between perceived and actual performance (7,22). Inaccurate self-assessment may therefore reflect not only a metacognitive deficit but also insufficient feedback literacy, particularly in early-stage learners.
In the context of nursing education, where clinical decision-making, procedural execution, and communication skills directly affect patient outcomes, the stakes of calibration are particularly high. OSCEs offer a distinct methodological advantage for studying self-other agreement because of their structure, their standardization, and the replicable environment in which multiple competencies are tested across a variety of stations. This design aligns well with SRL theory by enabling domain-specific examination under controlled conditions (19). The setting also minimizes confounding from case complexity and patient variability, which complicate calibration research in real clinical environments. Each station presents clearly defined tasks, typically scored against objective checklists or global rating scales, which reduce ambiguity for examiners and allow robust performance comparisons. For research on calibration accuracy, the OSCE’s compartmentalized structure enables domain-specific analysis (e.g., comparing communication with procedural skills) within the same group of students under identical conditions. This is particularly relevant given evidence that self-assessment accuracy varies by domain (23). Furthermore, because OSCEs occur at fixed points in the curriculum, they provide a natural “snapshot” of simulation-based performance, offering educators a baseline for targeted feedback interventions. Prior research suggests that nursing and other health professions students often struggle to evaluate their own performance accurately (5). Contemporary studies document a tendency toward overestimation among trainees, particularly in early stages of training (14,24). Meta-analytic evidence indicates that students are more likely to overestimate their performance in interpersonal or communication-based clinical encounters than in knowledge-based assessments (8,25). However, not all evidence points toward overestimation.
Some studies have found that learners, particularly female students, occasionally underestimate their abilities compared with faculty ratings (25). Findings on gender differences are mixed, and factors such as age or prior healthcare experience remain underexplored in nursing cohorts. In addition, emerging work suggests that the quality, recency, and role expectations of prior clinical experience, rather than its mere presence, may shape professional self-concept and metacognitive calibration (23,26,27). It has been suggested that more advanced or experienced students may self-evaluate more accurately than novices because of greater exposure to feedback (28), but the evidence is inconsistent. Most existing studies focus on a single domain or competency area, limiting understanding of cross-domain differences within the same learner cohort. Moreover, little is known about how accurately nursing students perceive their own competence at the very beginning of their clinical training. Additionally, very few studies have examined these phenomena during students’ first OSCE, a critical summative assessment that shapes early self-efficacy and learning strategies. This early stage may represent a window of heightened plasticity in calibration accuracy, which would make it a valuable intervention point for educators. Establishing baseline self-other agreement patterns at this stage is critical for designing targeted feedback and reflection interventions within competency-based nursing education. Overall, evidence on whether domain-specific self–examiner discrepancies occur in first-year nursing OSCEs, and on how age, gender, and prior healthcare training relate to these differences, remains limited. By systematically examining discrepancies across multiple domains and exploring the influence of demographic moderators, the present study seeks to address this gap and to contribute to both educational theory and practical curriculum design.
Grounding the study in SRL, we conceptualize calibration accuracy as the monitoring process that aligns self-ratings with external standards. We therefore expected smaller discrepancies in domains with transparent criteria and explored whether learner characteristics relate to calibration early in training. The present study aims to extend previous work by examining the accuracy of nursing students’ self-assessments relative to examiner-based evaluations (self-other accuracy) across multiple competency domains during their first OSCE. Specifically, using a 0-100% rating scale within a first-year OSCE, this study investigated whether the magnitude or direction of self-other rating differences varies between professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills, given prior reports of domain-specific variation in self-assessment accuracy (8,25). The first aim is to quantify the magnitude and direction of self-other assessment differences in the three predefined domains. This allows us to identify whether students tend to overestimate or underestimate their performance in specific areas of competence, and to what extent such discrepancies occur in a standardized OSCE context. The second aim is to compare these discrepancies across domains to determine whether certain competencies are more prone to miscalibration. Prior research has suggested that self-assessment accuracy is often domain-specific, with interpersonal and communication skills potentially overestimated more than technical or methodological skills. The third aim is to examine the influence of gender, age, and prior healthcare training on the magnitude and direction of self-other discrepancies. Although some studies indicate that prior healthcare experience may improve calibration accuracy through greater exposure to feedback, findings regarding gender and age are inconsistent, particularly in early-stage nursing education.
By addressing these aims, this study seeks to close key evidence gaps and provide educators with evidence-based guidance for tailoring feedback and self-reflection in competency-based nursing curricula. Enhancing self-assessment accuracy in novice nursing students can support lifelong learning and, ultimately, safer and more effective clinical practice (24,25). More broadly, the work contributes to assessment literacy in health professions education by linking metacognitive accuracy with quality of care. Accordingly, we examined calibration accuracy in first-year students’ OSCE performance, comparing self- and examiner ratings across domains and testing associations with age, gender, and prior healthcare training (5,8,23,27).

Methods

This cross-sectional study analyzed self-assessment and examiner evaluation data from a single OSCE conducted at the end of the second semester in the first year of a Bachelor of Science in Nursing program at a university of applied sciences (name withheld for anonymity). The OSCE followed standardized procedures and was administered in the university’s simulation center, which provides a controlled environment for evaluating simulation-based clinical performance. Multiple clinical scenarios were designed to assess students’ competence across three predefined competency domains, in alignment with the program’s competency framework. The OSCE was situated within a summative assessment context, meaning that students’ performance contributed to their course grade; this was intended to ensure high engagement and the ecological validity of the performance data. The study population comprised all students enrolled in the second semester of the Bachelor of Nursing program (n = 109). Inclusion criteria were enrollment in the first year of the program, eligibility to participate in the OSCE, and attendance on the examination day. Students were excluded if they did not complete the post-OSCE self-assessment questionnaire.
No participants withdrew after the examination. The decision to include the entire cohort rather than a sample was deliberate, as it minimized sampling bias and allowed inference at the level of the full cohort within the institution. Missing data were handled by listwise deletion for analyses involving the affected variables; the analysis plan also provided for examining missingness patterns to assess whether data were missing at random. A priori sample size estimation was not performed because the study included the complete eligible cohort. Post hoc (observed) power was calculated at α = .05 from the reported effect sizes. The repeated-measures ANOVA effects (Rating Type, F(1, 101) = 18.09, η²ₚ = .152; Competence Domain, F(2, 202) = 94.82, η²ₚ = .484; Domain × Rating Type interaction, F(2, 202) = 71.96, η²ₚ = .416) all had very high power (≥ .99). Polynomial trend analyses showed similar near-unity power (linear: F(1, 101) = 113.57; quadratic: F(1, 101) = 70.87). In contrast, the planned paired comparisons for the professional (t(102) = −1.69) and methodological domains (t(102) = 0.86) and the regression models for methodological (F(1, 101) = 2.38, R² = .02) and professional discrepancies (F(1, 101) = 0.06, R² < .01) showed limited power (≈ .14–.39), so their null results warrant cautious interpretation. By comparison, the social-domain paired comparison (t(103) = −10.14, d ≈ .99) and the regression of age on the social discrepancy (F(1, 102) = 7.73, R² = .07) were adequately to highly powered. All eligible students who attended the OSCE and met the inclusion criteria were invited to participate immediately after the exam; complete data for all variables were available for 102 students (see Results). Participation in the research component (post-OSCE questionnaire) was voluntary, had no effect on course grades or OSCE outcomes, and could be declined without penalty. Responses were pseudonymized prior to analysis; the linkage file was stored separately with restricted access and was not available to the research team.
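For the 1-df tests listed above (the paired t-tests and the F(1, df) effects, for which F = t²), observed power can be approximated with the short, standard-library-only Python sketch below. It is an illustration under explicit assumptions, not the procedure used in the study: it treats the observed statistic as the true noncentrality (which is what "observed" power means) and replaces the noncentral t distribution with a unit-variance normal distribution, a large-sample approximation that is reasonable here only because the error df are around 100. The function names are hypothetical. Exact values, and any power for the 2-df effects, would require the noncentral t and F distributions (for example `scipy.stats.nct` and `scipy.stats.ncf`).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def approx_observed_power(t_obs: float, alpha: float = 0.05) -> float:
    """Approximate two-sided observed power for a 1-df test (hypothetical helper).

    Assumption: the observed |t| is taken as the noncentrality, and the
    noncentral t is approximated by a normal with unit variance (large df).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(t_obs)
    # Probability that the shifted statistic lands in either rejection region.
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def t_from_f(f_obs: float) -> float:
    """With 1 numerator df, an F statistic is the square of a t statistic."""
    return sqrt(f_obs)


if __name__ == "__main__":
    # Test statistics as reported in the text.
    reported = {
        "Rating Type, F(1, 101) = 18.09": t_from_f(18.09),
        "Professional paired t(102) = -1.69": -1.69,
        "Methodological paired t(102) = 0.86": 0.86,
        "Social paired t(103) = -10.14": -10.14,
        "Age -> social discrepancy, F(1, 102) = 7.73": t_from_f(7.73),
        "Age -> methodological discrepancy, F(1, 101) = 2.38": t_from_f(2.38),
        "Age -> professional discrepancy, F(1, 101) = 0.06": t_from_f(0.06),
    }
    for label, t in reported.items():
        print(f"{label}: approx. power {approx_observed_power(t):.2f}")
```

For the professional and methodological paired comparisons this approximation prints about 0.39 and 0.14, the two ends of the range reported above.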
The OSCE comprised five 12-minute stations, which integrated elements from three predefined competency domains: professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills. Stations were developed by a panel of nursing faculty and simulation educators to reflect realistic first-year clinical encounters, including tasks such as patient admission interviews, basic vital signs measurement, intravenous therapy preparation, and patient education. Each scenario was piloted with faculty staff prior to data collection to ensure clarity of instructions, feasibility of timing, and adequate domain coverage. Feedback from this pilot phase led to minor adjustments in checklist wording and task sequence to improve validity. Each station contained multiple observable tasks (e.g., “Student performs hygienic hand disinfection for 30 seconds” or “Student explains procedure to patient in an understandable manner”), scored by trained examiners on a standardized 3-point scale (0 = not performed, 1 = partially performed, 2 = fully performed). All checklist items were assigned to one of the three competency domains based on expert consensus among thirteen senior faculty members. Items addressing technical execution of clinical procedures were mapped to professional knowledge/analytical skills; those requiring data collection, interpretation, or procedures were mapped to methodological/procedural skills; and those involving interaction, empathy, and verbal/non-verbal communication were mapped to social/communication skills. The domain mapping was documented in the OSCE software and in a codebook to ensure transparency and reproducibility of the classification process. Scores per domain were calculated as the sum of achieved points divided by the maximum possible points, yielding domain-specific performance percentages. Examiners were faculty members with at least a master’s degree in nursing education.
Three weeks before the OSCE, they completed a three-hour calibration session covering the competency framework, rating criteria, and scoring exercises with consensus discussions. Calibration sessions were designed according to best-practice recommendations for performance assessment training. To minimize measurement bias, examiners scored independently, without discussing ratings, during the live OSCE. Immediately after completing the OSCE and before receiving any feedback, students completed a self-developed, structured questionnaire aligned 1:1 with the OSCE stations and competency domains. Because no published instrument mapped one-to-one onto the station-specific OSCE checklist and our competency framework, we developed a brief German-language self-assessment questionnaire mirroring those domains (29). For each skill and domain, students rated their performance on an 11-point percentage scale (0-100% in 10% increments; 0% = “not competent at all”, 100% = “fully competent”). This format was chosen because it is sensitive to small differences between self- and examiner ratings while remaining intuitive; the 10% steps reduce pseudo-precision yet preserve discrimination. Using percentages also enabled direct numerical comparison with the examiner percentage scores generated by the OSCE management system. For analysis, all percentage-based scores (self and examiner) were rescaled to proportions by dividing by 100, yielding values on a 0-1 scale (e.g., 0.75 ≙ 75%). The final English translation of the questionnaire is provided in Additional file 1. Content validity of the self-assessment questionnaire was established through expert review. Twelve nurse educators from the university’s faculty independently appraised item clarity, domain alignment, and coverage against the program’s competency framework. A team meeting reconciled their feedback, and minor wording changes (e.g., plain-language anchors, station-specific phrasing) were made. The estimated completion time was approximately 4-5 minutes.
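The scoring and rescaling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the OSCE management system’s implementation: the function names and the toy checklist are hypothetical, each checklist item is assumed to carry exactly one domain label, and the discrepancy follows the study’s definition of self minus examiner (positive values indicate overestimation).

```python
from collections import defaultdict

MAX_ITEM_POINTS = 2  # 0 = not performed, 1 = partially, 2 = fully performed


def examiner_domain_scores(items: list[tuple[str, int]]) -> dict[str, float]:
    """Sum of achieved points divided by the maximum possible, per domain (0-1)."""
    achieved: dict[str, int] = defaultdict(int)
    possible: dict[str, int] = defaultdict(int)
    for domain, points in items:
        if points not in (0, 1, 2):
            raise ValueError(f"item score must be 0, 1 or 2, got {points}")
        achieved[domain] += points
        possible[domain] += MAX_ITEM_POINTS
    return {d: achieved[d] / possible[d] for d in achieved}


def self_rating_to_proportion(percent: int) -> float:
    """Map an 11-point self-rating (0, 10, ..., 100 %) onto the 0-1 scale."""
    if percent not in range(0, 101, 10):
        raise ValueError(f"self-rating must be 0-100 in 10 % steps, got {percent}")
    return percent / 100


def discrepancy(self_prop: float, examiner_prop: float) -> float:
    """Self minus examiner: positive = overestimation, negative = underestimation."""
    return self_prop - examiner_prop


# Toy example (hypothetical data, not from the study): one student's checklist.
checklist = [
    ("social/communication", 2), ("social/communication", 1),
    ("methodological/procedural", 2), ("methodological/procedural", 2),
    ("professional knowledge/analytical", 1), ("professional knowledge/analytical", 0),
]
examiner = examiner_domain_scores(checklist)
self_ratings = {
    "social/communication": 90,
    "methodological/procedural": 80,
    "professional knowledge/analytical": 30,
}
for domain, score in examiner.items():
    gap = discrepancy(self_rating_to_proportion(self_ratings[domain]), score)
    print(f"{domain}: examiner {score:.2f}, discrepancy {gap:+.2f}")
```

Rejecting self-ratings that are not multiples of 10 mirrors the 11-point scale described above; examiner proportions need no such check because they come from summed checklist points.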
Face validity was additionally examined in a brief cognitive pre-test with nine first-semester students using a four-point comprehensibility scale (1 = not understandable, 4 = well understandable): the mean rating was 3.20, 76.5% of item judgements were “understandable” or “well understandable”, and one judgement (1.2%) was “not understandable”. Internal consistency of the self-assessment scales was acceptable to good across domains. For the professional knowledge/analytical domain, Cronbach’s α = .76, Spearman-Brown split-half = .76, and Guttman split-half = .73. For the methodological/procedural domain, Cronbach’s α = .71, Spearman-Brown split-half = .73, and Guttman split-half = .71. For the social/communication domain, Cronbach’s α = .80, Spearman-Brown split-half = .82, and Guttman split-half = .79. Each scale comprised five station-specific self-ratings rescaled to 0-1. Demographic variables (gender, age, and prior healthcare training) were obtained from institutional student records. Gender was recorded as reported by the student in official enrollment documents. Prior healthcare training was defined as any formal vocational or academic education in a healthcare profession before enrollment in the nursing program. The primary outcome variable was the self-other discrepancy score, calculated by subtracting examiner ratings from student self-assessments for each domain, with positive scores indicating overestimation and negative scores indicating underestimation. To address potential sources of bias, the study design sought to minimize selection bias by including all eligible students from the cohort, thereby avoiding volunteer bias. Measurement bias was reduced by using standardized checklists, trained and calibrated examiners, and a post-OSCE self-assessment that directly paralleled the examiner scoring system.
Social desirability bias in self-assessments was mitigated by pseudonymizing responses and administering the questionnaire immediately after the OSCE, before feedback was given. The controlled timing of the self-assessment was intended to capture students’ immediate, unaided perceptions of their performance, unaltered by external cues or post hoc rationalizations. Residual bias from subjective interpretation, especially in the social/communication domain, remains possible but was mitigated through examiner training. Data analysis was performed using IBM SPSS Statistics (version 29.0; IBM Corp., Armonk, NY, USA). Descriptive statistics (means, standard deviations, frequencies, and percentages) were computed for all variables. Paired-samples t-tests were used to compare self-assessment and examiner ratings within each domain; Wilcoxon signed-rank tests were planned in case normality assumptions (Shapiro-Wilk test) were violated. Cross-domain comparisons of self-other discrepancies were examined using one-way repeated-measures ANOVA, with Greenhouse-Geisser correction applied if Mauchly’s test indicated a violation of sphericity. Demographic influences (gender, age group, prior healthcare training) on discrepancy scores were explored using mixed-model ANOVA, with domain as the within-subjects factor and demographics as between-subjects factors. Bonferroni-adjusted pairwise comparisons were used to control for multiple testing. Effect sizes were reported as Cohen’s d for t-tests and partial eta squared (η²ₚ) for ANOVA, alongside 95% confidence intervals. Statistical significance was set at p < .05 (two-tailed). In addition to p-values, interpretation emphasized effect sizes and their confidence intervals to gauge the practical relevance of observed differences, in line with recommendations for transparent reporting in health professions education research.
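For the within-domain comparisons, the paired t statistic and an effect size can be computed directly from the per-student differences. The Python sketch below assumes the paired-data variant of Cohen’s d, d_z (mean difference divided by the SD of the differences). The text does not state which variant was used, but d_z is consistent with Table 3, where the social domain has a mean difference and an SD of 0.141 each, giving d_z ≈ 1, close to the reported d = 0.99. p-values require the t distribution and are left to a statistics package; the function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_comparison(self_props: list[float], examiner_props: list[float]) -> dict[str, float]:
    """Paired t statistic and Cohen's d_z for self-minus-examiner differences.

    Positive values mean overestimation. Reversing the subtraction (examiner
    minus self, as in Table 3) flips the signs but not the magnitudes.
    """
    if len(self_props) != len(examiner_props):
        raise ValueError("ratings must be paired one-to-one")
    diffs = [s - e for s, e in zip(self_props, examiner_props)]
    n = len(diffs)
    m = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return {
        "n": n,
        "df": n - 1,
        "mean_diff": m,
        "sd_diff": sd,
        "t": m / (sd / sqrt(n)),  # paired t = mean / standard error
        "d_z": m / sd,            # effect size for paired data
    }


# Toy data (hypothetical): four students' self and examiner proportions.
result = paired_comparison([0.9, 0.8, 0.7, 1.0], [0.70, 0.75, 0.65, 0.80])
print({k: round(v, 3) for k, v in result.items()})
```

Because t = d_z × √n, the reported social-domain values are mutually consistent: 0.99 × √104 ≈ 10.1, in line with t(103) = −10.14.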
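The internal-consistency coefficients reported for the self-assessment scales (Cronbach’s α, the Spearman-Brown split-half coefficient, and the Guttman split-half coefficient) follow standard formulas, sketched here in Python with the standard library only. How the five station ratings were divided into halves is not stated, so the split is an assumption: the sketch puts the first ⌈k/2⌉ items in one half and the rest in the other, and it applies the equal-length Spearman-Brown formula even though five items yield halves of three and two. Function names and toy ratings are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """rows: one list of item ratings per respondent (all the same length)."""
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def _pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(rows: list[list[float]]) -> tuple[float, float]:
    """Return (Spearman-Brown, Guttman) split-half coefficients.

    Assumed split: first ceil(k/2) items vs. the remaining items.
    """
    h = (len(rows[0]) + 1) // 2
    half_a = [sum(r[:h]) for r in rows]
    half_b = [sum(r[h:]) for r in rows]
    r_ab = _pearson(half_a, half_b)
    spearman_brown = 2 * r_ab / (1 + r_ab)  # equal-length formula
    total = [a + b for a, b in zip(half_a, half_b)]
    guttman = 2 * (1 - (variance(half_a) + variance(half_b)) / variance(total))
    return spearman_brown, guttman


# Toy ratings (hypothetical): four students x five station self-ratings on 0-1.
ratings = [
    [0.6, 0.7, 0.8, 0.7, 0.6],
    [0.8, 0.9, 0.9, 0.8, 0.9],
    [0.5, 0.5, 0.6, 0.4, 0.5],
    [0.7, 0.8, 0.7, 0.7, 0.8],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
sb, g = split_half(ratings)
print(f"Spearman-Brown = {sb:.2f}, Guttman = {g:.2f}")
```

A quick check on the formulas: with only two items split one-to-one, the Guttman coefficient equals Cronbach’s α.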
Results

Of the 109 students enrolled in the second semester of the Bachelor of Nursing program, five datasets were excluded because of missing self-assessment information. Of the 104 students assessed, 102 provided complete data for the repeated-measures ANOVA (listwise deletion). For the paired t-tests, valid cases ranged between 103 and 104 depending on the domain. Based on the 102 complete datasets, the mean age of participants was 22.96 years (SD 3.96, range 19-41); 69.6% identified as female and 30.4% as male, and 41.2% reported prior healthcare training before enrollment (Table 1). The gender distribution was consistent with the broader nursing student population, and the proportion with prior healthcare training was sufficient to examine potential experience-related effects on self-other agreement.

Table 1. Demographic characteristics of the study cohort (n = 102)
Variable                      n (%)         M (SD)          Range
Gender
  Female                      71 (69.6%)    –               –
  Male                        31 (30.4%)    –               –
Age (years)                   –             22.96 (3.96)    19-41
Prior healthcare training
  Yes                         42 (41.2%)    –               –
  No                          60 (58.8%)    –               –
Note. Data are presented as M (SD) for continuous variables and n (%) for categorical variables. Prior healthcare training refers to any formal vocational or academic education in a healthcare profession before enrolment.

All statistical assumptions for the repeated-measures ANOVA were met; uncorrected degrees of freedom were applied. The 3 (Competence Domain: professional knowledge/analytical skills, methodological/procedural skills, social/communication skills) × 2 (Rating Type: self vs. examiner) repeated-measures ANOVA revealed statistically significant main effects for both Rating Type, F(1, 101) = 18.09, p < .001, η²ₚ = .152, and Competence Domain, F(2, 202) = 94.82, p < .001, η²ₚ = .484.
These effects were qualified by a statistically significant Rating Type × Competence Domain interaction, F(2, 202) = 71.96, p < .001, η²ₚ = .416, showing that the pattern of self-other differences varied across domains. Descriptively, self-ratings were slightly higher than examiner ratings in professional knowledge/analytical skills and slightly lower in methodological/procedural skills, while the largest and most consistent difference occurred in social/communication skills, where self-ratings were notably higher (Table 2). Polynomial contrast analyses further supported the domain effects. For the main effect of Competence Domain, both linear and quadratic trends were statistically significant (both p < .001), indicating that ratings increased across domains but not in a strictly linear manner, with disproportionately high values in the social/communication domain (Figure 1). The interaction between Rating Type and Competence Domain also showed significant linear and quadratic components (both p < .001), reflecting that self-other discrepancies increased progressively across domains, peaking in social/communication skills (Table 2).

Figure 1. Mean scores (± standard error) for professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills. Self-ratings are shown as circles and examiner ratings as squares. All values are proportions (0-1) derived from percentage scores.

Table 2. Self- and examiner ratings across competence domains
Competence domain                          Rating type    Mean (SD)          SE       Self – Examiner
Professional knowledge/analytical skills   Other-rated    0.6591 (0.0807)    0.0080   0.0198
                                           Self-rated     0.6775 (0.1304)    0.0129
Methodological/procedural skills           Other-rated    0.6661 (0.0818)    0.0081   −0.0135
                                           Self-rated     0.6520 (0.1454)    0.0144
Social/communication skills                Other-rated    0.6852 (0.0780)    0.0077   0.1406
                                           Self-rated     0.8275 (0.1268)    0.0125
Note. M = mean; SD = standard deviation; SE = standard error; Self – Examiner = discrepancy between self- and examiner ratings (other-rated values subtracted from self-rated values), with positive values indicating overestimation. Self-rated values represent participants’ self-assessment scores; other-rated values represent examiner ratings during the OSCE. All values are proportions (0-1) derived from percentage scores.

Planned paired-samples t-tests examined self-other discrepancies within each domain (Table 3). In these tests the difference was computed as examiner minus self, so negative values indicate higher self-ratings. In the professional domain, self and examiner ratings did not differ significantly, t(102) = −1.69, p = .094, 95% CI [−0.043, 0.003]. Similarly, no significant difference emerged for methodological skills, t(102) = 0.86, p = .394, 95% CI [−0.018, 0.045]. In contrast, self-ratings were substantially higher than examiner ratings in the social/communication domain, t(103) = −10.14, p < .001, 95% CI [−0.168, −0.113], d = 0.99.

Table 3. Paired-samples t-tests comparing self- and examiner ratings within each competence domain
Competence domain                          t (df)          p        Mean difference ± SD   95% CI
Professional knowledge/analytical skills   −1.69 (102)     .094     −0.020 ± 0.119         [−0.043, 0.003]
Methodological/procedural skills            0.86 (102)     .394      0.014 ± 0.160         [−0.018, 0.045]
Social/communication skills               −10.14 (103)    < .001   −0.141 ± 0.141         [−0.168, −0.113]
Note. CI = confidence interval; SD = standard deviation. Negative mean differences indicate higher self-ratings relative to examiner ratings. p < .05 considered statistically significant.

Simple linear regressions examined the relationship between age and self-other discrepancies in each competence domain (Table 4). For social/communication skills, age was a significant negative predictor, F(1, 102) = 7.73, p = .006, R² = .07, indicating that older students tended to show smaller discrepancies.
In contrast, age was not significantly associated with discrepancies in methodological/procedural skills, F(1, 101) = 2.38, p = .126, R² = .02, or professional knowledge/analytical skills, F(1, 101) = 0.06, p = .814, R² < .01.

Table 4. Linear regression analyses predicting self–examiner discrepancy scores from age in each competence domain
Competence domain                          B        SE      β       t (df)        p      95% CI for B       R²
Professional knowledge/analytical skills   −0.001   0.003   −0.02   −0.24 (101)   .814   [−0.007, 0.005]    0.001
Methodological/procedural skills           −0.006   0.004   −0.15   −1.54 (101)   .126   [−0.014, 0.002]    0.020
Social/communication skills                −0.009   0.003   −0.27   −2.78 (102)   .006   [−0.016, −0.003]   0.070
Note. B = unstandardised regression coefficient; SE = standard error; β = standardised coefficient; CI = confidence interval; R² = proportion of variance explained. Negative coefficients indicate smaller self-other discrepancies with increasing age.

Independent-samples t-tests (Table 5) revealed no significant differences in discrepancy scores by prior healthcare training (professional: t(101) = −1.42, p = .160; methodological: t(101) = −0.16, p = .875; social: t(102) = −0.16, p = .876) or by gender (professional: t(101) = −1.14, p = .256; methodological: t(101) = −1.03, p = .306; social: t(102) = 0.22, p = .825).

Table 5. Independent-samples t-tests comparing self–examiner discrepancies by prior healthcare training and gender
A. Prior healthcare training
Competence domain                          t (df)        p      MD (SD), training   MD (SD), no training   95% CI
Professional knowledge/analytical skills   −1.42 (101)   .160   0.04 (.133)         0.006 (.107)           [−.081, .013]
Methodological/procedural skills           −0.16 (101)   .875   −0.011 (.147)       −0.016 (.170)          [−.069, .059]
Social/communication skills                −0.16 (102)   .876   0.143 (.128)        0.139 (.151)           [−.061, .052]
B. Gender
Competence domain                          t (df)        p      MD, female   MD, male   95% CI
Professional knowledge/analytical skills   −1.14 (101)   .256   0.011        0.04       [−0.0796, 0.0214]
Methodological/procedural skills           −1.03 (101)   .306   −0.024       0.012      [−0.1048, 0.0332]
Social/communication skills                0.22 (102)    .825   0.143        0.136      [−0.0537, 0.0672]
Note. CI = confidence interval; SD = standard deviation; MD = mean discrepancy (SDs in parentheses where reported). Positive mean discrepancies indicate higher self-ratings relative to examiner ratings. p < .05 considered statistically significant.

Discussion

The results demonstrated a clear domain-specific pattern: students consistently overestimated their performance only in the social/communication domain, whereas calibration in the professional knowledge/analytical and methodological/procedural domains was accurate. This social-domain discrepancy was substantial, with a large effect size, underlining its educational importance. By contrast, deviations in the technical and analytical domains were small and non-significant, suggesting that students may be able to judge observable, criterion-based skills better than more subjective interpersonal competencies. Conceptually, these findings map onto SRL: calibration accuracy appears intact where task criteria are explicit and transparent, supporting the proposition that criterion-based tasks foster more accurate monitoring, but vulnerable where standards are tacit or relational, which is where the miscalibration was localized (5,18,20). Students may therefore find it easier to assess concrete procedural actions accurately (e.g., “I disinfected my hands properly”) than more abstract interpersonal behaviors (e.g., “I responded appropriately”). This pattern resonates with earlier findings in medical and health professions education, where communication and interpersonal domains are often the most challenging for accurate self-assessment (24,30).
Similar results have been reported in pharmacy and medical OSCEs, where students showed closer alignment in procedural or knowledge-based stations but inflated ratings for communication skills (9,31). However, other studies have observed broader overestimation across domains, consistent with general miscalibration in early learners (5,8,13), suggesting that the domain-specific effect observed here may reflect both early-stage competency profiles and the structure of the curriculum. In line with this interpretation, a brief face-validity check with students found that the social/communication self-assessment was the only item that raised clarity concerns, reinforcing that this domain is harder for learners to judge.

Interpersonal competencies typically yield weaker, delayed, or more ambiguous feedback signals than checklist-based tasks, which can impair monitoring and foster overconfidence. Within Zimmerman’s SRL model, weak or delayed external cues constrain the monitoring phase, leading novices to rely on global self-beliefs rather than on performance evidence. A Dunning-Kruger mechanism is also plausible in novices, whose limited proficiency constrains insight into their own performance gaps (13,18,32). Taken together, this pattern supports the calibration-as-monitoring account within SRL and aligns with feedback-literacy perspectives, in which weaker or delayed cues hamper learners’ ability to judge interpersonal performance accurately.

The non-significant findings in the professional and methodological domains require careful consideration. While the absence of detectable discrepancies could reflect genuine calibration accuracy in more observable, criterion-based competencies, it may also partly reflect limited statistical power to detect small effects.
However, effect size estimates were close to zero and confidence intervals were narrow, suggesting that any systematic miscalibration in these domains is likely minimal and of limited educational relevance. By contrast, prior work has sometimes reported broader overestimation across domains (5,23), underscoring that our findings may be context-dependent. In our OSCE, the strong emphasis on explicit checklists and observable behaviors may have facilitated students’ self-monitoring, thereby reducing miscalibration. Within an SRL framework, such task transparency strengthens the monitoring loop and supports alignment between self-ratings and external criteria, making sizable miscalibration unlikely where criteria are explicit (18,20).

The absence of gender and prior-training effects contrasts with research in medicine and allied health, where male students sometimes report higher self-ratings and prior experience has been associated with improved calibration (23,25,26). In our cohort, two factors likely contributed to these null effects. First, the early training stage may reduce the gender-patterned differences reported later in curricula. Second, “prior healthcare training” was heterogeneous and may not have included explicit metacognitive practice or structured feedback, which are key ingredients for calibration learning (23,26,27,33). This suggests that the quality, supervision, and reflective depth of prior experiences, rather than their mere presence, may drive better calibration. It is also relevant that the cohort included substantially more women than men, which reduces power to detect gender-related effects; in a mostly female, early-stage cohort, gender differences seen elsewhere may not yet have emerged, or may be smaller, because most students are still novices.
Early in the program, gender-related differences may thus be overshadowed by limited clinical exposure, while the variable type and quality of prior training may limit its transfer to OSCE performance.

In contrast, age emerged as a significant negative predictor of overestimation in communication skills, consistent with studies linking life experience and maturity to more realistic self-appraisal. Although the explained variance was modest (R² = .07), this finding suggests that non-academic factors may shape calibration, underscoring the importance of considering learner characteristics beyond formal training. The age effect may reflect greater life experience and more frequent corrective feedback, which provide clearer internal standards for interpersonal performance (8,15). With increasing age, emotional intelligence and a more integrated professional self-concept may also be more developed, supporting more accurate perception of social cues and self-monitoring (33). In SRL terms, these attributes strengthen forethought and monitoring, thereby reducing overestimation when communication demands are high (19,21). Relatedly, recent evidence suggests that emotional intelligence underpins interpersonal and critical-thinking competencies and may be a prerequisite for accurate self-assessment in the social domain (33), in line with SRL models in which affective-cognitive resources support effective monitoring and regulation (18,20). Accordingly, curricula that intentionally cultivate emotional intelligence, for example through empathy training and guided reflective dialogue, may also improve calibration for communication-intensive tasks (33).

Taken together, the null findings are informative: they indicate that miscalibration in early-stage nursing students is not a generalized phenomenon but clusters in complex interpersonal skills, clarifying the boundary conditions for where calibration support is most needed.
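How modest the age effects are can be checked directly against Table 4. The sketch below is a minimal illustration, assuming (as stated in the Results) that each model is a simple linear regression with age as the only predictor. Under that assumption, standard identities give F(1, df) = t², R² = t² / (t² + df), and a standardized coefficient β equal to Pearson's r, i.e. ±√R² with the sign of t. The function names are hypothetical, and this is not the authors' analysis code.

```python
import math


def r_squared_from_t(t: float, df: int) -> float:
    """Variance explained by a single predictor, recovered from its t statistic.

    With one predictor, F(1, df) = t**2 and R**2 = F / (F + df).
    """
    f = t * t
    return f / (f + df)


def beta_from_t(t: float, df: int) -> float:
    """Standardized slope; with one predictor it equals Pearson's r (sign taken from t)."""
    return math.copysign(math.sqrt(r_squared_from_t(t, df)), t)


# t statistics and residual df as reported in Table 4
# (age predicting the self-examiner discrepancy score in each domain).
TABLE_4 = {
    "professional knowledge/analytical skills": (-0.24, 101),
    "methodological/procedural skills": (-1.54, 101),
    "social/communication skills": (-2.78, 102),
}

for domain, (t, df) in TABLE_4.items():
    print(f"{domain}: F(1, {df}) = {t * t:.2f}, "
          f"R^2 = {r_squared_from_t(t, df):.3f}, beta = {beta_from_t(t, df):.2f}")
```

For the social/communication domain this gives F(1, 102) ≈ 7.73, R² ≈ .070 and β ≈ −0.27, in line with Table 4. For methodological skills it gives R² ≈ .023, which the text reports to two decimals as .02. An R² of about 7% also makes the practical point concrete: age leaves roughly 93% of the variance in social-domain discrepancies unexplained.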
Educationally, these findings suggest that calibration should be treated as a metacognitive target, particularly for social/communication skills. However, given the single-cohort, cross-sectional design, any curricular changes should be piloted and evaluated before wider adoption. Evidence from nursing and medical education suggests that calibration-focused interventions can reduce self-other discrepancies (6,14), although reinforcement is needed to sustain their impact over time (7,22). Building on these findings, calibration could be strengthened through facilitator-led sessions in which students review recordings of their own OSCE communication stations alongside examiner checklists and global ratings and debrief discrepancies using structured prompts. Two further activities could run in parallel: standardized vignette calibration, in which students rate patient-interaction videos and then compare their judgments with expert-consensus anchors to build shared mental models of performance standards; and a pre-OSCE self-prediction followed by a post-feedback reassessment using the same instrument, which allows within-student calibration shifts to be tracked across stations and over time. Implemented longitudinally, these steps operationalize the SRL cycle of performance monitoring (self-other comparison) and reflection (strategy adjustment), and they target the specific monitoring deficits observed in the social domain (5,18). OSCEs can therefore serve a dual role: beyond summative evaluation, they can provide formative learning opportunities when feedback is deliberately integrated, a potential emphasized in the wider literature on competency-based education (1,3).

The interprofessional literature also points to the broader relevance of these findings. Overestimation in communication-related competencies has been observed not only in nursing but also in medical, dental, and psychology trainees (31,34,35).
This suggests that the challenge is not unique to one discipline but reflects a more general issue in health professions education, where interpersonal and relational competencies are difficult to self-evaluate. Accordingly, interprofessional initiatives could share calibration assets (e.g., cross-disciplinary vignettes with consensus anchors and debrief guides) to harmonize expectations and feedback practices across programs (32). Addressing this issue through feedback and calibration training could therefore benefit multiple professions and support interprofessional education (32,36).

This study has several limitations. Its cross-sectional design captures self-other agreement at a single time point, precluding inferences about developmental change. Percentage-based scoring facilitated comparability but may have obscured finer-grained variation within domains. Examiner ratings of interpersonal skills may involve subjective judgments, which could influence discrepancy scores despite prior examiner calibration. Although participation rates were high, the exclusion of six students who did not complete the self-assessment introduces a minor risk of self-selection bias. A further consideration is potential common method/context bias: both ratings were anchored to the same OSCE event and collected within a close timeframe, which can inflate shared variance; however, the domain-specific pattern (null effects in the technical/analytical domains alongside a large social/communication discrepancy) argues against a uniform method artifact driving the results. In addition, a “task transparency” effect cannot be ruled out: students likely find it easier to appraise concrete, checklist-based actions than abstract interpersonal behaviors, which may partially account for the observed domain specificity. Finally, findings may not generalize beyond first-year nursing cohorts or to programs with different OSCE structures.
These constraints underscore the value of theory-driven, SRL-anchored longitudinal and multimethod designs (e.g., including delayed self-ratings, independent behavioral indicators, and experimental manipulations of task transparency) to test mechanisms of calibration change.

Future research should prioritize three directions. First, longitudinal studies are needed to track changes in self-other agreement over time and assess whether calibration improves with experience (21,37). Second, intervention studies should test the effectiveness of structured feedback, guided reflection, and calibration exercises, ideally embedded longitudinally across the curriculum (6,24). Third, comparative studies across institutions and professions would clarify whether the domain-specific patterns observed here are context-dependent or reflect broader challenges in health professions education (2,4). Incorporating qualitative approaches could also enrich understanding of how students interpret performance criteria, particularly in interpersonal domains (34). Including measures of emotional intelligence and feedback literacy as potential mediators or moderators could help explain variance in social-domain calibration and inform targeted scaffolds (6,33). Together, such theory-anchored efforts could move this area from descriptive mapping to testing SRL-consistent mechanisms of calibration and, ultimately, advance the evidence base for designing curricula that reliably foster accurate self-assessment and competent professional practice.

Conclusion
In conclusion, first-year nursing students demonstrated accurate calibration for technical and analytical skills but systematically overestimated social/communication competencies.
Age was associated with more realistic self-assessment in this domain, whereas gender and prior healthcare training showed no reliable association. These results indicate that calibration is a relevant metacognitive target, especially for interpersonal skills, but recommendations for curriculum integration should be tested in pilot and longitudinal evaluations before broad implementation. Embedding carefully evaluated calibration activities within OSCEs may support competence development and align with competency-based frameworks. Addressing domain-specific overestimation early in nursing education has the potential to improve self-assessment accuracy and preparedness for practice.

Declarations

Ethics approval and consent to participate
The study was initially reviewed and cleared by the internal review board of [University name withheld for anonymity] as educational research without patient involvement. Subsequently, external ethics approval was obtained from the Ethikkommission für das Land Niederösterreich (Ethics Committee of the State of Lower Austria; Approval ID: GS3-EK-12/942-2025). Participation in the post-OSCE self-assessment questionnaire was voluntary, and students’ grades were not affected by the decision to participate or by their responses. All participating students provided informed consent prior to inclusion. Data were pseudonymized before analysis; the linkage file was stored separately with restricted access and was not available to the research team. The research adhered to the ethical principles of the Declaration of Helsinki (latest revision). No patient data or human tissue were involved.

Consent for publication
Not applicable.

Availability of data and materials
The datasets generated and/or analyzed during the current study are not publicly available due to institutional data protection policies but are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.
Funding
No external funding was received for this research. The study was conducted as part of the regular curriculum evaluation process at the [University name withheld for anonymity].

Authors’ contributions
BR conceived the study, designed the methodology, oversaw data collection, and prepared the manuscript. GS assisted with data collection and provided critical review of the manuscript. ME conducted the statistical analyses. AH and MW contributed to the interpretation of the findings and drafting of the manuscript. All authors critically revised the manuscript for intellectual content, approved the final version, and agree to be accountable for all aspects of the work.

Acknowledgements
The authors would like to thank the faculty members who participated as OSCE examiners and the students who volunteered their time for this study. Portions of the manuscript text were refined with the assistance of a large language model (ChatGPT, GPT-5, OpenAI, San Francisco, CA, USA) for language polishing and structural clarity. All outputs were critically reviewed and edited by the authors, who take full responsibility for the final content.

Authors’ information
Not applicable.

Clinical Trial Registration
Clinical trial number: not applicable.

References
1. Frank JR, Danoff D. The CanMEDS initiative: implementing an outcomes-based framework of physician competencies. Medical Teacher. 2007 Jan;29(7):642–7.
2. Monteiro S, McConnell MM. Evaluating the Construct Validity of Competencies: A Retrospective Analysis. MedSciEduc. 2023 May 8;33(3):729–36.
3. Weinberger SE, Pereira AG, Iobst WF, Mechaber AJ, Bronze MS, and the Alliance for Academic Internal Medicine Education Redesign Task Force II. Competency-Based Education and Training in Internal Medicine. Ann Intern Med. 2010 Dec 7;153(11):751–6.
4. Zibrowski EM, Singh SI, Goldszmidt MA, Watling CJ, Kenyon CF, Schulz V, et al. The sum of the parts detracts from the intended whole: competencies and in-training assessments. Medical Education. 2009 Aug;43(8):741–8.
5. León SP, Panadero E, García-Martínez I. How Accurate Are Our Students? A Meta-analytic Systematic Review on Self-assessment Scoring Accuracy. Educ Psychol Rev. 2023 Dec;35(4):106.
6. Middleton R, Lewer K, Antoniou C, Pratt H, Bowdler S, Jans C, et al. Understanding the processes, practices and influences of calibration on feedback literacy in higher education marking: A qualitative study. Nurse Education Today. 2024 Apr;135:106106.
7. Stone NJ. Exploring the Relationship between Calibration and Self-Regulated Learning. Educational Psychology Review. 2000 Dec;12(4):437–75.
8. Zheng B, He Q, Lei J. Informing factors and outcomes of self-assessment practices in medical education: a systematic review. Annals of Medicine. 2024 Dec 31;56(1):2421441.
9. Bowers RD, Baker CN, Becker KK, Hamilton JN, Trotta K. Comparison of peer, self, and faculty objective structured clinical examination evaluations in a PharmD nonprescription therapeutics course. Currents in Pharmacy Teaching and Learning. 2024 Nov;16(11):102159.
10. Inayah AT, Anwer LA, Shareef MA, Nurhussen A, Alkabbani HM, Alzahrani AA, et al. Objectivity in subjectivity: do students’ self and peer assessments correlate with examiners’ subjective and objective assessment in clinical skills? A prospective study. BMJ Open. 2017 May;7(5):e012289.
11. Harden RM. Revisiting ‘Assessment of clinical competence using an objective structured clinical examination (OSCE)’. Med Educ. 2016 Apr;50(4):376–9.
12. Nyangeni T, Ten Ham-Baloyi W, Van Rooyen DRM. Strengthening the planning and design of Objective Structured Clinical Examinations. Health SA Gesondheid [Internet]. 2024 Aug 7 [cited 2025 Aug 13];29. Available from: http://www.hsag.co.za/index.php/hsag/article/view/2693
13. Knof H, Berndt M, Shiozawa T. Prevalence of Dunning-Kruger effect in first semester medical students: a correlational study of self-assessment and actual academic performance. BMC Med Educ. 2024 Oct 24;24(1):1210.
14. Seidel-Fischer J, Trifunovic-Koenig M, Gerber B, Otto B, Bentele M, Fischer MR, et al. Interaction between overconfidence effects and training formats in nurses’ education in hand hygiene. BMC Nurs. 2024 Jul 2;23(1):451.
15. Foster C, Renie P. Changes in students’ confidence calibration across a sequence of low-stakes confidence assessments. Asian Journal for Mathematics Education. 2024 Dec;3(4):406–27.
16. Kurt E, Eskimez Z. Examining self-regulated learning of nursing students in clinical practice: A descriptive and cross-sectional study. Nurse Education Today. 2022 Feb;109:105242.
17. Tanimura C, Okuda R, Tokushima Y, Matsumoto Y, Katou S, Miyoshi M, et al. Examining the reliability and validity of a self-regulated learning strategy scale for undergraduate nursing students and effective factors of self-regulated learning strategies. Nurse Education Today. 2023 Sep;128:105872.
18. Zimmerman BJ. Becoming a Self-Regulated Learner: An Overview. Theory Into Practice. 2002 May;41(2):64–70.
19. Torrano F, González-Torres MC. Self-Regulated Learning: Current and Future Directions. Electronic Journal of Research in Educational Psychology. 2004 Apr 1;2.
20. Hemmler YM, Ifenthaler D. Self-regulated learning strategies in continuing education: A systematic review and meta-analysis. Educational Research Review. 2024 Nov;45:100629.
21. Davis E, Wands L. The Power of Choice: Fostering Engagement and Competence in Nursing Students. J Nurs Educ. 2025 Feb 12;1–4.
22. Kolovelonis A, Goudas M, Samara E. The Effects of a Self-Regulated Learning Teaching Unit on Students’ Performance Calibration, Goal Attainment, and Attributions in Physical Education. The Journal of Experimental Education. 2022 Jan 2;90(1):112–29.
23. Gonsalvez CJ, Riebel T, Nolan LJ, Pohlman S, Bartik W. Supervisor versus self-assessment of trainee competence: Differences across developmental stages and competency domains. J Clin Psychol. 2023 Dec;79(12):2959–73.
24. Abraham R, Singaram VS. Self and peer feedback engagement and receptivity among medical students with varied academic performance in the clinical skills laboratory. BMC Med Educ. 2024 Sep 28;24(1):1065.
25. Bodard S, Bouzid D, Ferré VM, Carette C, Kivits J, Nguyen Y, et al. Impact of gender on self-assessment accuracy among fourth-year French medical students on faculty’s online Objective Structured Clinical Examinations. BMC Med Educ. 2024 Dec 30;24(1):1553.
26. Yang H, Thompson C, Bland M. The effect of clinical experience, judgment task difficulty and time pressure on nurses’ confidence calibration in a high fidelity clinical simulation. BMC Med Inform Decis Mak. 2012 Dec;12(1):113.
27. Aboalrob W, Ayed A, Malak MZ, Aqtam I. Understanding the influence of self-concept on clinical decision-making among nurses: A cross-sectional study. PLoS One. 2025 Aug 25;20(8):e0330905.
28. Alizadeh M, Behshid M, Cheraghi R, Dehghani G. Nursing students’ experiences of professional competence evaluation by Objective Structured Clinical examination method: a qualitative content analysis study. BMC Med Educ. 2024 Nov 13;24(1):1302.
29. Roszipal B, Szelesi G, Ernst M, Hoffelner A, Wagner M. Post-OSCE Self-Assessment Questionnaire (English translation). BMC Medical Education; 2025.
30. Liu MY, Liao LL, Huang YT, Lee YC, Lai IJ. Effectiveness of a scenario-based simulation course on improving the clinical communication skills of dietetic students. BMC Med Educ. 2025 Jan 22;25(1):106.
31. Malekzadeh M, Mohammadi F, Gholami SA, et al. Evaluation of Clinical Communication Skills in Dental Students with Objective Structured Clinical Examination. J Clinic Care Skill. 2021 Dec 1;2(4):173–9.
32. Kwame A, Petrucka PM. A literature-based study of patient-centered care and communication in nurse-patient interactions: barriers, facilitators, and the way forward. BMC Nurs. 2021 Dec;20(1):158.
33. Ayed A, Aqtam I, Malak MZ, Toqan D, Hammad BM, Qaddumi J, et al. Insights into the relationship between emotional intelligence and critical thinking among nursing students. BMC Nurs. 2025 Aug 23;24(1):1107.
34. Abu Dabrh AM, Waller TA, Bonacci RP, Nawaz AJ, Keith JJ, Agarwal A, et al. Professionalism and inter-communication skills (ICS): a multi-site validity study assessing proficiency in core competencies and milestones in medical learners. BMC Med Educ. 2020 Dec;20(1):362.
35. McCarrick CA, Moynihan A, McEntee PD, Boland PA, Donnelly S, Heneghan H, et al. Impact of simulation training on communication skills and informed consent practices in medical students: a randomised controlled trial. BMC Med Educ. 2025 Jul 18;25(1):1078.
36. Høegh-Larsen AM, Gonzalez MT, Reierson IÅ, Husebø SIE, Hofoss D, Ravik M. Nursing students’ clinical judgment skills in simulation and clinical placement: a comparison of student self-assessment and evaluator assessment. BMC Nurs. 2023 Mar 9;22(1):64.
37. Lu YCA, Lee SH, Hsu MY, Shih FF, Yen WJ, Huang CY, et al. Effects of Problem-Based Learning Strategies on Undergraduate Nursing Students’ Self-Evaluation of Their Core Competencies: A Longitudinal Cohort Study. IJERPH. 2022 Nov 28;19(23):15825.
Supplementary Files
20251008AdditionalFile1.pdf

Publication history: first submitted to journal 08 Oct 2025; accepted 06 Nov 2025; published 24 Dec 2025 in BMC Medical Education.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7518013","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":526496566,"identity":"bda1178c-66f1-4f80-a812-b1f3e95da06d","order_by":0,"name":"Benjamin Roszipal","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABYElEQVRIie2RMUvDQBSAXwgmy9k4XqjQv5DgoGJp/0qOQF1KEQqh4uCVyHU5cG1B8C84OgYOzFLSNSEOLYF2cSi4FKTqFRXT0MHRIR9vuHe8770HD6Ck5D+iKoMV7r0jQ6cy036+tU0odKeiqxSfjINDkweyQlaiHYq6pcju+PImqFuxU1Dyq+STY1WhVkKfkZUsFhny6p2mEew9LD3RqWC3n11Aowb6JMgpp75Cyehxjsy0bfsoanURdrRkGImuhol/NATXpsjNT7GEQoU5VlElbSv+PhOEY9DSzYNVZSBQHYCi0vfXTEWQhDOpfBBuBFq6/lIGbwiuHTCyguKDyQQ6iMGWSkA4OFqqfE+RrYQDuDiFAR63kMnb9ugucgmPCUt4dE5YbeZXkRXaDGdWXpmEc8C9etPQw+nyxWuQwa14ilfeGbnn4ewV9a5qhkGmO2/6y9bRreKNSkpKSkr+wCewUX7KlNb3BQAAAABJRU5ErkJggg==","orcid":"","institution":"Medical University of Vienna","correspondingAuthor":true,"prefix":"","firstName":"Benjamin","middleName":"","lastName":"Roszipal","suffix":""},{"id":526496567,"identity":"fc50a9e8-2b5d-40f4-b5f6-efd164bc3974","order_by":1,"name":"Gabriella Szelesi","email":"","orcid":"","institution":"St. Pölten University of Applied Sciences","correspondingAuthor":false,"prefix":"","firstName":"Gabriella","middleName":"","lastName":"Szelesi","suffix":""},{"id":526496568,"identity":"9b98c8d3-c5d8-40f8-8a0c-edb7b2a67576","order_by":2,"name":"Martin Ernst","email":"","orcid":"","institution":"St. 
Pölten University of Applied Sciences","correspondingAuthor":false,"prefix":"","firstName":"Martin","middleName":"","lastName":"Ernst","suffix":""},{"id":526496569,"identity":"0fd4e36a-5e2f-45c1-894c-45db6260848a","order_by":3,"name":"Alexander Hoffelner","email":"","orcid":"","institution":"Medical University of Vienna","correspondingAuthor":false,"prefix":"","firstName":"Alexander","middleName":"","lastName":"Hoffelner","suffix":""},{"id":526496570,"identity":"50a4bb83-9b9d-4619-8066-cc93d0f13502","order_by":4,"name":"Michael Wagner","email":"","orcid":"","institution":"Medical University of Vienna","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"","lastName":"Wagner","suffix":""}],"badges":[],"createdAt":"2025-09-02 13:08:23","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7518013/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7518013/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12909-025-08279-0","type":"published","date":"2025-12-24T15:57:35+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":93412069,"identity":"b0e410d7-59e6-45f8-9f3c-13541e16b06e","added_by":"auto","created_at":"2025-10-13 14:40:41","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":83058,"visible":true,"origin":"","legend":"\u003cp\u003eMean self- and examiner-ratings across competence domains\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7518013/v1/4b960d268ec3917e48465aa6.png"},{"id":99172906,"identity":"0ffee928-b3c6-4543-be64-1fb1897a3339","added_by":"auto","created_at":"2025-12-29 
16:12:05","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1251562,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7518013/v1/2d80cf9f-c274-410b-a5e8-be8cb86b08f4.pdf"},{"id":93412075,"identity":"9a6c5c37-a876-42f0-afcc-4be1267dea01","added_by":"auto","created_at":"2025-10-13 14:40:41","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":232491,"visible":true,"origin":"","legend":"","description":"","filename":"20251008AdditionalFile1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7518013/v1/ff722b69fd43e0689824cf0f.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Competency Development in Early Nursing Training: A Cross-Sectional OSCE Study of Self-Assessment Versus Examiner Ratings","fulltext":[{"header":"Background","content":"\u003cp\u003eCompetency-based nursing education has brought increased attention to the accurate assessment of learners\u0026rsquo; skills and behaviors. In this framework, trainees are expected to demonstrate competence across multiple domains, from clinical knowledge and technical skills to communication and professionalism, while engaging in continuous self-reflection and improvement (1\u0026ndash;4). Within this context, calibration accuracy, defined as the alignment between self-assessed competence and externally assessed performance, serves as a central metacognitive target that links self-evaluation to effective learning\u0026nbsp;(5\u0026ndash;8). 
Positioning calibration accuracy as a basic metacognitive skill underscores its foundational role in self-regulated learning (SRL) and its relevance for clinical practice.\u003c/p\u003e\n\u003cp\u003eAccurate self-assessment is therefore crucial: nursing students must be able to gauge their own strengths and weaknesses to identify learning needs and achieve competence as independent practitioners. At the same time, evaluations by faculty, examiners, or peers remain a cornerstone of competency judgment in education (9,10). OSCEs, for example, are widely used to objectively measure clinical skills and behaviors in a standardized format (11,12). OSCEs not only test students in various competency domains but also provide an opportunity for comparing students\u0026rsquo; self-evaluations with external evaluations in a structured setting. Ensuring alignment between self- and examiner-based assessment is critical in nursing training, as significant mismatches could impair both learning and patient safety (8,13) and may also affect students\u0026rsquo; professional identity formation and readiness for independent clinical practice (1,3). An overconfident student who overestimates their abilities may not seek or accept needed improvement, whereas an underconfident student might avoid tasks despite being competent (14,15).\u003c/p\u003e\n\u003cp\u003eThe concept of accurate self-assessment is embedded in the broader theoretical models of SRL and metacognition. SRL involves a cyclical process of planning, monitoring, and evaluating one\u0026rsquo;s own learning and performance, with metacognitive accuracy \u0026ndash; knowing how well one has performed \u0026ndash; being central to effective regulation (16,17). Within Zimmerman\u0026rsquo;s SRL framework (18), calibration accuracy operates as a core monitoring mechanism that links performance appraisal to adaptive strategy adjustment and subsequent reflection (19,20). 
Calibration accuracy, a specific expression of metacognitive accuracy, is a critical metacognitive skill (21). High calibration accuracy allows learners to direct study efforts efficiently, engage in deliberate practice, and seek targeted feedback. Conversely, miscalibration \u0026ndash; either over- or underestimation \u0026ndash; can lead to inappropriate confidence levels and suboptimal learning strategies. Recent theoretical work (6) also situates calibration within the concept of \u0026ldquo;feedback literacy\u0026rdquo;, the capacity to understand, internalize, and act upon feedback, which is essential for closing the gap between perceived and actual performance (7,22). Inaccurate self-assessment may therefore reflect not only a metacognitive deficit but also insufficient feedback literacy, particularly in early-stage learners. In the context of nursing education, where clinical decision-making, procedural execution, and communication skills directly impact patient outcomes, the stakes of calibration are particularly high.\u003c/p\u003e\n\u003cp\u003eOSCEs offer a unique methodological advantage for studying self-other agreement due to their structure, standardization, and the replicable environment in which multiple competencies are tested across a variety of stations. This design aligns well with SRL theory by enabling domain-specific examination under controlled conditions (19). This setting also reduces confounding from case complexity and patient variability, which complicate calibration research in real clinical environments. Each station presents clearly defined tasks, typically scored against objective checklists or global rating scales, which reduce ambiguity for examiners and allow for robust performance comparisons. 
For research on calibration accuracy, the OSCE\u0026rsquo;s compartmentalized structure enables domain-specific analysis, such as comparing communication versus procedural skills, within the same group of students under identical conditions. This is particularly relevant given evidence that self-assessment accuracy varies by domain (23). Furthermore, because OSCEs occur at fixed points in the curriculum, they provide a natural \u0026ldquo;snapshot\u0026rdquo; of simulation-based performance, offering educators a baseline for targeted feedback interventions.\u003c/p\u003e\n\u003cp\u003ePrior research suggests that nursing and other health professions students often struggle with accurate self-evaluation of their performance (5). Contemporary studies document a tendency toward overestimation among trainees, particularly in early stages of training (14,24). Meta-analytic evidence indicates that students are more likely to overestimate their performance in interpersonal or communication-based clinical encounters than in knowledge-based assessments (8,25). However, not all evidence points toward overestimation. Some studies have found that learners, particularly female students, occasionally underestimate their abilities compared to faculty ratings (25). Findings on gender differences are mixed, and factors such as age or prior healthcare experience remain underexplored in nursing cohorts. In addition, emerging work suggests that the quality, recency, and role expectations of prior clinical experience, rather than its mere presence, may shape professional self-concept and metacognitive calibration (23,26,27). It has been suggested that more advanced or experienced students may self-evaluate more accurately than novices because of greater exposure to feedback (28), but the evidence is inconsistent. Most existing studies focus on a single domain or competency area, limiting our understanding of cross-domain differences within the same learner cohort. 
Moreover, little is known about how accurately nursing students perceive their own competence at the very beginning of their clinical training. Additionally, very few studies have examined these phenomena during students\u0026rsquo; first OSCE, a critical summative assessment that may shape early self-efficacy and learning strategies. This early stage may represent a window of heightened plasticity in calibration accuracy, making it a promising intervention point for educators.\u003c/p\u003e\n\u003cp\u003eEstablishing baseline self-other agreement patterns at this stage is critical for designing targeted feedback and reflection interventions within competency-based nursing education. By systematically examining discrepancies across multiple domains and exploring potential demographic moderators, the present study seeks to address a gap in the evidence base and contribute to both educational theory and practical curriculum design.\u003c/p\u003e\n\u003cp\u003eTo date, evidence on whether domain-specific self\u0026ndash;examiner discrepancies occur in first-year nursing OSCEs \u0026ndash; and how age, gender, and prior healthcare training relate to these differences \u0026ndash; remains limited. Grounding the study in SRL, we conceptualize calibration accuracy as the monitoring process that aligns self-ratings with external standards. We therefore expected smaller discrepancies in domains with transparent criteria and explored whether learner characteristics relate to calibration early in training. The present study aims to extend previous work by examining the agreement between nursing students\u0026rsquo; self-assessments and examiner-based evaluations (self-other accuracy) across multiple competency domains during their first OSCE. 
Specifically, this study investigated whether the magnitude or direction of self-other rating differences, measured on a 0\u0026ndash;100% scale within a first-year OSCE, varies between professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills, building on prior reports of domain-specific variation in self-assessment accuracy (8,25).\u003c/p\u003e\n\u003cp\u003eThe first aim of this study is to quantify the magnitude and direction of self-other assessment differences in the three predefined domains. This allows us to identify whether students tend to overestimate or underestimate their performance in specific areas of competence and to what extent such discrepancies occur in a standardized OSCE context.\u003c/p\u003e\n\u003cp\u003eThe second aim is to compare these discrepancies across domains to determine whether certain competencies are more prone to miscalibration. Prior research has suggested that self-assessment accuracy is often domain-specific, with interpersonal and communication skills more likely to be overestimated than technical or methodological skills.\u003c/p\u003e\n\u003cp\u003eThe third aim is to examine the associations of gender, age, and prior healthcare training with the magnitude and direction of self-other discrepancies. Although some studies indicate that prior healthcare experience may improve calibration accuracy through greater feedback exposure, findings regarding gender and age are inconsistent, particularly in early-stage nursing education.\u003c/p\u003e\n\u003cp\u003eBy addressing these aims, this study seeks to close key evidence gaps and provide educators with evidence-based guidance for tailoring feedback and self-reflection in competency-based nursing curricula. Enhancing self-assessment accuracy in novice nursing students may support lifelong learning and, ultimately, safer and more effective clinical practice (24,25). 
More broadly, the work contributes to assessment literacy in health professions education by linking metacognitive accuracy to its potential implications for quality of care. Accordingly, we examine calibration accuracy in first-year students\u0026rsquo; first OSCE, comparing self- and examiner ratings across domains and testing associations with age, gender, and prior healthcare training experience (5,8,23,27).\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eThis cross-sectional study analyzed self-assessment and examiner evaluation data from a single OSCE conducted at the end of the second semester in the first year of a Bachelor of Science in Nursing program at a university of applied sciences (name withheld for anonymity). The OSCE followed standardized procedures and was administered in the university\u0026rsquo;s simulation center, which provides a controlled environment for evaluating simulated clinical performance. Multiple clinical scenarios were designed to assess students\u0026rsquo; competence across three predefined competency domains, in alignment with the program\u0026rsquo;s competency framework. The OSCE was situated within a summative assessment context: students\u0026rsquo; performance contributed to their course grade, which was expected to ensure high engagement and support the ecological validity of the performance data.\u003c/p\u003e\n\u003cp\u003eThe study population comprised all students enrolled in the second semester of the Bachelor of Nursing program (n = 109). Inclusion criteria were enrollment in the first year of the program, eligibility to participate in the OSCE, and attendance on the examination day. Students were excluded if they did not complete the post-OSCE self-assessment questionnaire. No participants withdrew after the examination. The decision to include the entire cohort rather than a sample was deliberate, as it minimized sampling bias and allowed for population-level inference within this institutional cohort. 
Where data were missing, listwise deletion was applied to analyses involving the affected variables; the analysis plan also provided for examining missingness patterns to assess whether data were missing at random.\u003c/p\u003e\n\u003cp\u003eA priori sample size estimation was not performed because the study included the complete eligible cohort. Post hoc (observed) power was calculated at \u0026alpha; = .05 from the observed effect sizes. The repeated-measures ANOVA effects for Rating Type, F(1, 101) = 18.09, \u0026eta;\u0026sup2;ₚ = .152; Competence Domain, F(2, 202) = 94.82, \u0026eta;\u0026sup2;ₚ = .484; and the Domain \u0026times; Rating Type interaction, F(2, 202) = 71.96, \u0026eta;\u0026sup2;ₚ = .416, all had very high power (\u0026ge; .99). Polynomial trend analyses showed similar near-unity power (linear: F(1, 101) = 113.57; quadratic: F(1, 101) = 70.87). In contrast, planned paired comparisons for the professional (t(102) = \u0026minus;1.69) and methodological domains (t(102) = 0.86) and regression models for methodological (F(1, 101) = 2.38, R\u0026sup2; = .02) and professional discrepancies (F(1, 101) = 0.06, R\u0026sup2; \u0026lt; .01) showed limited power (\u0026asymp; .14\u0026ndash;.39), warranting cautious interpretation of null results. By comparison, the social-domain paired comparison (t(103) = \u0026minus;10.14, d \u0026asymp; .99) and the regression of the social discrepancy on age (F(1, 102) = 7.73, R\u0026sup2; = .07) were adequately to highly powered.\u003c/p\u003e\n\u003cp\u003eAll eligible students who attended the OSCE and met inclusion criteria were invited to participate immediately after the exam, yielding a nearly complete dataset (case counts per analysis are given in the Results). 
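\u003c/p\u003e\n\u003cp\u003eThe limited observed power of the non-significant paired comparisons can be approximated from the reported t statistics alone. The sketch below is a minimal illustration, not the procedure used in the study, which is not specified. It assumes a normal approximation in which the observed |t| serves as the noncentrality parameter. The helper name approx_observed_power is hypothetical. Under these assumptions the sketch gives roughly .39 for the professional domain, .14 for the methodological domain, and essentially 1 for the social domain, consistent with the values reported above.\u003c/p\u003e

```python
from statistics import NormalDist

def approx_observed_power(t_obs: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power of a paired t-test (hypothetical helper).

    Normal approximation: the observed |t| is treated as the noncentrality
    parameter of a unit-variance normal test statistic.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    ncp = abs(t_obs)
    # Probability that the shifted statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

# Observed t values for the three planned paired comparisons
for label, t in [('professional', -1.69), ('methodological', 0.86), ('social', -10.14)]:
    print(f'{label}: {approx_observed_power(t):.2f}')
```

\u003cp\u003e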
Participation in the research component (post-OSCE questionnaire) was voluntary, had no effect on course grades or OSCE outcomes, and could be declined without penalty. Responses were pseudonymized prior to analysis; the linkage file was stored separately with restricted access and was not available to the research team.\u003c/p\u003e\n\u003cp\u003eThe OSCE comprised five stations, each lasting 12 minutes, which integrated elements from three predefined competency domains: professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills. Stations were developed by a panel of nursing faculty and simulation educators to reflect realistic first-year clinical encounters, including tasks such as patient admission interviews, basic vital sign measurement, intravenous therapy preparation, and patient education. Each scenario was piloted with faculty staff prior to data collection to ensure clarity of instructions, feasibility of timing, and adequate domain coverage. Feedback from this pilot phase led to minor adjustments in checklist wording and task sequence to improve validity. Each station contained multiple observable tasks (e.g., \u0026ldquo;Student performs hygienic hand disinfection for 30 seconds\u0026rdquo; or \u0026ldquo;Student explains procedure to patient in an understandable manner\u0026rdquo;), scored by trained examiners using a standardized 3-point scale (0 = not performed, 1 = partially performed, 2 = fully performed). All checklist items were assigned to one of the three competency domains based on expert consensus among thirteen senior faculty members. Items addressing technical execution of clinical procedures were mapped to professional knowledge/analytical skills; those requiring data collection, interpretation, or procedures were mapped to methodological/procedural skills; and those involving interaction, empathy, and verbal/non-verbal communication were mapped to social/communication skills. 
The domain mapping was documented within the OSCE software as well as in a codebook to ensure transparency and reproducibility of the classification process. Scores per domain were calculated as the sum of achieved points divided by the maximum possible points, yielding domain-specific performance percentages.\u003c/p\u003e\n\u003cp\u003eExaminers were faculty members with at least a master\u0026rsquo;s degree in nursing education. Three weeks before the OSCE, they completed a three-hour rater calibration session covering the competency framework, rating criteria, and scoring exercises with consensus discussions.\u003c/p\u003e\n\u003cp\u003eCalibration sessions were designed according to best-practice recommendations for performance assessment training. To minimize measurement bias, examiners scored independently and did not discuss ratings during the live OSCE.\u003c/p\u003e\n\u003cp\u003eImmediately after completing the OSCE and before receiving any feedback, students completed a self-developed, structured questionnaire aligned 1:1 with the OSCE stations and competency domains. Because no published instrument mapped one-to-one to the station-specific OSCE checklist and our competency framework, we developed a brief German-language self-assessment questionnaire mirroring those domains (29). For each skill and domain, students rated their performance on an 11-point percentage scale (0\u0026ndash;100% in 10% increments; 0% = \u0026ldquo;not competent at all\u0026rdquo;, 100% = \u0026ldquo;fully competent\u0026rdquo;). This format was chosen for its sensitivity to small differences between self- and examiner ratings while remaining intuitive; the 10% steps reduce pseudo-precision yet preserve discrimination. Using percentages also enabled direct numerical comparison with examiner percentage scores generated by the OSCE management system. 
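\u003c/p\u003e\n\u003cp\u003eThe scoring arithmetic described above can be sketched as follows. This minimal example assumes that each checklist item carries its domain label and a 0/1/2 score. A domain score is then the achieved points divided by the maximum possible points, and a self-rating given in 10% steps is divided by 100. The item list and the function names domain_scores and to_proportion are hypothetical. The OSCE management system may implement the calculation differently.\u003c/p\u003e

```python
from collections import defaultdict

MAX_POINTS_PER_ITEM = 2  # 0 = not performed, 1 = partially, 2 = fully performed

def domain_scores(scored_items):
    """Return achieved / maximum possible points per domain, on a 0-1 scale."""
    achieved = defaultdict(int)
    possible = defaultdict(int)
    for domain, points in scored_items:
        if points not in (0, 1, 2):
            raise ValueError(f'invalid checklist score: {points}')
        achieved[domain] += points
        possible[domain] += MAX_POINTS_PER_ITEM
    return {domain: achieved[domain] / possible[domain] for domain in possible}

def to_proportion(percent):
    """Rescale a 0-100% self-rating (10% increments) to the 0-1 analysis scale."""
    if percent not in range(0, 101, 10):
        raise ValueError('self-ratings use 10% steps from 0 to 100')
    return percent / 100

# Hypothetical checklist results for one student: (domain, points awarded)
items = [
    ('professional', 2), ('professional', 1),
    ('methodological', 2), ('methodological', 0), ('methodological', 2),
    ('social', 1), ('social', 2),
]
examiner = domain_scores(items)
self_rating = to_proportion(70)
print(examiner)
print(self_rating)
```

\u003cp\u003e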
For analysis, all percentage-based scores (self and examiner) were rescaled to proportions by dividing by 100, yielding values on a 0\u0026ndash;1 scale (e.g., 0.75\u0026nbsp;≙\u0026nbsp;75%). An English translation of the final questionnaire is provided in Additional file 1.\u003c/p\u003e\n\u003cp\u003eContent validity of the self-assessment questionnaire was established through expert review. Twelve nurse educators from the university\u0026rsquo;s faculty independently appraised item clarity, domain alignment, and coverage against the program\u0026rsquo;s competency framework. A team meeting reconciled the feedback, and minor wording changes (e.g., plain-language anchors, station-specific phrasing) were made. Estimated completion time was approximately 4\u0026ndash;5 minutes. Additionally, face validity was examined in a brief cognitive pre-test with nine first-semester students using a four-point comprehensibility scale (1 = not understandable, 4 = well understandable): the mean rating was 3.20, 76.5% of item judgements were \u0026ldquo;understandable\u0026rdquo; or \u0026ldquo;well understandable,\u0026rdquo; and one judgement (1.2%) was \u0026ldquo;not understandable.\u0026rdquo; Internal consistency of the self-assessment scales was acceptable to good across domains. For the professional knowledge/analytical domain, Cronbach\u0026rsquo;s \u0026alpha; = .76, Spearman-Brown split-half = .76, and Guttman split-half = .73. For the methodological/procedural domain, Cronbach\u0026rsquo;s \u0026alpha; = .71, Spearman-Brown split-half = .73, and Guttman split-half = .71. For the social (communication) domain, Cronbach\u0026rsquo;s \u0026alpha; = .80, Spearman-Brown split-half = .82, and Guttman split-half = .79. All scales comprised five station-specific self-ratings rescaled to 0\u0026ndash;1.\u003c/p\u003e\n\u003cp\u003eDemographic variables, including gender, age, and prior healthcare training, were obtained from institutional student records. Gender was recorded as reported by the student in official enrollment documents. 
Prior healthcare training was defined as any formal vocational or academic education in a healthcare profession before enrollment in the nursing program. The primary outcome variable was the self-other discrepancy score, calculated by subtracting examiner ratings from student self-assessments for each domain, with positive scores indicating overestimation and negative scores indicating underestimation.\u003c/p\u003e\n\u003cp\u003eTo address potential sources of bias, the study design sought to minimize selection bias by including all eligible students from the cohort, thereby reducing volunteer bias. Measurement bias was reduced by using standardized checklists, trained and calibrated examiners, and a post-OSCE self-assessment that directly paralleled the examiner scoring system. Social desirability bias in self-assessments was mitigated by pseudonymizing responses and administering the questionnaire immediately after the OSCE, before feedback was given. The controlled timing of the self-assessment was intended to capture students\u0026rsquo; immediate, unaided perceptions of their performance, unaltered by external cues or post hoc rationalizations. Residual bias from subjective interpretation, especially in the social/communication domain, remains possible, although examiner training was intended to limit it.\u003c/p\u003e\n\u003cp\u003eData analysis was performed using IBM SPSS Statistics (version 29.0; IBM Corp., Armonk, NY, USA). Descriptive statistics (means, standard deviations, frequencies, and percentages) were computed for all variables. Paired-samples t-tests were used to compare self-assessment and examiner ratings within each domain; Wilcoxon signed-rank tests were planned for use if normality assumptions (Shapiro-Wilk test) were violated. Cross-domain comparisons of self-other discrepancies were examined using one-way repeated-measures ANOVA (with two rating types, this is equivalent to testing the Rating Type \u0026times; Competence Domain interaction in a 3 \u0026times; 2 repeated-measures ANOVA), with Greenhouse-Geisser correction applied if Mauchly\u0026rsquo;s test indicated a violation of sphericity. 
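\u003c/p\u003e\n\u003cp\u003eThe primary outcome defined above is a simple per-student difference on the 0\u0026ndash;1 scale, computed as self minus examiner, so positive values indicate overestimation. The minimal sketch below uses hypothetical ratings. The function names are illustrative, and so is the optional tolerance band used to label a difference as accurate; neither is part of the study protocol.\u003c/p\u003e

```python
from statistics import mean

def discrepancies(self_ratings, examiner_ratings):
    """Self minus examiner for each student (0-1 scale); positive = overestimation."""
    if len(self_ratings) != len(examiner_ratings):
        raise ValueError('ratings must be paired by student')
    return [s - e for s, e in zip(self_ratings, examiner_ratings)]

def direction(d, tolerance=0.0):
    """Label one discrepancy; the tolerance band is a hypothetical choice."""
    if d > tolerance:
        return 'overestimation'
    if d < -tolerance:
        return 'underestimation'
    return 'accurate'

# Hypothetical social/communication ratings for four students
self_social = [0.9, 0.8, 0.7, 0.6]
examiner_social = [0.7, 0.7, 0.7, 0.65]
diffs = discrepancies(self_social, examiner_social)
print([direction(d, tolerance=0.01) for d in diffs])
print(round(mean(diffs), 4))
```

\u003cp\u003e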
Potential demographic influences (gender, age group, prior healthcare training) on discrepancy scores were explored using mixed-model ANOVA, with domain as the within-subjects factor and demographics as between-subjects factors. Bonferroni-adjusted pairwise comparisons were used to control for multiple testing. Effect sizes were reported as Cohen\u0026rsquo;s d for t-tests and partial eta squared (\u0026eta;\u0026sup2;ₚ) for ANOVA, alongside 95% confidence intervals. Statistical significance was set at p \u0026lt; .05 (two-tailed). In addition to p-values, interpretation emphasized effect sizes and their confidence intervals to gauge the practical relevance of observed differences, in line with recommendations for transparent reporting in health professions education research.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eOf the 109 students enrolled in the second semester of the Bachelor of Nursing program, five datasets were excluded due to missing self-assessment information. Of the remaining 104 students, 102 provided complete data for the repeated-measures ANOVA (listwise deletion). For the paired t-tests, valid cases ranged between 103 and 104 depending on the domain. Based on the 102 complete datasets, the mean age of participants was 22.96 years (SD 3.96, range 19\u0026ndash;41); 69.6% identified as female and 30.4% as male, and 41.2% reported prior healthcare training before enrollment (Table 1). The gender distribution appears consistent with that of the broader nursing student population, and the proportion with prior healthcare training was sufficient to examine potential experience-related effects on self-other agreement.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1. 
Demographic characteristics of the study cohort (n = 102)\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"601\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eVariable\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u003cstrong\u003en (%)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eM (SD)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRange\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eGender\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;71 (69.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" 
style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;31 (30.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e31 (30.4%) \u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAge (years)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e22.96 (3.96)\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e19-41\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ePrior healthcare training\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 198px;\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e42 (41.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 
198px;\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e60 (58.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 134px;\"\u003e\n \u003cp\u003e\u0026ndash;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote. Continuous variables are presented as M (SD) with range, and categorical variables as n (%). Prior healthcare training refers to any formal vocational or academic education in a healthcare profession before enrollment.\u003c/p\u003e\n\u003cp\u003eAll statistical assumptions for the repeated-measures ANOVA were met; uncorrected degrees of freedom were therefore used. The 3 (Competence Domain: professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills) \u0026times; 2 (Rating Type: self vs. examiner) repeated-measures ANOVA revealed statistically significant main effects for both Rating Type, F(1, 101) = 18.09, p \u0026lt; .001, \u0026eta;\u0026sup2;ₚ = .152, and Competence Domain, F(2, 202) = 94.82, p \u0026lt; .001, \u0026eta;\u0026sup2;ₚ = .484. These effects were qualified by a statistically significant Rating Type \u0026times; Competence Domain interaction, F(2, 202) = 71.96, p \u0026lt; .001, \u0026eta;\u0026sup2;ₚ = .416, indicating that the pattern of self-other differences varied across domains. Descriptively, self-ratings were slightly higher than examiner ratings in professional knowledge/analytical skills and slightly lower in methodological/procedural skills, while the largest and most consistent difference occurred in social/communication skills, where self-ratings were markedly higher (Table 2).\u003c/p\u003e\n\u003cp\u003ePolynomial contrast analyses further supported the domain effects. 
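\u003c/p\u003e\n\u003cp\u003ePolynomial contrasts over three levels use the standard orthogonal weights (-1, 0, 1) for the linear component and (1, -2, 1) for the quadratic component. The competence domains are categories rather than an ordered scale, so these trends depend on the order in which the domains are entered. This sketch assumes the order used in the text: professional, methodological, social.\u003c/p\u003e\n\u003cp\u003eThe minimal sketch below does two things. First, it applies the weights to the mean discrepancies in Table 2. Second, it shows how a single-degree-of-freedom contrast F(1, n - 1) can be obtained as the square of a one-sample t-test on per-student contrast scores, which is how repeated-measures software typically tests within-subject contrasts. The function names are hypothetical.\u003c/p\u003e

```python
from math import sqrt
from statistics import mean, stdev

LINEAR = (-1, 0, 1)     # orthogonal polynomial weights for 3 ordered levels
QUADRATIC = (1, -2, 1)

def contrast(values, weights):
    """Weighted sum of the three domain values."""
    return sum(w * v for w, v in zip(weights, values))

def contrast_f(per_student, weights):
    """F(1, n - 1) for a within-subject contrast: square of a one-sample t on contrast scores."""
    scores = [contrast(v, weights) for v in per_student]
    t = mean(scores) / (stdev(scores) / sqrt(len(scores)))
    return t * t

# Mean self-minus-examiner discrepancies from Table 2 (professional, methodological, social)
table2 = (0.0198, -0.0135, 0.1406)
print(round(contrast(table2, LINEAR), 4), round(contrast(table2, QUADRATIC), 4))
```

\u003cp\u003e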
For the main effect of Competence Domain, both linear and quadratic trends were statistically significant (both p \u0026lt; .001), indicating that, with the domains in the order listed, ratings did not change in a strictly linear manner and were disproportionately high in the social/communication domain (Figure 1). The interaction between Rating Type and Competence Domain also showed significant linear and quadratic components (both p \u0026lt; .001), reflecting that self-other discrepancies differed nonlinearly across domains (slightly positive, slightly negative, then markedly positive) and peaked in social/communication skills (Table 2).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 1.\u003c/strong\u003e Mean scores (\u0026plusmn; standard error) for professional knowledge/analytical skills, methodological/procedural skills, and social/communication skills. Self-ratings are shown as circles and examiner ratings as squares. All values are proportions (0\u0026ndash;1) derived from percentage scores.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2. Self- and examiner ratings across competence domains\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellpadding=\"0\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCompetence Domain\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eRating Type\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMean (SD)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eSelf \u0026ndash; Examiner\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eprofessional 
knowledge/analytical skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eOther-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.6591 (0.0807)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0080\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 84px;\"\u003e\n \u003cp\u003e0.0198\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eSelf-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.6775 (0.1304)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0129\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\n \u003cp\u003e\u003cstrong\u003emethodological/procedural skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eOther-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.6661 (0.0818)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0081\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 84px;\"\u003e\n \u003cp\u003e-0.0135\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eSelf-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.6520 (0.1454)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0144\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\n \u003cp\u003e\u003cstrong\u003esocial/communication skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eOther-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.6852 (0.0780)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0077\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 84px;\"\u003e\n \u003cp\u003e0.1406\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 205px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 103px;\"\u003e\n \u003cp\u003eSelf-rated\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 123px;\"\u003e\n \u003cp\u003e0.8275 (0.1268)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 97px;\"\u003e\n \u003cp\u003e0.0125\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 84px;\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote. SD = standard deviation; SE = standard error; Self \u0026ndash; Examiner = discrepancy between self- and examiner ratings (examiner-rated values subtracted from self-rated values), with positive values indicating overestimation. Self-rated values represent participants\u0026rsquo; self-assessment scores; other-rated values represent examiner ratings during the OSCE. All values are proportions (0\u0026ndash;1) derived from percentage scores.\u003c/p\u003e\n\u003cp\u003ePlanned paired-samples t-tests examined self-other discrepancies within each domain. 
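\u003c/p\u003e\n\u003cp\u003eThe planned paired comparisons can be cross-checked from the rounded summary values in Table 3, where differences are expressed as examiner minus self. This minimal sketch assumes the usual paired-design formulas: t = mean difference / (SD / sqrt(n)) and Cohen d_z = mean difference / SD. It also uses a hand-entered critical value of about 1.983 for df near 103, because the Python standard library has no t quantile function. The helper name paired_summary is hypothetical. Because the inputs are rounded, the recomputed values agree with the reported ones only approximately (for example, about -10.2 rather than -10.14 for the social domain).\u003c/p\u003e

```python
from math import sqrt

T_CRIT_975 = 1.983  # approximate two-sided 5% critical t for df of about 102-103 (hand-entered)

def paired_summary(mean_diff, sd_diff, n, t_crit=T_CRIT_975):
    """Return t, 95% CI and Cohen's d_z for a paired design from summary statistics."""
    se = sd_diff / sqrt(n)
    t = mean_diff / se
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    d_z = mean_diff / sd_diff
    return t, ci, d_z

# Table 3 rows (examiner minus self): mean difference, SD, n
rows = {
    'professional': (-0.020, 0.119, 103),
    'methodological': (0.014, 0.160, 103),
    'social': (-0.141, 0.141, 104),
}
for domain, (m, sd, n) in rows.items():
    t, (lo, hi), d = paired_summary(m, sd, n)
    print(f'{domain}: t({n - 1}) = {t:.2f}, 95% CI [{lo:.3f}, {hi:.3f}], d_z = {d:.2f}')
```

\u003cp\u003e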
As shown in Table 3, self- and examiner ratings did not differ significantly in the professional knowledge/analytical domain, \u003cem\u003et\u003c/em\u003e(102)=\u0026ndash;1.69, \u003cem\u003ep\u003c/em\u003e=.094, 95% CI [\u0026ndash;0.043, 0.003]. Similarly, no significant difference emerged for methodological/procedural skills, \u003cem\u003et\u003c/em\u003e(102)=0.86, \u003cem\u003ep\u003c/em\u003e=.394, 95% CI [\u0026ndash;0.018, 0.045]. In contrast, self-ratings were substantially higher than examiner ratings in the social/communication domain, \u003cem\u003et\u003c/em\u003e(103)=\u0026ndash;10.14, \u003cem\u003ep\u003c/em\u003e\u0026lt;.001, 95% CI [\u0026ndash;0.168, \u0026ndash;0.113], \u003cem\u003ed\u003c/em\u003e=0.99.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3. Paired-samples t-tests comparing self- and examiner ratings within each competence domain\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellpadding=\"0\" width=\"642\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eCompetence Domain\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003et(df)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMean Difference \u0026plusmn; SD\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e95% CI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eprofessional knowledge/analytical skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;1.69 (102)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.094\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u0026minus;0.020 \u0026plusmn; 0.119\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.043, 0.003]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003emethodological/procedural skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.86 (102)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.394\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.014 \u0026plusmn; 0.160\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.018, 0.045]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003esocial/communication skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;10.14 (103)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026lt; .001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.141 \u0026plusmn; 0.141\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.168, \u0026minus;0.113]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote. CI = confidence interval; SD = standard deviation. Mean differences were computed as examiner minus self-ratings, so negative values indicate higher self-ratings relative to examiner ratings (the opposite sign convention to the Self \u0026ndash; Examiner column in Table 2). p \u0026lt; .05 was considered statistically significant.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSimple linear regressions examined the relationship between age and self-other discrepancies in each competence domain (Table 4). For social/communication skills, age was a significant negative predictor, F(1, 102) = 7.73, p = .006, R\u0026sup2; = .07, indicating that older students tended to show smaller discrepancies. 
In contrast, age was not significantly associated with discrepancies in methodological/procedural skills, F(1, 101) = 2.38, p = .126, R\u0026sup2; = .02, or professional knowledge/analytical skills, F(1, 101) = 0.06, p = .814, R\u0026sup2; \u0026lt; .01.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 4. Linear regression analyses predicting self\u0026ndash;examiner discrepancy scores from age in each competence domain\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellpadding=\"0\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eCompetence Domain\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eB\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSE\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e\u0026beta;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003et(df)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e95% CI for B\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eR\u0026sup2;\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eprofessional knowledge/analytical skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.003\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.24 (101)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.814\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.007, 0.005]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003emethodological/procedural skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.006\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.004\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;1.54 (101)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.126\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.014, 0.002]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.020\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003esocial/communication skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.009\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.003\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;0.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u0026minus;2.78 (102)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.006\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e[\u0026minus;0.016, \u0026minus;0.003]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.070\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote. B = unstandardised regression coefficient; SE = standard error; \u0026beta; = standardised coefficient; CI = confidence interval; R\u0026sup2; = proportion of variance explained. 
Negative coefficients indicate smaller self-other discrepancies (i.e., less overestimation) with increasing age.\u003c/p\u003e\n\u003cp\u003eIndependent-samples t-tests examined whether prior healthcare training and gender were associated with self-other discrepancies (Table 5). These tests revealed no significant differences in discrepancy scores by prior healthcare training (professional: \u003cem\u003et\u003c/em\u003e(101)=\u0026ndash;1.42, \u003cem\u003ep\u003c/em\u003e=.160; methodological: \u003cem\u003et\u003c/em\u003e(101)=\u0026ndash;0.16, \u003cem\u003ep\u003c/em\u003e=.875; social: \u003cem\u003et\u003c/em\u003e(102)=\u0026ndash;0.16, \u003cem\u003ep\u003c/em\u003e=.876) or by gender (professional: \u003cem\u003et\u003c/em\u003e(101)=\u0026ndash;1.14, \u003cem\u003ep\u003c/em\u003e=.256; methodological: \u003cem\u003et\u003c/em\u003e(101)=\u0026ndash;1.03, \u003cem\u003ep\u003c/em\u003e=.306; social: \u003cem\u003et\u003c/em\u003e(102)=0.22, \u003cem\u003ep\u003c/em\u003e=.825).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 5. Independent-samples t-tests comparing self\u0026ndash;examiner discrepancies by prior healthcare training and gender\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"648\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 648px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eA. 
Healthcare Training\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCompetence Domain\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e\u003cstrong\u003et(df)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMD \u0026ndash; Training\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMD \u0026ndash;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eNo Training\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e95% CI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eprofessional knowledge/analytical\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eskills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e\u0026minus;1.42 (101)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.160\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.04 (.133)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e0.006 (.107)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026minus;.081, .013]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003emethodological/procedural skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 95px;\"\u003e\n \u003cp\u003e\u0026minus;0.16 (101)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.875\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e\u0026minus;0.011 (.147)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e\u0026minus;0.016 (.170)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026minus;.069, .059]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003esocial/communication\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eskills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e\u0026minus;0.16 (102)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.876\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.143 (.128)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e0.139 (.151)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026minus;.061, .052]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 648px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eB. 
Gender\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCompetence Domain\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e\u003cstrong\u003et(df)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e\u003cstrong\u003ep-value\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMD \u0026ndash; Female\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eMD \u0026ndash;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eMale\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e95% CI\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eprofessional knowledge/analytical\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eskills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e\u0026minus;1.14 (101)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.256\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.011\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e0.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026ndash;0.0796, 0.0214]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003emethodological/procedural skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n 
\u003cp\u003e\u0026minus;1.03 (101)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.306\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e\u0026minus;0.024\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e0.012\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026ndash;0.1048, 0.0332]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 187px;\"\u003e\n \u003cp\u003e\u003cstrong\u003esocial/communication skills\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 95px;\"\u003e\n \u003cp\u003e0.22 (102)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 68px;\"\u003e\n \u003cp\u003e0.825\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 105px;\"\u003e\n \u003cp\u003e0.143\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 110px;\"\u003e\n \u003cp\u003e0.136\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 83px;\"\u003e\n \u003cp\u003e[\u0026ndash;0.0537, 0.0672]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote. CI = confidence interval; SD = standard deviation (shown in parentheses in Panel A). MD = mean discrepancy. Positive mean discrepancies indicate higher self-ratings relative to examiner ratings. p \u0026lt; .05 was considered statistically significant.\u0026nbsp;\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe results demonstrated a clear domain-specific pattern: students consistently overestimated their performance only in the social/communication domain, whereas calibration in professional knowledge/analytical and methodological/procedural skills was accurate. 
The overestimation in the social/communication domain was substantial, with a large effect size (\u003cem\u003ed\u003c/em\u003e = 0.99), underlining its educational importance. By contrast, deviations in the technical and analytical domains were small and non-significant, suggesting that students may judge observable, criterion-based skills more accurately than subjective interpersonal competencies. Conceptually, these findings map onto SRL: they support the proposition that transparent, criterion-based tasks foster more accurate monitoring, and they localize miscalibration to domains where standards are tacit or relational (5,18,20). In practice, students may find it easier to assess concrete procedural actions accurately (e.g., \u0026ldquo;I disinfected my hands properly\u0026rdquo;) than abstract interpersonal behaviors (e.g., \u0026ldquo;I responded appropriately\u0026rdquo;).\u003c/p\u003e\n\u003cp\u003eThis pattern resonates with earlier findings in medical and health professions education, where communication and interpersonal domains are often the most challenging for accurate self-assessment (24,30). Similar results have been reported in pharmacy and medical OSCEs, where students showed closer alignment in procedural or knowledge-based stations but inflated ratings for communication skills (9,31). However, other studies have observed broader overestimation across domains (5,13), including findings consistent with general miscalibration in early learners (8,13), suggesting that the domain-specific effect observed here may reflect both early-stage competency profiles and the structure of the curriculum. Consistent with this interpretation, a brief face-validity check with students indicated that the social/communication self-assessment was the only item that raised clarity concerns, further suggesting that this domain is harder for learners to judge. 
Interpersonal competencies typically yield weaker, delayed, or more ambiguous feedback signals than checklist-based tasks, which may impair monitoring and foster overconfidence. Within Zimmerman\u0026rsquo;s SRL model, weak or delayed external cues constrain the monitoring phase, leading novices to rely on global self-beliefs rather than performance evidence. Furthermore, a Dunning-Kruger mechanism is plausible in novices whose limited proficiency constrains insight into their own performance gaps (13,18,32). Taken together, this pattern supports the calibration-as-monitoring account within SRL and aligns with feedback-literacy perspectives, in which weaker or delayed cues hamper learners\u0026rsquo; ability to judge interpersonal performance accurately.\u003c/p\u003e\n\u003cp\u003eThe non-significant findings in the professional and methodological domains require careful consideration. While the absence of detectable discrepancies could reflect genuine calibration accuracy in more observable, criterion-based competencies, it may also partly reflect limited statistical power to detect small effects. However, effect size estimates were close to zero and confidence intervals were narrow, suggesting that any systematic miscalibration in these domains is likely minimal and of limited educational relevance. By contrast, prior work has sometimes reported broader overestimation across domains (5,23), underscoring that our findings may be context-dependent. In our OSCE, the strong emphasis on explicit checklists and observable behaviors may have facilitated students\u0026rsquo; self-monitoring, thereby reducing miscalibration. Within an SRL framework, task transparency strengthens the monitoring loop and supports closer alignment between self-ratings and external criteria (18,20). 
\u003c/p\u003e\n\u003cp\u003eThe absence of gender and prior training effects contrasts with research in medicine and allied health, where male students sometimes report higher self-ratings and prior experience has been associated with improved calibration (23,25,26). In our cohort, two factors may have contributed to these null effects. First, gender-patterned differences reported later in curricula may not yet be present at this early training stage, where they may be overshadowed by limited clinical exposure. Second, \u0026ldquo;prior healthcare training\u0026rdquo; was heterogeneous in type and quality, may have transferred poorly to OSCE performance, and may not have included explicit metacognitive practice or structured feedback, which are key ingredients for calibration learning (23,26,27,33). This suggests that the quality, supervision, and reflective depth of prior experiences, rather than their mere presence, may be what drives better calibration. In addition, the cohort included substantially more women than men, which reduces power to detect gender-related effects.\u003c/p\u003e\n\u003cp\u003eUnlike gender and prior training, age emerged as a significant negative predictor of overestimation in social/communication skills, consistent with studies linking life experience and maturity to more realistic self-appraisal. This age effect may reflect greater life experience and more frequent corrective feedback, which could provide clearer internal standards for interpersonal performance (8,15). 
With increasing age, emotional intelligence and a more integrated professional self-concept may be more developed, supporting more accurate perception of social cues and self-monitoring (33). In SRL terms, these attributes may strengthen forethought and monitoring and thereby reduce overestimation when communication demands are high (19,21). Complementarily, recent evidence suggests that emotional intelligence underpins interpersonal and critical-thinking competencies and may be a prerequisite for accurate self-assessment in the social domain (33), aligning with SRL models in which affective-cognitive resources support effective monitoring and regulation (18,20). Accordingly, curricula that intentionally cultivate emotional intelligence, for example through empathy training and guided reflective dialogue, may also improve calibration for communication-intensive tasks (33).\u003c/p\u003e\n\u003cp\u003eTaken together, these null findings are informative: they suggest that miscalibration in early-stage nursing students is not a generalized phenomenon but clusters in complex interpersonal skills, clarifying boundary conditions for where calibration support is most needed. In addition, although age explained only a modest share of the variance in social/communication discrepancies (R\u0026sup2; = .07), the age effect suggests that non-academic factors may shape calibration, underscoring the importance of considering learner characteristics beyond formal training.\u003c/p\u003e\n\u003cp\u003eEducationally, these findings suggest that calibration should be treated as a metacognitive target, particularly for social/communication skills. However, given the single-cohort, cross-sectional design, any curricular changes should be piloted and evaluated before wider adoption. Evidence from nursing and medical education suggests that calibration-focused interventions can reduce self-other discrepancies (6,14), although reinforcement appears necessary to sustain their impact over time (7,22). 
In light of these findings, calibration could be strengthened through facilitator-led sessions in which students review their own OSCE communication-station recordings alongside examiner checklists and global ratings and debrief discrepancies using structured prompts. Two complementary activities could run in parallel: standardized vignette calibration, in which students rate patient-interaction videos and then compare their judgments with expert-consensus anchors to build shared mental models of performance standards; and a pre-OSCE self-prediction followed by a post-feedback reassessment with the same instrument, which provides a mechanism to track within-student calibration shifts across stations and over time. Implemented longitudinally, these steps operationalize the SRL cycle of performance monitoring (self-other comparison) and reflection (strategy adjustment), and they target the specific monitoring deficits observed in the social domain (5,18). Importantly, OSCEs can serve a dual role: beyond summative evaluation, they can act as formative learning opportunities when feedback is deliberately integrated, a potential emphasized in the wider literature on competency-based education (1,3).\u003c/p\u003e\n\u003cp\u003eThe interprofessional literature also points to the broader relevance of these findings. Overestimation in communication-related competencies has been observed not only in nursing but also in medical, dental, and psychology trainees (31,34,35). This suggests that the challenge is not unique to one discipline but reflects a more general issue in health professions education, where interpersonal and relational competencies are difficult to self-evaluate. Accordingly, interprofessional initiatives could share calibration assets (e.g., cross-disciplinary vignettes with consensus anchors and debrief guides) to harmonize expectations and feedback practices across programs (32). 
Addressing this issue through feedback and calibration training could therefore benefit multiple professions and support interprofessional education initiatives (32,36).\u003c/p\u003e\n\u003cp\u003eThis study has several limitations. Its cross-sectional design captures self-other agreement at a single time point, precluding inferences about developmental change. Percentage-based scoring facilitated comparability but may have obscured finer-grained variation within domains. Examiner ratings of interpersonal skills may involve subjective judgments, which could influence discrepancy scores despite prior examiner calibration. Although participation rates were high, exclusion of six students who did not complete the self-assessment introduces a minor risk of self-selection bias. A further consideration is potential common method/context bias: both ratings were anchored to the same OSCE event and collected within a close timeframe, which can inflate shared variance; however, the domain-specific pattern (null effects in technical/analytical domains alongside a large social/communication discrepancy) argues against a uniform method artifact driving the results. In addition, a \u0026ldquo;task transparency\u0026rdquo; effect cannot be ruled out: students likely find it easier to appraise concrete, checklist-based actions than abstract interpersonal behaviors, which may partially account for the observed domain specificity. Finally, findings may not generalize beyond first-year nursing cohorts or to programs with different OSCE structures. Together, these constraints underscore the value of theory-driven, SRL-anchored longitudinal and multimethod designs \u0026ndash; e.g., including delayed self-ratings, independent behavioral indicators, and experimental manipulations of task transparency \u0026ndash; to test mechanisms of calibration change.\u003c/p\u003e\n\u003cp\u003eFuture research should prioritize three directions. 
First, longitudinal studies are needed to track changes in self-other agreement over time and assess whether calibration improves with experience (21,37). Second, intervention studies should test the effectiveness of structured feedback, guided reflection, and calibration exercises, ideally embedded longitudinally across the curriculum (6,24). Third, comparative studies across institutions and professions would clarify whether the domain-specific patterns observed here are context-dependent or reflect broader challenges in health professions education (2,4). Incorporating qualitative approaches could also enrich understanding of how students interpret performance criteria, particularly in interpersonal domains (34). Including measures of emotional intelligence and feedback literacy as potential mediators or moderators could help explain variance in social-domain calibration and inform targeted scaffolds (6,33). Together, such theory-anchored efforts can move this area from descriptive mapping to testing SRL-consistent mechanisms of calibration and, ultimately, advance the evidence base for designing curricula that reliably foster accurate self-assessment and competent professional practice.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn conclusion, first-year nursing students demonstrated accurate calibration for technical and analytical skills but systematically overestimated social/communication competencies. Age was associated with more realistic self-assessment in this domain, whereas gender and prior healthcare training showed no reliable association. These results suggest that calibration is a relevant metacognitive target, especially for interpersonal skills, but recommendations for curriculum integration should be tested in pilot and longitudinal evaluations before broad implementation. 
Embedding carefully evaluated calibration activities within OSCEs may support competence development and align with competency-based frameworks. Addressing domain-specific overestimation early in nursing education has the potential to improve self-assessment accuracy and preparedness for practice.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study was initially reviewed and cleared by the internal review board of [University name withheld for anonymity] as educational research without patient involvement. Subsequently, external ethics approval was obtained from the Ethikkommission f\u0026uuml;r das Land Nieder\u0026ouml;sterreich (Approval ID: GS3-EK-12/942-2025). Participation in the post-OSCE self-assessment questionnaire was voluntary, and students\u0026rsquo; grades were not affected by the decision to participate or by their responses. All participating students provided informed consent prior to inclusion. Data were pseudonymized before analysis; the linkage file was stored separately with restricted access and was not available to the research team. The research adhered to the ethical principles of the Declaration of Helsinki (latest revision). 
No patient data or human tissue were involved.\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and/or analyzed during the current study are not publicly available due to institutional data protection policies but are available from the corresponding author on reasonable request.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo external funding was received for this research. The study was conducted as part of the regular curriculum evaluation process at the [University name withheld for anonymity].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBR conceived the study, designed the methodology, oversaw data collection, and prepared the manuscript. GS assisted with data collection and provided critical review of the manuscript. ME conducted the statistical analyses. AH contributed to the interpretation of the findings and drafting of the manuscript. MW contributed to the interpretation of the findings and drafting of the manuscript. 
All authors critically revised the manuscript for intellectual content, approved the final version, and agree to be accountable for all aspects of the work.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank the faculty members who participated as OSCE examiners and the students who volunteered their time for this study. Portions of the manuscript text were refined with the assistance of a large language model (ChatGPT, GPT-5, OpenAI, San Francisco, CA, USA) for language polishing and structural clarity. All outputs were critically reviewed and edited by the authors, who take full responsibility for the final content.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eClinical Trial Registration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eClinical trial number: not applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eFrank JR, Danoff D. The CanMEDS initiative: implementing an outcomes-based framework of physician competencies. Medical Teacher. Januar 2007;29(7):642\u0026ndash;7. \u003c/li\u003e\n\u003cli\u003eMonteiro S, McConnell MM. Evaluating the Construct Validity of Competencies: A Retrospective Analysis. MedSciEduc. 8. Mai 2023;33(3):729\u0026ndash;36. \u003c/li\u003e\n\u003cli\u003eWeinberger SE, Pereira AG, Iobst WF, Mechaber AJ, Bronze MS, and the Alliance for Academic Internal Medicine Education Redesign Task Force II*. Competency-Based Education and Training in Internal Medicine. Ann Intern Med. 7. Dezember 2010;153(11):751\u0026ndash;6. \u003c/li\u003e\n\u003cli\u003eZibrowski EM, Singh SI, Goldszmidt MA, Watling CJ, Kenyon CF, Schulz V, u. a. The sum of the parts detracts from the intended whole: competencies and in-training assessments. Medical Education. 
August 2009;43(8):741\u0026ndash;8. \u003c/li\u003e\n\u003cli\u003eLe\u0026oacute;n SP, Panadero E, Garc\u0026iacute;a-Mart\u0026iacute;nez I. How Accurate Are Our Students? A Meta-analytic Systematic Review on Self-assessment Scoring Accuracy. Educ Psychol Rev. Dezember 2023;35(4):106. \u003c/li\u003e\n\u003cli\u003eMiddleton R, Lewer K, Antoniou C, Pratt H, Bowdler S, Jans C, u. a. Understanding the processes, practices and influences of calibration on feedback literacy in higher education marking: A qualitative study. Nurse Education Today. April 2024;135:106106. \u003c/li\u003e\n\u003cli\u003eStone NJ. Exploring the Relationship between Calibration and Self-Regulated Learning. Educational Psychology Review. Dezember 2000;12(4):437\u0026ndash;75. \u003c/li\u003e\n\u003cli\u003eZheng B, He Q, Lei J. Informing factors and outcomes of self-assessment practices in medical education: a systematic review. Annals of Medicine. 31. Dezember 2024;56(1):2421441. \u003c/li\u003e\n\u003cli\u003eBowers RD, Baker CN, Becker KK, Hamilton JN, Trotta K. Comparison of peer, self, and faculty objective structured clinical examination evaluations in a PharmD nonprescription therapeutics course. Currents in Pharmacy Teaching and Learning. November 2024;16(11):102159. \u003c/li\u003e\n\u003cli\u003eInayah AT, Anwer LA, Shareef MA, Nurhussen A, Alkabbani HM, Alzahrani AA, u. a. Objectivity in subjectivity: do students\u0026rsquo; self and peer assessments correlate with examiners\u0026rsquo; subjective and objective assessment in clinical skills? A prospective study. BMJ Open. Mai 2017;7(5):e012289. \u003c/li\u003e\n\u003cli\u003eHarden RM. Revisiting \u0026lsquo;Assessment of clinical competence using an objective structured clinical examination (OSCE)\u0026rsquo;. Med Educ. April 2016;50(4):376\u0026ndash;9. \u003c/li\u003e\n\u003cli\u003eNyangeni T, Ten Ham-Baloyi W, Van Rooyen DRM. Strengthening the planning and design of Objective Structured Clinical Examinations. 
Health SA Gesondheid [Internet]. 7. August 2024 [zitiert 13. August 2025];29. Verf\u0026uuml;gbar unter: http://www.hsag.co.za/index.php/hsag/article/view/2693\u003c/li\u003e\n\u003cli\u003eKnof H, Berndt M, Shiozawa T. Prevalence of Dunning-Kruger effect in first semester medical students: a correlational study of self-assessment and actual academic performance. BMC Med Educ. 24. Oktober 2024;24(1):1210. \u003c/li\u003e\n\u003cli\u003eSeidel-Fischer J, Trifunovic-Koenig M, Gerber B, Otto B, Bentele M, Fischer MR, u. a. Interaction between overconfidence effects and training formats in nurses\u0026rsquo; education in hand hygiene. BMC Nurs. 2. Juli 2024;23(1):451. \u003c/li\u003e\n\u003cli\u003eFoster C, Renie P. Changes in students\u0026rsquo; confidence calibration across a sequence of low-stakes confidence assessments. Asian Journal for Mathematics Education. Dezember 2024;3(4):406\u0026ndash;27. \u003c/li\u003e\n\u003cli\u003eKurt E, Eskimez Z. Examining self-regulated learning of nursing students in clinical practice: A descriptive and cross-sectional study. Nurse Education Today. Februar 2022;109:105242. \u003c/li\u003e\n\u003cli\u003eTanimura C, Okuda R, Tokushima Y, Matsumoto Y, Katou S, Miyoshi M, u. a. Examining the reliability and validity of a self-regulated learning strategy scale for undergraduate nursing students and effective factors of self-regulated learning strategies. Nurse Education Today. September 2023;128:105872. \u003c/li\u003e\n\u003cli\u003eZimmerman BJ. Becoming a Self-Regulated Learner: An Overview. Theory Into Practice. Mai 2002;41(2):64\u0026ndash;70. \u003c/li\u003e\n\u003cli\u003eTorrano F, Gonz\u0026aacute;lez-Torres MC. Self-Regulated Learning: Current and Future Directions. Electronic Journal of Research in Educational Psychology. 1. April 2004;2. \u003c/li\u003e\n\u003cli\u003eHemmler YM, Ifenthaler D. Self-regulated learning strategies in continuing education: A systematic review and meta-analysis. Educational Research Review. 
November 2024;45:100629. \u003c/li\u003e\n\u003cli\u003eDavis E, Wands L. The Power of Choice: Fostering Engagement and Competence in Nursing Students. J Nurs Educ. 12. Februar 2025;1\u0026ndash;4. \u003c/li\u003e\n\u003cli\u003eKolovelonis A, Goudas M, Samara E. The Effects of a Self-Regulated Learning Teaching Unit on Students\u0026rsquo; Performance Calibration, Goal Attainment, and Attributions in Physical Education. The Journal of Experimental Education. 2. Januar 2022;90(1):112\u0026ndash;29. \u003c/li\u003e\n\u003cli\u003eGonsalvez CJ, Riebel T, Nolan LJ, Pohlman S, Bartik W. Supervisor versus self‐assessment of trainee competence: Differences across developmental stages and competency domains. J Clin Psychol. Dezember 2023;79(12):2959\u0026ndash;73. \u003c/li\u003e\n\u003cli\u003eAbraham R, Singaram VS. Self and peer feedback engagement and receptivity among medical students with varied academic performance in the clinical skills laboratory. BMC Med Educ. 28. September 2024;24(1):1065. \u003c/li\u003e\n\u003cli\u003eBodard S, Bouzid D, Ferr\u0026eacute; VM, Carette C, Kivits J, Nguyen Y, u. a. Impact of gender on self-assessment accuracy among fourth-year French medical students on faculty\u0026rsquo;s online Objective Structured Clinical Examinations. BMC Med Educ. 30. Dezember 2024;24(1):1553. \u003c/li\u003e\n\u003cli\u003eYang H, Thompson C, Bland M. The effect of clinical experience, judgment task difficulty and time pressure on nurses\u0026rsquo; confidence calibration in a high fidelity clinical simulation. BMC Med Inform Decis Mak. Dezember 2012;12(1):113. \u003c/li\u003e\n\u003cli\u003eAboalrob W, Ayed A, Malak MZ, Aqtam I. Understanding the influence of self-concept on clinical decision-making among nurses: A cross-sectional study. Rehman N, Herausgeber. PLoS One. 25. August 2025;20(8):e0330905. \u003c/li\u003e\n\u003cli\u003eAlizadeh M, Behshid M, Cheraghi R, Dehghani G. 
Nursing students\u0026rsquo; experiences of professional competence evaluation by Objective Structured Clinical examination method: a qualitative content analysis study. BMC Med Educ. 13. November 2024;24(1):1302. \u003c/li\u003e\n\u003cli\u003eRoszipal B, Szelesi G, Ernst M, Hoffelner A, Wagner M. Post-OSCE Self-Assessment Questionnaire (English translation). BMC Medical Education; 2025. \u003c/li\u003e\n\u003cli\u003eLiu MY, Liao LL, Huang YT, Lee YC, Lai IJ. Effectiveness of a scenario-based simulation course on improving the clinical communication skills of dietetic students. BMC Med Educ. 22. Januar 2025;25(1):106. \u003c/li\u003e\n\u003cli\u003eMalekzadeh M, Social Determinant of Health Research Center, Yasuj University of Medical Sciences, Yasuj, Iran, Mohammadi F, Dental School, Yasuj University of Medical Sciences, Yasuj, Iran, Gholami SA, Dental School, Yasuj University of Medical Sciences, Yasuj, Iran, u. a. Evaluation of Clinical Communication Skills in Dental Students with Objective Structured Clinical Examination. J Clinic Care Skill. 1. Dezember 2021;2(4):173\u0026ndash;9. \u003c/li\u003e\n\u003cli\u003eKwame A, Petrucka PM. A literature-based study of patient-centered care and communication in nurse-patient interactions: barriers, facilitators, and the way forward. BMC Nurs. Dezember 2021;20(1):158. \u003c/li\u003e\n\u003cli\u003eAyed A, Aqtam I, Malak MZ, Toqan D, Hammad BM, Qaddumi J, u. a. Insights into the relationship between emotional intelligence and critical thinking among nursing students. BMC Nurs. 23. August 2025;24(1):1107. \u003c/li\u003e\n\u003cli\u003eAbu Dabrh AM, Waller TA, Bonacci RP, Nawaz AJ, Keith JJ, Agarwal A, u. a. Professionalism and inter-communication skills (ICS): a multi-site validity study assessing proficiency in core competencies and milestones in medical learners. BMC Med Educ. Dezember 2020;20(1):362. \u003c/li\u003e\n\u003cli\u003eMcCarrick CA, Moynihan A, McEntee PD, Boland PA, Donnelly S, Heneghan H, u. a. 
Impact of simulation training on communication skills and informed consent practices in medical students- a randomised controlled trial. BMC Med Educ. 18. Juli 2025;25(1):1078. \u003c/li\u003e\n\u003cli\u003eH\u0026oslash;egh-Larsen AM, Gonzalez MT, Reierson I\u0026Aring;, Huseb\u0026oslash; SIE, Hofoss D, Ravik M. Nursing students\u0026rsquo; clinical judgment skills in simulation and clinical placement: a comparison of student self-assessment and evaluator assessment. BMC Nurs. 9. M\u0026auml;rz 2023;22(1):64. \u003c/li\u003e\n\u003cli\u003eLu YCA, Lee SH, Hsu MY, Shih FF, Yen WJ, Huang CY, u. a. Effects of Problem-Based Learning Strategies on Undergraduate Nursing Students\u0026rsquo; Self-Evaluation of Their Core Competencies: A Longitudinal Cohort Study. IJERPH. 28. November 2022;19(23):15825. \u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":false,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"meed","sideBox":"Learn more about [BMC Medical Education](http://bmcmededuc.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/meed/default.aspx","title":"BMC Medical Education","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7518013/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7518013/v1","license":{"name":"CC 
Published version: BMC Medical Education, December 24th, 2025. https://doi.org/10.1186/s12909-025-08279-0
