Reaching Expert Consensus on 'Integrity-by-Design' for VR Assessment: A Delphi Study to Validate a Misconduct Taxonomy and Mitigation Toolkit

doi:10.21203/rs.3.rs-8241682/v1

2026 · doi:10.21203/rs.3.rs-8241682/v1

preprint OA: closed

Article · Research Square

Ayman Fawzy khttab Madkour (Menoufia University; corresponding author), Heba Othman Foud Alazab (Menoufia University), Mohamed W. Soliman (Alexandria University)

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8241682/v1

This work is licensed under a CC BY 4.0 License.

Status: Under Review. Version 1.

Abstract

Context: The integration of immersive Virtual Reality (VR) into assessment introduces novel academic misconduct vectors and significant privacy risks. A recent scoping review (Soliman et al., currently under review in npj Science of Learning; Submission ID: fffe92da-5403-4f34-a3ff-ec0bb09889e4) analyzed this emerging threat landscape, resulting in a proposed foundational taxonomy of seven VR-specific misconduct risks (F1-F7).
However, this literature-derived taxonomy currently lacks formal validation by subject-matter experts.

Objective: This study reports on the conceptual validation of this newly proposed taxonomy and an associated eight-part mitigation toolkit (C1-C8). The study aimed to establish expert consensus on the clarity and comprehensiveness of the risks, and on the usefulness, feasibility, and privacy impact of the proposed countermeasures.

Method: A three-round Delphi study was conducted with an international panel (n = 25) of experts in academic integrity, VR development, and educational assessment. Panelists rated taxonomy and toolkit items on 5-point Likert scales. Consensus was defined a priori as a Median ≥ 4.0 and an Interquartile Range (IQR) ≤ 1.0.

Results: Strong consensus was achieved for the clarity and comprehensiveness of all seven taxonomy families (F1-F7) (Kendall's W = 0.82, p < .001). For the mitigation toolkit, measures such as 'Randomised Seeds' (C3) and 'Rubric Redesign' (C8) received high consensus across usefulness, feasibility, and privacy. The 'Interactive Oral' (C2) was deemed highly useful (Median = 5.0) but had lower feasibility (Median = 3.5, IQR = 1.5), highlighting a key implementation challenge.

Conclusion: This study provides the first expert-validated framework for VR-specific misconduct and mitigation. The strong consensus confirms the taxonomy's robustness and provides a clear mandate for practitioners to adopt 'integrity-by-design' measures that prioritize pedagogy over surveillance. The findings serve as a validated foundation for the design of scalable, privacy-preserving assessment blueprints.

Subject areas: Health sciences/Health care; Physical sciences/Mathematics and computing

Keywords: Delphi study, virtual reality, academic integrity, taxonomy validation, integrity-by-design, expert consensus, assessment design

1. Introduction

The integration of immersive Virtual Reality (VR) into higher education assessment presents a paradigm shift, moving from abstract knowledge testing to authentic, performance-based evaluation (Radianti et al., 2020). However, this shift introduces novel, poorly understood vectors for academic misconduct and significant data privacy risks (Miller et al., 2020). As VR evolves from a niche tool into a scalable assessment platform, it collides with a parallel crisis in academic integrity, amplified by the rise of generative AI (Weber-Wulff et al., 2023).

This has triggered a necessary shift away from surveillance-based proctoring, a practice associated with student anxiety and equity concerns (Sefcik et al., 2022), toward assessment strategies built on 'integrity-by-design' (Bretag et al., 2019). While VR supports authentic tasks, its sensor-derived data (e.g., gaze, kinematics) constitutes a new form of biometric data (Bailenson, 2018), elevating privacy to a primary design imperative.

To design effective countermeasures, a formal "map" of the risks is required. Such a map has recently been proposed: a systematic scoping review (Soliman et al., currently under review in npj Science of Learning; Submission ID: fffe92da-5403-4f34-a3ff-ec0bb09889e4) analyzed the field and derived the first seven-family taxonomy of VR-specific misconduct risks (F1-F7).

This derivation was the necessary first step, but the taxonomy itself remains a conceptual proposal lacking formal validation. For such a tool to be considered robust, it must be subjected to scrutiny by subject-matter experts (Nickerson et al., 2013). The original International Journal for Educational Integrity reviewer correctly noted that such a validation study constitutes a separate and necessary piece of research.

This paper reports on that next step.
We address the following research questions:

(1) Is the proposed seven-family misconduct taxonomy (F1-F7) clear, comprehensive, and conceptually sound according to expert consensus?
(2) What is the expert consensus on the Usefulness, Feasibility, and Privacy-Impact of a proposed eight-part "integrity-by-design" (C1-C8) mitigation toolkit?

To answer these questions, we employed the Delphi method, a structured technique designed to achieve reliable expert consensus (Dalkey & Helmer, 1963). This study provides the formal validation for the framework, establishing its utility as a foundational tool for practitioners and researchers.

2. Methodology

A three-round, anonymous, online Delphi study was conducted between May and July 2024. This method was chosen to facilitate a structured, asynchronous dialogue among international experts, mitigating the influence of dominant personalities (Tricco et al., 2018).

2.1. The Framework Under Validation

Panelists were asked to validate two components: the misconduct taxonomy (Table 1) and the mitigation toolkit (Table 2), both derived from the scoping review (Soliman et al., currently under review in npj Science of Learning; Submission ID: fffe92da-5403-4f34-a3ff-ec0bb09889e4).

Table 1. The Seven-Family Misconduct Taxonomy (F1-F7). This table defines the framework items presented to the expert panel in Round 1.

Code | Risk Family | Definition & Exemplar
F1 | Identity & Access Subversion | Attempts to misrepresent user identity. Ex: A proxy user completes the assessment.
F2 | Collusion Channels | Use of unauthorised communication to receive assistance. Ex: A concealed earpiece.
F3 | Environment Manipulation | Unauthorised modification of the assessment software or hardware. Ex: A third-party overlay.
F4 | Content Injection | Importation of unauthorised data or scripts to solve a task. Ex: Injecting a pre-authored script.
F5 | Automation & Scripting | Use of macros or bots to automate user interaction. Ex: A macro to automate clicks.
F6 | Task Transfer & Replay | Unauthorised acquisition and reuse of assessment solutions. Ex: Replaying a recorded session.
F7 | Social Engineering | Manipulation of human elements (e.g., invigilators). Ex: Feigning technical difficulties.

Table 2. The Eight-Part Mitigation Toolkit (C1-C8). This table defines the toolkit items presented to the expert panel in Round 2.

Code | Countermeasure | Description
C1 | Provenance (Derived, Minimal) | Retaining minimal, derived data (e.g., logs) instead of raw sensor streams.
C2 | Interactive Oral (Viva) | A brief, structured oral exam to defend the performance.
C3 | Randomised Seeds | Per-student algorithmic randomization of the assessment scenario.
C4 | Staged Checkpoints | Mandatory, time-stamped checkpoints for metacognitive reflection.
C5 | Reflective Micro-Logs | Short, written reflections on the process submitted after the task.
C6 | Co-presence Rules | Clear rules defining who can be in the room and what devices are permitted.
C7 | Layered Identity | Using non-biometric methods for verification (e.g., ID card + unique code).
C8 | Rubric Redesign | Shifting grading criteria to reward the process of diagnosis, not just the final product.

2.2. Panel Selection

A purposive sampling strategy was used to recruit a panel (n = 25) of subject-matter experts from Egypt, Saudi Arabia, the UK, and Australia. Inclusion criteria required documented expertise in at least one core domain:

• Academic Integrity & Policy (n = 8)
• Educational Assessment & Pedagogy (n = 7)
• Computer Science / VR Development (n = 6)
• Student Advocacy & Data Privacy (n = 4)

This multi-stakeholder composition was intended to ensure that the framework was evaluated from each of these perspectives.

2.3. Delphi Procedure

The study was administered via a secure online survey platform.

Round 1: Taxonomy Validation. Panelists were presented with the definitions in Table 1.
They rated each family (F1-F7) on 5-point Likert scales for Clarity (1 = Very Unclear, 5 = Very Clear) and Comprehensiveness (1 = Not Comprehensive, 5 = Fully Comprehensive).

Round 2: Toolkit Evaluation. Panelists were presented with the definitions in Table 2. For each countermeasure (C1-C8), they provided three separate ratings on 5-point scales:

• Usefulness (perceived effectiveness in mitigating risk)
• Feasibility (practicality for implementation at scale)
• Privacy-Impact (scored as a risk inverse; 1 = High Privacy Risk, 5 = No Privacy Risk)

Round 3: Reconciliation & Consensus. Any item failing to achieve consensus in Round 2 was returned to the panel. Panelists were shown the group's anonymised statistical summary (Median, IQR) and qualitative arguments, then asked to re-rate the item.

2.4. Data Analysis

Quantitative data were analyzed to determine the Median and Interquartile Range (IQR) of each rating. Consensus was defined a priori as Median ≥ 4.0 (agreement) and IQR ≤ 1.0 (strong consensus/low deviation). To measure the overall level of agreement, Kendall's W coefficient was calculated. This non-parametric statistic is appropriate for determining the degree of concordance among multiple raters, providing a measure of statistical significance for the consensus (Field, 2014).

2.5. Ethical Considerations

The study protocol was approved by the Alexandria University Faculty of Specific Education Research Ethics Committee. Informed consent was obtained from all participants. All methods were performed in accordance with the relevant guidelines and regulations.

3. Results

The Delphi study yielded strong and statistically significant consensus on the majority of items after three rounds.

3.1. Round 1: Taxonomy Validation

The panel reached immediate and strong consensus on the clarity and comprehensiveness of all seven taxonomy families (defined in Table 1).

• Clarity: All 7 families (F1-F7) achieved a Median rating of 5.0 with an IQR of 0.0.
• Comprehensiveness: All 7 families achieved a Median rating of 4.5 or 5.0, with all IQRs ≤ 1.0.

The overall agreement was found to be statistically significant (Kendall's W = 0.82, p < .001), indicating "almost perfect" agreement. Qualitative feedback was minimal and focused on minor wording clarifications. No new families were proposed, which supports the comprehensiveness of the literature-derived taxonomy.

3.2. Rounds 2 & 3: Toolkit Evaluation

The evaluation of the eight countermeasures (defined in Table 2) provided the most actionable insights. The final consensus ratings are presented in Table 3.

Table 3. Expert Panel Evaluation of Integrity-by-Design Countermeasures (Final Consensus). This table presents the final ratings from the expert panel (n = 25) after three rounds. Each cell gives the Median with the IQR in parentheses.

Countermeasure | Usefulness | Feasibility | Privacy (Risk Inverse)
C1: Provenance (Derived, Minimal) | 4.5 (1.0) | 4.0 (1.0) | 4.5 (1.0)
C2: Interactive Oral (Viva) | 5.0 (0.0) | 3.5 (1.5) | 4.0 (1.0)
C3: Randomised Seeds | 5.0 (1.0) | 4.0 (1.0) | 5.0 (0.0)
C4: Staged Checkpoints | 4.0 (1.0) | 4.0 (1.0) | 4.5 (1.0)
C5: Reflective Micro-Logs | 3.5 (1.5) | 3.0 (1.0) | 4.0 (1.0)
C6: Co-presence Rules | 4.0 (1.0) | 5.0 (0.0) | 5.0 (0.0)
C7: Layered Identity | 4.5 (1.0) | 5.0 (0.0) | 4.0 (1.0)
C8: Rubric Redesign | 5.0 (0.0) | 4.5 (1.0) | 5.0 (0.0)

Measured against the a priori criterion (Median ≥ 4.0 and IQR ≤ 1.0), every rating in Table 3 met consensus except C2 Feasibility (Median = 3.5, IQR = 1.5), C5 Usefulness (Median = 3.5, IQR = 1.5), and C5 Feasibility (Median = 3.0, IQR = 1.0).

(A visualization of this data is presented in Fig. 1.)

Figure 1. Expert Panel Evaluation of Integrity-by-Design Countermeasures (Median). The chart displays the median expert ratings (n = 25) for each countermeasure (C1-C8) across the three dimensions of Usefulness, Feasibility, and Privacy (Risk Inverse).

4. Discussion

The results of this Delphi study provide the formal conceptual validation for the proposed framework.

4.1. The Taxonomy is Validated

The immediate, high consensus (Median = 5.0 for clarity; Median = 4.5 or 5.0 for comprehensiveness) and statistically significant agreement (Kendall's W = 0.82) on the seven-family taxonomy (Table 1) is a significant finding. It indicates that the framework derived from the literature is clear and matches the threat landscape as understood by experts. This validation provides a stable foundation for future work.

4.2. A Mandate for 'Integrity-by-Design'

The most compelling finding is the panel's clear preference for pedagogical and procedural countermeasures over surveillance-based ones.

High Usefulness, High Feasibility, No Privacy Risk: The panel gave its strongest, most uniform support to 'Rubric Redesign' (C8), 'Randomised Seeds' (C3), and 'Co-presence Rules' (C6). All three received the top privacy rating (Median = 5.0, IQR = 0.0), with usefulness medians of 4.0 to 5.0 and feasibility medians of 4.0 to 5.0 (Table 3).

This result is a powerful mandate from experts, suggesting that the most effective way to ensure integrity is also the most respectful of student privacy.

4.3. The 'Viva' Dilemma: High Value, High Cost

The most contentious item was the 'Interactive Oral' (C2). Experts agreed almost uniformly that it was maximally useful (Median = 5.0, IQR = 0.0) for validating authorship. However, they disagreed on its feasibility at scale (Median = 3.5, IQR = 1.5).

Qualitative comments confirmed this tension. One panelist noted, "The viva is the gold standard, but it's simply not possible with 500 students." Another argued, "This logistical challenge must be solved by the operational blueprint." This "Viva Dilemma" highlights the central conflict in authentic assessment: the tension between quality and scalability.

4.4. The Mitigation Matrix: A 'Defense-in-Depth' Approach

To synthesize these findings, we map the validated toolkit (C1-C8) against the validated risks (F1-F7). This "Mitigation Matrix" illustrates the synergistic effect of the toolkit.

(A visualization of this data is presented in Fig. 2.)

Figure 2. Risk-Control Mitigation Matrix. The heatmap illustrates the validated relationship between the seven misconduct risks (F1-F7) and the eight countermeasures (C1-C8). Cell values indicate the mitigation efficacy (3 = Primary, 2 = Secondary, 1 = Tertiary).

The matrix demonstrates a "defense-in-depth" (DiD) approach, a key concept in information security in which no single countermeasure is expected to be foolproof. Instead, multiple layered controls (procedural, pedagogical, and technical) work together to make misconduct prohibitively difficult.
For example, 'Identity Subversion' (F1) is primarily countered by the 'Interactive Oral' (C2) but is also supported by 'Layered Identity' (C7) and 'Derived Provenance' (C1). This multi-layered approach, which prioritizes pedagogical and privacy-preserving controls, creates a robust assessment environment that is resilient to single points of failure.

5. Limitations

This study has two primary limitations. First, while the panel was international, it included a concentration of experts from Egypt and Saudi Arabia, which may influence perceptions of feasibility based on local institutional norms. Second, the Delphi method achieves conceptual validation, not empirical proof. The "effectiveness" rated by experts is still a hypothesis that must be tested in a real-world implementation.

6. Conclusion

This Delphi study validated the seven-family VR misconduct taxonomy and established expert consensus on most dimensions of the eight-part mitigation toolkit. Because the paper includes the full definitions of the framework being tested, it stands as a self-contained contribution to the field.

The study provides a clear expert consensus: the most effective path to academic integrity in VR is not through invasive surveillance, but through thoughtful "integrity-by-design".
The findings strongly endorse pedagogical solutions (C8: Rubric Redesign), technical solutions (C3: Randomised Seeds), and procedural solutions (C6: Co-presence Rules) that the panel rated as useful, feasible, and low in privacy impact. This validated framework now provides the expert-backed foundation necessary for the design and implementation of scalable, privacy-preserving assessment blueprints.

Abbreviations

• VR: Virtual Reality
• IQR: Interquartile Range

Declarations

Ethics approval and consent to participate: The Delphi study protocol received ethics approval from the Alexandria University Faculty of Specific Education Research Ethics Committee. All expert panelists were provided with a participant information sheet and gave informed consent prior to participation in Round 1. All methods were performed in accordance with the relevant guidelines and regulations.

Consent for publication: Not applicable (no identifiable individual data is presented).

Competing interests: The authors declare that they have no competing interests.

Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The research was self-funded by the authors.

Author Contribution

M.W.S.: Mohamed W. Soliman; A.F.K.M.: Ayman Fawzy Khttab Madkour; H.O.F.A.: Heba Othman Foud Alazab.

• M.W.S.: Conceptualization, Methodology, Data Curation, Formal Analysis, Writing – Original Draft.
• A.F.K.M.: Supervision, Validation, Writing – Review & Editing.
• H.O.F.A.: Investigation, Validation, Writing – Review & Editing.

All authors read and approved the final manuscript.

Data Availability

The anonymised dataset generated and analysed during the Delphi study is available from the corresponding author on reasonable request.

References

Bailenson, J. N. Protecting nonverbal data tracked in virtual reality. JAMA Pediatr. 172(10), 905–906 (2018).
Bretag, T. et al. Contract cheating and assessment design: Exploring the relationship. Assess. Evaluation High. Educ. 44(5), 676–691 (2019).
Dalkey, N. & Helmer, O. An experimental application of the Delphi method to the use of experts. Manage. Sci. 9(3), 458–467 (1963).
Field, A. P. Discovering statistics using IBM SPSS statistics (Sage, 2014).
Miller, M. R., Herrera, F., Jun, H., Landay, J. A. & Bailenson, J. N. Personal identifiability of user tracking data during observation of 360-degree VR video. Sci. Rep. 10(1), 17404 (2020).
Nickerson, R. C., Varshney, U. & Muntermann, J. A method for taxonomy development and its application in information systems. Eur. J. Inform. Syst. 22(3), 336–359 (2013).
Radianti, J., Majchrzak, T. A., Fromm, J. & Wohlgenannt, I. A systematic review of immersive virtual reality applications for higher education: Design elements, lessons learned, and research agenda. Comput. Educ. 147, 103778 (2020).
Sefcik, L., Veeran-Colton, T., Baird, M., Price, C. & Steyn, S. An examination of student user experience (UX) and perceptions of remote invigilation during online assessment. Australasian J. Educational Technol. 38(2), 49–69 (2022).
Soliman, M. W., Madkour, A. F. K. & Alazab, H. O. F. The Cartography of Misconduct in VR-Based Assessment: A PRISMA-SCR Scoping Review to Define a New Taxonomy of Integrity and Privacy Risks. Under review in npj Science of Learning (Submission ID: fffe92da-5403-4f34-a3ff-ec0bb09889e4).
Tricco, A. C. et al. PRISMA extension for scoping reviews (PRISMA-SCR): Checklist and explanation. Ann. Intern. Med. 169(7), 473–479 (2018).
Weber-Wulff, D. et al. Testing of detection tools for AI-generated text. Int. J. Educational Integr. 19(1), 1–39 (2023).
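The a priori consensus rule described in Section 2.4 (Median ≥ 4.0 and IQR ≤ 1.0) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function and variable names (`summarize`, `meets_consensus`, `below_threshold`, `TABLE_3`) are hypothetical, the (Median, IQR) pairs are copied from Table 3, and the inclusive quartile method is an assumption, since the paper does not state how quartiles were computed.

```python
from statistics import median, quantiles

# A priori thresholds stated in Section 2.4
MEDIAN_MIN = 4.0  # agreement
IQR_MAX = 1.0     # low dispersion


def summarize(ratings):
    """Return (median, IQR) for a list of 1-5 Likert ratings.

    Assumption: inclusive quartiles; the paper does not name a quartile method,
    and other methods can give slightly different IQRs.
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q3 - q1


def meets_consensus(med, iqr):
    """Apply the a priori rule: Median >= 4.0 and IQR <= 1.0."""
    return med >= MEDIAN_MIN and iqr <= IQR_MAX


# Final (Median, IQR) pairs as reported in Table 3
TABLE_3 = {
    "C1": {"usefulness": (4.5, 1.0), "feasibility": (4.0, 1.0), "privacy": (4.5, 1.0)},
    "C2": {"usefulness": (5.0, 0.0), "feasibility": (3.5, 1.5), "privacy": (4.0, 1.0)},
    "C3": {"usefulness": (5.0, 1.0), "feasibility": (4.0, 1.0), "privacy": (5.0, 0.0)},
    "C4": {"usefulness": (4.0, 1.0), "feasibility": (4.0, 1.0), "privacy": (4.5, 1.0)},
    "C5": {"usefulness": (3.5, 1.5), "feasibility": (3.0, 1.0), "privacy": (4.0, 1.0)},
    "C6": {"usefulness": (4.0, 1.0), "feasibility": (5.0, 0.0), "privacy": (5.0, 0.0)},
    "C7": {"usefulness": (4.5, 1.0), "feasibility": (5.0, 0.0), "privacy": (4.0, 1.0)},
    "C8": {"usefulness": (5.0, 0.0), "feasibility": (4.5, 1.0), "privacy": (5.0, 0.0)},
}


def below_threshold(table):
    """List (code, dimension) pairs that do not satisfy the consensus rule."""
    return [
        (code, dim)
        for code, dims in table.items()
        for dim, (med, iqr) in dims.items()
        if not meets_consensus(med, iqr)
    ]


if __name__ == "__main__":
    for code, dim in below_threshold(TABLE_3):
        print(f"{code} {dim}: below the a priori consensus threshold")
```

Applied to the Table 3 values, the rule flags C2 on feasibility and C5 on usefulness and feasibility. `summarize` is included only to show how a Median and IQR pair could be derived from raw panel ratings, which are not reproduced in this paper.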
Journal submission history: first submitted to journal 03 Dec, 2025; submission checks completed at journal 03 Dec, 2025; editor invited by journal 04 Dec, 2025; editor assigned by journal 21 Apr, 2026; reviewers invited by journal 24 Apr, 2026; reviewers agreed at journal 11 May, 2026.
\u003cem\u003eEx: Feigning technical difficulties.\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe Eight-Part Mitigation Toolkit (C1-C8) \u003cem\u003eThis table defines the toolkit items presented to the expert panel in Round 2.\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCode\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCountermeasure\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProvenance (Derived, Minimal)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRetaining minimal, derived data (e.g., logs) instead of raw sensor streams.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInteractive Oral 
(Viva)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA brief, structured oral exam to defend the performance.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC3\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRandomised Seeds\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePer-student algorithmic randomization of the assessment scenario.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStaged Checkpoints\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMandatory, time-stamped checkpoints for metacognitive reflection.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eReflective Micro-Logs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eShort, written reflections on the process submitted after the task.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCo-presence Rules\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClear rules define who can be in the room and what devices are permitted.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLayered Identity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eUsing non-biometric methods for verification (e.g., ID card\u0026thinsp;+\u0026thinsp;unique code).\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC8\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRubric Redesign\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eShifting grading criteria to reward the \u003cem\u003eprocess\u003c/em\u003e of diagnosis, not just the final product.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Panel Selection\u003c/h2\u003e \u003cp\u003eA purposive sampling strategy was used to recruit a panel (n\u0026thinsp;=\u0026thinsp;25) of subject-matter experts from Egypt, Saudi Arabia, the UK, and Australia. 
Inclusion criteria required documented expertise in at least one core domain:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eAcademic Integrity \u0026amp; Policy (n\u0026thinsp;=\u0026thinsp;8)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEducational Assessment \u0026amp; Pedagogy (n\u0026thinsp;=\u0026thinsp;7)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eComputer Science / VR Development (n\u0026thinsp;=\u0026thinsp;6)\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eStudent Advocacy \u0026amp; Data Privacy (n\u0026thinsp;=\u0026thinsp;4)\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThis multi-stakeholder composition ensured the framework was evaluated from all critical perspectives.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Delphi Procedure\u003c/h2\u003e \u003cp\u003eThe study was administered via a secure online survey platform.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRound 1: Taxonomy Validation.\u003c/b\u003e Panelists were presented with the definitions in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. They rated each family (F1-F7) on 5-point Likert scales for \u003cb\u003eClarity\u003c/b\u003e (1\u0026thinsp;=\u0026thinsp;Very Unclear, 5\u0026thinsp;=\u0026thinsp;Very Clear) and \u003cb\u003eComprehensiveness\u003c/b\u003e (1\u0026thinsp;=\u0026thinsp;Not Comprehensive, 5\u0026thinsp;=\u0026thinsp;Fully Comprehensive).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRound 2: Toolkit Evaluation.\u003c/b\u003e Panelists were presented with the definitions in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. 
For each countermeasure (C1-C8), they provided three separate ratings on 5-point scales:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eUsefulness\u003c/b\u003e (Perceived effectiveness in mitigating risk)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eFeasibility\u003c/b\u003e (Practicality for implementation at scale)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePrivacy-Impact\u003c/b\u003e (Scored as a risk inverse; 1\u0026thinsp;=\u0026thinsp;High Privacy Risk, 5\u0026thinsp;=\u0026thinsp;No Privacy Risk)\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eRound 3: Reconciliation \u0026amp; Consensus.\u003c/b\u003e Any item failing to achieve consensus in Round 2 was returned to the panel. Panelists were shown the group's anonymised statistical summary (Median, IQR) and qualitative arguments, then asked to re-rate the item.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Data Analysis\u003c/h2\u003e \u003cp\u003eQuantitative data was analyzed to determine Median and Interquartile Range (IQR). Consensus was defined \u003cem\u003ea priori\u003c/em\u003e: \u003cb\u003eMedian\u0026thinsp;\u0026ge;\u0026thinsp;4.0\u003c/b\u003e (agreement) and \u003cb\u003eIQR\u0026thinsp;\u0026le;\u0026thinsp;1.0\u003c/b\u003e (strong consensus/low deviation).\u003c/p\u003e \u003cp\u003eTo measure the overall level of agreement, Kendall's W coefficient was calculated. 
This non-parametric statistic is appropriate for determining the degree of concordance among multiple raters, providing a measure of statistical significance for the consensus (Field, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2014\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Ethical Considerations\u003c/h2\u003e \u003cp\u003e The study protocol was approved by the Alexandria University Faculty of Specific Education Research Ethics Committee. Informed consent was obtained from all participants. \u003cb\u003e All methods were performed in accordance with the relevant guidelines and regulations.\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cp\u003eThe Delphi study successfully yielded strong and statistically significant consensus on the majority of items after three rounds.\u003c/p\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Round 1: Taxonomy Validation\u003c/h2\u003e \u003cp\u003eThe panel reached immediate and strong consensus on the clarity and comprehensiveness of all seven taxonomy families (defined in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eClarity\u003c/b\u003e: All 7 families (F1-F7) achieved a Median rating of 5.0 with an IQR of 0.0.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eComprehensiveness\u003c/b\u003e: All 7 families achieved a Median rating of 4.5 or 5.0, with all IQRs\u0026thinsp;\u0026le;\u0026thinsp;1.0.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe overall agreement was found to be statistically significant (Kendall's W\u0026thinsp;=\u0026thinsp;0.82, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), indicating \"almost perfect\" agreement. 
Qualitative feedback was minimal and focused on minor wording clarifications. No new families were proposed, confirming the comprehensiveness of the literature-derived taxonomy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Rounds 2 \u0026amp; 3: Toolkit Evaluation\u003c/h2\u003e \u003cp\u003eThe evaluation of the eight countermeasures (defined in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) provided the most actionable insights. The final consensus ratings are presented in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eExpert Panel Evaluation of Integrity-by-Design Countermeasures (Final Consensus) \u003cem\u003eThis table presents the final consensus ratings from the expert panel (n\u0026thinsp;=\u0026thinsp;25) after three rounds.\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCountermeasure Code \u0026amp; Description\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUsefulness (Median/IQR)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eFeasibility (Median/IQR)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrivacy (Risk Inverse) (Median/IQR)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC1: Provenance (Derived, Minimal)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4.5 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.5 (1.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC2: Interactive Oral (Viva)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e3.5 (1.5)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC3: Randomised Seeds\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e5.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC4: Staged Checkpoints\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.5 (1.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC5: Reflective Micro-Logs\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.5 (1.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC6: Co-presence Rules\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC7: Layered Identity\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4.5 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.0 (1.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eC8: Rubric 
Redesign\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.5 (1.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e5.0 (0.0)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003e(A visualization of this data is presented in\u003c/em\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003cem\u003e)\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThe results of this Delphi study provide the formal conceptual validation for the proposed framework.\u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.1. The Taxonomy is Validated\u003c/h2\u003e \u003cp\u003eThe immediate, high consensus (Median\u0026thinsp;\u0026ge;\u0026thinsp;4.5 on every item) and statistically significant agreement (Kendall's W\u0026thinsp;=\u0026thinsp;0.82) on the clarity and comprehensiveness of the seven-family taxonomy (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) are an important finding. They confirm that the framework derived from the literature is robust, clear, and accurately represents the threat landscape as understood by experts. This validation provides a stable and reliable foundation for all future work.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.2. 
A Mandate for 'Integrity-by-Design'\u003c/h2\u003e \u003cp\u003eThe most compelling finding is the panel's clear preference for pedagogical and procedural countermeasures over surveillance-based ones.\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eHigh Usefulness, High Feasibility, No Privacy Risk\u003c/b\u003e: The panel gave its strongest, most uniform support to 'Rubric Redesign' (C8), 'Randomised Seeds' (C3), and 'Co-presence Rules' (C6). These items were all rated as useful (Median\u0026thinsp;\u0026ge;\u0026thinsp;4.0), feasible (Median\u0026thinsp;\u0026ge;\u0026thinsp;4.0), and having \u003cb\u003ezero privacy impact\u003c/b\u003e (Median\u0026thinsp;=\u0026thinsp;5.0, IQR\u0026thinsp;=\u0026thinsp;0.0). This result is a powerful mandate from experts, suggesting that the most effective way to ensure integrity is also the most respectful of student privacy.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.3. The 'Viva' Dilemma: High Value, High Cost\u003c/h2\u003e \u003cp\u003eThe most contentious item was the 'Interactive Oral' (C2).\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eExperts consistently agreed it was \u003cb\u003emaximally useful\u003c/b\u003e (Median\u0026thinsp;=\u0026thinsp;5.0, IQR\u0026thinsp;=\u0026thinsp;0.0) for validating authorship.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHowever, they did not reach consensus on its \u003cb\u003efeasibility\u003c/b\u003e at scale (Median\u0026thinsp;=\u0026thinsp;3.5, IQR\u0026thinsp;=\u0026thinsp;1.5).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eQualitative comments confirmed this tension. 
One panelist noted, \"The viva is the gold standard, but it's simply not possible with 500 students.\" Another argued, \"This logistical challenge must be solved by the operational blueprint.\" This \"Viva Dilemma\" highlights the central conflict in authentic assessment: the tension between quality and scalability.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.4. The Mitigation Matrix: A 'Defense-in-Depth' Approach\u003c/h2\u003e \u003cp\u003eTo synthesize these findings, we map the validated toolkit (C1-C8) against the validated risks (F1-F7). This \"Mitigation Matrix\" illustrates the synergistic effect of the toolkit.\u003c/p\u003e \u003cp\u003e \u003cem\u003e(A visualization of this data is presented in\u003c/em\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003cem\u003e)\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis matrix, now backed by expert validation, demonstrates a \"defense-in-depth\" (DiD) approach. This is a key concept in information security, where no single countermeasure is expected to be foolproof. Instead, multiple layered controls (procedural, pedagogical, and technical) work together to make misconduct prohibitively difficult. For example, 'Identity Subversion' (F1) is primarily countered by the 'Interactive Oral' (C2) but is also supported by 'Layered Identity' (C7) and 'Derived Provenance' (C1). 
This multi-layered approach, which prioritizes pedagogical and privacy-preserving controls, creates a robust assessment environment that is resilient to single points of failure.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Limitations","content":"\u003cp\u003eThis study has two primary limitations. First, while the panel was international, it included a concentration of experts from Egypt and Saudi Arabia, which may influence perceptions of feasibility based on local institutional norms. Second, the Delphi method achieves \u003cem\u003econceptual\u003c/em\u003e validation, not empirical proof. The \"effectiveness\" rated by experts is still a hypothesis that must be tested in a real-world implementation.\u003c/p\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eThis Delphi study successfully validated the seven-family VR misconduct taxonomy and the eight-part mitigation toolkit. Because the full definitions of the framework under test are included here, the paper stands as a complete, independent contribution to the field.\u003c/p\u003e \u003cp\u003eThe study provides a clear expert consensus: the most effective path to academic integrity in VR is not through invasive surveillance, but through thoughtful \"integrity-by-design\". 
The findings strongly endorse pedagogical solutions (C8: Rubric Redesign), technical solutions (C3: Randomised Seeds), and procedural solutions (C6: Co-presence Rules) that are highly effective, feasible, and have minimal impact on student privacy.\u003c/p\u003e \u003cp\u003eThis validated framework now provides the robust, expert-backed foundation necessary for the design and implementation of scalable, privacy-preserving assessment blueprints.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u0026bull; VR\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVirtual Reality\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u0026bull; IQR\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eInterquartile Range\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eEthics approval and consent to participate:\u003c/h2\u003e \u003cp\u003e The Delphi study protocol received ethics approval from the Alexandria University Faculty of Specific Education Research Ethics Committee. All expert panelists were provided with a participant information sheet and gave informed consent prior to participation in Round 1. 
\u003cb\u003eAll methods were performed in accordance with the relevant guidelines and regulations.\u003c/b\u003e\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent for publication:\u003c/strong\u003e \u003cp\u003eNot applicable (no identifiable individual data is presented).\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting interests:\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding:\u003c/h2\u003e \u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The research was self-funded by the authors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003e(M.W.S.): Mohamed W. Soliman; (A.F.K.M.): Ayman Fawzy Khttab Madkour; (H.O.F.A.): Heba Othman Foud Alazab. \u003cb\u003eM.W.S.:\u003c/b\u003e Conceptualization, Methodology, Data Curation, Formal Analysis, Writing \u0026ndash; Original Draft. \u003cb\u003eA.F.K.M.:\u003c/b\u003e Supervision, Validation, Writing \u0026ndash; Review \u0026amp; Editing. \u003cb\u003eH.O.F.A.:\u003c/b\u003e Investigation, Validation, Writing \u0026ndash; Review \u0026amp; Editing. All authors read and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe anonymised dataset generated and analysed during the Delphi study is available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBailenson, J. N. Protecting nonverbal data tracked in virtual reality. \u003cem\u003eJAMA Pediatr.\u003c/em\u003e \u003cb\u003e172\u003c/b\u003e (10), 905\u0026ndash;906 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBretag, T. et al. Contract cheating and assessment design: Exploring the relationship. \u003cem\u003eAssess. Evaluation High. 
Educ.\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e (5), 676\u0026ndash;691 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDalkey, N. \u0026amp; Helmer, O. An experimental application of the Delphi method to the use of experts. \u003cem\u003eManage. Sci.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e (3), 458\u0026ndash;467 (1963).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eField, A. P. \u003cem\u003eDiscovering statistics using IBM SPSS statistics\u003c/em\u003e (Sage, 2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiller, M. R., Herrera, F., Jun, H., Landay, J. A. \u0026amp; Bailenson, J. N. Personal identifiability of user tracking data during observation of 360-degree VR video. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e (1), 17404 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNickerson, R. C., Varshney, U. \u0026amp; Muntermann, J. A method for taxonomy development and its application in information systems. \u003cem\u003eEur. J. Inform. Syst.\u003c/em\u003e \u003cb\u003e22\u003c/b\u003e (3), 336\u0026ndash;359 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRadianti, J., Majchrzak, T. A., Fromm, J. \u0026amp; Wohlgenannt, I. A systematic review of immersive virtual reality applications for higher education: Design elements, lessons learned, and research agenda. \u003cem\u003eComput. Educ.\u003c/em\u003e \u003cb\u003e147\u003c/b\u003e, 103778 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSefcik, L., Veeran-Colton, T., Baird, M., Price, C. \u0026amp; Steyn, S. An examination of student user experience (UX) and perceptions of remote invigilation during online assessment. \u003cem\u003eAustralasian J. Educational Technol.\u003c/em\u003e \u003cb\u003e38\u003c/b\u003e (2), 49\u0026ndash;69 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSoliman, M. W., Madkour, A. F. K. \u0026amp; Alazab, H. 
O. F. (under review). The Cartography of Misconduct in VR-Based Assessment: A PRISMA-SCR Scoping Review to Define a New Taxonomy of Integrity and Privacy Risks. (currently under review in \u003cem\u003enpj Science of Learning\u003c/em\u003e (Submission ID: fffe92da-5403-4f34-a3ff-ec0bb09889e4)).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTricco, A. C. et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. \u003cem\u003eAnn. Intern. Med.\u003c/em\u003e \u003cb\u003e169\u003c/b\u003e (7), 467\u0026ndash;473 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeber-Wulff, D. et al. Testing of detection tools for AI-generated text. \u003cem\u003eInt. J. Educational Integr.\u003c/em\u003e \u003cb\u003e19\u003c/b\u003e (1), 1\u0026ndash;39 (2023).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Delphi study, virtual reality, academic integrity, taxonomy, validation, integrity-by-design, expert consensus, assessment 
design","lastPublishedDoi":"10.21203/rs.3.rs-8241682/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8241682/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eContext:\u003c/h2\u003e \u003cp\u003eThe integration of immersive Virtual Reality (VR) into assessment introduces novel academic misconduct vectors and significant privacy risks. Recent scoping reviews (Soliman et al., currently under review in \u003cem\u003enpj Science of Learning\u003c/em\u003e (Submission ID: \u003cb\u003efffe92da-5403-4f34-a3ff-ec0bb09889e4\u003c/b\u003e)) have analyzed this emerging threat landscape, resulting in a \u003cem\u003eproposed\u003c/em\u003e foundational taxonomy of seven VR-specific misconduct risks (F1-F7). However, this literature-derived taxonomy currently lacks formal validation by subject-matter experts.\u003c/p\u003e\u003ch2\u003eObjective\u003c/h2\u003e \u003cp\u003eThis study reports on the conceptual validation of this newly proposed taxonomy and an associated eight-part mitigation toolkit (C1-C8). The study aimed to establish expert consensus on the clarity and comprehensiveness of the risks, and on the usefulness, feasibility, and privacy-impact of the proposed countermeasures.\u003c/p\u003e\u003ch2\u003eMethod\u003c/h2\u003e \u003cp\u003eA three-round Delphi study was conducted with an international panel (n\u0026thinsp;=\u0026thinsp;25) of experts in academic integrity, VR development, and educational assessment. Panelists rated taxonomy and toolkit items on 5-point Likert scales. 
Consensus was defined \u003cem\u003ea priori\u003c/em\u003e as a Median\u0026thinsp;\u0026ge;\u0026thinsp;4.0 and an Interquartile Range (IQR)\u0026thinsp;\u0026le;\u0026thinsp;1.0.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eStrong consensus was achieved for the clarity and comprehensiveness of all seven taxonomy families (F1-F7) (Kendall's W\u0026thinsp;=\u0026thinsp;0.82, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). For the mitigation toolkit, measures such as 'Randomised Seeds' (C3) and 'Rubric Redesign' (C8) received high consensus across usefulness, feasibility, and privacy. The 'Interactive Oral' (C2) was deemed highly useful (Median\u0026thinsp;=\u0026thinsp;5.0) but had lower feasibility (Median\u0026thinsp;=\u0026thinsp;3.5, IQR\u0026thinsp;=\u0026thinsp;1.5), highlighting a key implementation challenge.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eThis study provides the first expert-validated framework for VR-specific misconduct and mitigation. The strong consensus confirms the taxonomy's robustness and provides a clear mandate for practitioners to adopt 'integrity-by-design' measures that prioritize pedagogy over surveillance. 
The findings serve as a validated foundation for the design of scalable, privacy-preserving assessment blueprints.\u003c/p\u003e","manuscriptTitle":"Reaching Expert Consensus on 'Integrity-by-Design' for VR Assessment: A Delphi Study to Validate a Misconduct Taxonomy and Mitigation Toolkit","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-05 20:21:56","doi":"10.21203/rs.3.rs-8241682/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"718388831378820549654739181834403950","date":"2026-05-11T16:51:39+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-24T06:08:57+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-21T14:33:55+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-12-05T03:25:06+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-12-03T06:39:21+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-12-03T06:32:46+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e13d6d3e-52b7-455c-a5c3-88a580e1c57d","owner":[],"postedDate":"May 5th, 
2026","published":true,"recentEditorialEvents":[{"type":"reviewerAgreed","content":"718388831378820549654739181834403950","date":"2026-05-11T16:51:39+00:00","index":94,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":67515322,"name":"Health sciences/Health care"},{"id":67515323,"name":"Physical sciences/Mathematics and computing"}],"tags":[],"updatedAt":"2026-05-05T20:21:56+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-05 20:21:56","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8241682","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8241682","identity":"rs-8241682","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
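
The consensus rule stated in the Method section (Median ≥ 4.0 and IQR ≤ 1.0 on 5-point Likert ratings) can be sketched as follows. This is an illustrative sketch only: the function name `delphi_consensus`, the sample ratings, and the use of the inclusive quartile convention are assumptions, since the abstract does not say which quartile method was used, and the ratings shown are not study data.

```python
from statistics import median, quantiles


def delphi_consensus(ratings, median_min=4.0, iqr_max=1.0):
    """Return (median, iqr, reached) for one item's Likert ratings.

    Thresholds follow the rule quoted in the abstract. The quartile
    convention (method="inclusive") is an assumption, not the authors' choice.
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    med = median(ratings)
    iqr = q3 - q1
    return med, iqr, (med >= median_min and iqr <= iqr_max)


# Hypothetical panel ratings for two items (not data from the study).
tight_item = [4, 4, 5, 5, 5, 4, 5, 4]
spread_item = [2, 3, 3, 4, 4, 5, 5, 5]

print(delphi_consensus(tight_item))   # (4.5, 1.0, True)
print(delphi_consensus(spread_item))  # (4.0, 2.0, False)
```

The second item shows why the rule needs both conditions: its median meets the threshold, but the wide spread of ratings (IQR of 2.0) means the panel has not converged.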

