Adoption, usability and perceived clinical value of a UK AI clinical reference platform: a mixed-methods formative evaluation of real-world usage and a 1,223-respondent user survey

Research Article

Kolawole Tytler

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7593409/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Objectives
To describe the design of a UK-centred retrieval-augmented clinical reference platform, and to report early real-world adoption and perceived clinical utility from a formative implementation evaluation.

Methods
We conducted a mixed-methods study comprising (i) a retrospective observational analysis of platform usage across web, iOS and Android over 16 weeks and (ii) a cross-sectional, in-product intercept survey. Usage data (unique users, engagement events, clinical queries) were sourced from Google Analytics 4, Apple App Store Connect and Google Play Console. A client-side script randomised survey prompts to approximately 10% of web sessions, displaying single items from a predefined battery. Proportions are reported with Wilson 95% confidence intervals; qualitative comments underwent thematic content analysis. No personal identifiers were collected.

Results
The web application attracted 19,269 unique users and 202,660 engagement events (approximately 10.5 per active user), with approximately 40,000 clinical queries across platforms. The intercept survey yielded 1,223 item-level responses. By item, the proportion answering "Yes" was: useful 86.2% (50/58); time saved 60.9% (14/23); would use again 93.3% (14/15); would recommend 88.4% (38/43); perceived accuracy 75.0% (30/40); perceived reliability 79.4% (27/34). Themes highlighted speed, guideline-grounded answers and UK specificity.

Discussion
Findings provide formative signals of value for rapid, provenance-bound information retrieval. Key limitations include small item-level Ns, early-adopter/selection bias, and the absence of gold-standard accuracy benchmarking; results should not be interpreted as evidence of clinical effectiveness.

Conclusion
A safety-governed, RAG-based platform showed early uptake and favourable user sentiment among UK clinicians. Prospective evaluations, time-and-motion studies and objective accuracy/safety audits are warranted to assess impact on clinical workflows and care quality.
Subject areas: Artificial Intelligence and Machine Learning; Medical Informatics; Information Retrieval and Management; Health Economics & Outcomes Research; Hospital Medicine

Keywords: Artificial Intelligence; Large Language Models; Retrieval-Augmented Generation; Clinical Decision Support Systems; Digital Health; Medical Informatics; mHealth

[Figure 1]

Introduction

The exponential growth of biomedical literature and clinical guidelines presents a formidable challenge for healthcare professionals striving to practise evidence-based medicine. Medical knowledge is estimated to be doubling at an accelerating rate (Densen, 2011), making it impossible for clinicians to assimilate all relevant information. This phenomenon, termed information overload, requires clinicians to synthesize vast amounts of data, including national guidelines, local protocols, and pharmacological information, often under significant time constraints at the point of care (Al-Dhahir, 2023; Misra et al., 2022). The cognitive burden associated with navigating this complex information landscape is a significant contributor to clinician burnout and can impede the consistent application of best practices (Sinsky et al., 2016).

Traditional digital resources, ranging from PDF documents on institutional intranets to established clinical decision support systems (CDSS) such as UpToDate, DynaMed, or BMJ Best Practice, are authoritative but often require time-consuming manual searches and navigation. Studies have shown that clinicians frequently abandon searches if the information is not rapidly accessible, potentially delaying evidence-based care (Westbrook et al., 2007; Ely et al., 2005). The format of these resources is often not optimized for the rapid, query-based nature of frontline clinical practice.

The emergence of Large Language Models (LLMs) has catalyzed interest in novel approaches to clinical information retrieval and synthesis. LLMs demonstrate remarkable capabilities in natural language understanding and generation, offering the potential to provide instant, conversational answers to complex clinical queries (Thirunavukarasu et al., 2023). However, the direct application of general-purpose LLMs (e.g., ChatGPT, Google Gemini) in clinical settings poses significant risks. These models are prone to "hallucination", namely generating plausible-sounding but factually incorrect or outdated information, and often lack transparent provenance for their outputs (Moor et al., 2023; Lee et al., 2023). Furthermore, general LLMs may provide advice that contradicts the localized clinical guidelines that are crucial for safe practice in a given jurisdiction, such as the United Kingdom.

To harness the benefits of LLMs while mitigating these risks, the machine learning field has advanced the Retrieval-Augmented Generation (RAG) architecture (Lewis et al., 2020). RAG systems enhance LLM performance by grounding the generation process in a specific, external knowledge base. Instead of relying solely on the knowledge internalized during the LLM's training, a RAG system first retrieves relevant documents from a trusted corpus and then uses the LLM to synthesize an answer based on that retrieved information. This approach can improve factual accuracy, keeps answers as current as the underlying corpus, and provides clear provenance (Zakka et al., 2024).

This paper analyses iatroX, a novel clinical decision support platform utilizing an algorithmic RAG architecture tailored for UK healthcare professionals. iatroX is designed to provide rapid, reliable, and contextually appropriate answers by ensuring all outputs are synthesized directly from a curated, continuously updated knowledge base of authoritative clinical guidelines accepted in UK practice. The platform incorporates a proprietary algorithmic search engine and safety mechanisms to manage uncertainty.
The objective of this study is to describe the methodology behind the platform and to conduct a mixed-methods formative evaluation of its initial real-world adoption, user engagement, and perceived clinical utility through a large-scale analysis of platform analytics and an in-product user survey.

Methods

Study Design and Setting

This study employed a mixed-methods formative evaluation design, combining a retrospective observational analysis of real-world usage data with a cross-sectional analysis of in-product user feedback. The study focused on iatroX, a generative AI platform providing clinical decision support for UK healthcare professionals, delivered via a web application and mobile applications (iOS and Android). Data were collected during a 16-week observational window from 8th April 2025 to 31st July 2025. The study reporting adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (von Elm et al., 2007) and the Checklist for Reporting Results of Internet E-Surveys (CHERRIES) (Eysenbach, 2004).

The iatroX System

System Architecture and Regulatory Status

iatroX is built on a decoupled architecture comprising a Next.js web frontend, a React Native mobile application, and a central Node.js/Express backend API. The primary data store is MongoDB. The system's core functionality relies on a proprietary Retrieval-Augmented Generation (RAG) pipeline. The platform is registered with the UK Medicines and Healthcare products Regulatory Agency (MHRA) as a Class I Medical Device (Reference: 2025042201417535), adhering to structured quality assurance and software lifecycle processes aligned with standards such as IEC 62304 (Software Lifecycle Processes).

Retrieval-Augmented Generation (RAG) Pipeline

The RAG pipeline is engineered to ensure responses are strictly grounded in a verified knowledge base, mitigating the risk of LLM hallucination (Fig. 1).

Knowledge Corpus: The retrieval corpus is constructed from publicly available and accessible UK clinical guidelines and resources from authoritative bodies (e.g., endorsed by NICE, SIGN, Royal Colleges). Ad-hoc scripts monitor these sources daily for updates to ensure content currency.

Ingestion and Indexing: Documents undergo a proprietary cleaning pipeline and are segmented into semantically coherent text chunks (average 500 tokens, maximum approximately 3,000 tokens). Optimization of this chunking is critical for retrieval relevance (Raffel et al., 2020). These chunks are converted into numerical vectors using embedding models (GPT embedding family) and indexed in a vector database (Pinecone) for efficient semantic search.

Retrieval and Confidence Scoring: When a user submits a query, a proprietary algorithmic search engine executes a semantic search, retrieving up to 15 of the most relevant text passages. The algorithm assesses the semantic similarity between the query and the retrieved passages to determine a confidence score.

Dynamic Scope Extension: If the initial retrieval from the core guidelines yields low confidence scores (indicating potential gaps or ambiguity), the system dynamically extends its scope by triangulating findings with a search of publicly available peer-reviewed research. This secondary search algorithmically ranks evidence by hierarchical strength (e.g., prioritizing meta-analyses over case studies).

Safety Threshold (Refusal to Answer): If the confidence score remains below a predefined safety threshold after scope extension, the system declines to answer the query. This mechanism is a critical safety feature to prevent the dissemination of information not robustly supported by the evidence base.

Synthesis: If the confidence threshold is met, the retrieved passages and the original query are passed as context to the generator model.
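The gating logic described in the steps above (retrieve, score, extend scope, then refuse or synthesize) can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's proprietary implementation: the `Passage` type, the `confidence` heuristic (mean of the three highest similarities), the threshold value of 0.75, and the retriever and synthesizer callables are all hypothetical. Only the limit of 15 passages comes from the text, and the ranking of literature by evidence strength is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

TOP_K = 15               # from the text: up to 15 passages are retrieved
SAFETY_THRESHOLD = 0.75  # hypothetical value; the real threshold is not published


@dataclass(frozen=True)
class Passage:
    source: str        # citation or link for the source document
    text: str
    similarity: float  # query-passage semantic similarity, assumed to lie in [0, 1]


# Hypothetical interfaces: a retriever maps (query, k) to passages and a
# synthesizer maps (query, passages) to a cited answer.
Retriever = Callable[[str, int], List[Passage]]
Synthesizer = Callable[[str, List[Passage]], str]


def confidence(passages: List[Passage]) -> float:
    """Assumed heuristic: mean similarity of the three best passages."""
    best = sorted((p.similarity for p in passages), reverse=True)[:3]
    return sum(best) / len(best) if best else 0.0


def answer(
    query: str,
    core: Retriever,
    literature: Retriever,
    synthesize: Synthesizer,
    threshold: float = SAFETY_THRESHOLD,
) -> Optional[str]:
    """Return a synthesized answer, or None when the system should refuse."""
    passages = list(core(query, TOP_K))
    if confidence(passages) < threshold:
        # Dynamic scope extension: add peer-reviewed literature and keep
        # the TOP_K most similar passages overall.
        merged = passages + list(literature(query, TOP_K))
        passages = sorted(merged, key=lambda p: p.similarity, reverse=True)[:TOP_K]
    if confidence(passages) < threshold:
        return None  # safety threshold not met: decline to answer
    return synthesize(query, passages)


if __name__ == "__main__":
    # Stub components, for demonstration only.
    def weak_core(query: str, k: int) -> List[Passage]:
        return [Passage("guideline-A", "...", 0.40)]

    def empty_literature(query: str, k: int) -> List[Passage]:
        return []

    def join_sources(query: str, passages: List[Passage]) -> str:
        return query + " | " + "; ".join(p.source for p in passages)

    result = answer("example query", weak_core, empty_literature, join_sources)
    print("refused" if result is None else result)
```

Returning `None` rather than a low-confidence answer keeps the refusal path explicit for the caller, which is one simple way to express the refusal-to-answer behaviour described above.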
The platform utilizes a cascade of LLMs, including models from the Google Gemini and OpenAI families, alongside a proprietary, post-trained model ("Thea") optimized for clinical synthesis within the iatroX workflow.

Provenance and Performance

All generated answers are presented with explicit citations and links to the source documents. The median response latency is 12 seconds (range: 3–30 seconds).

Data Collection

Usage Analytics

Platform usage data were collected using Google Analytics 4 (GA4) for the web application, and Google Play Console and Apple App Store Connect for the mobile applications. Standard bot filtering was applied. Key metrics included unique users, engagement events (total interactions), daily, weekly and monthly active users (DAU/WAU/MAU), and total clinical queries submitted to the RAG pipeline.

In-Product Intercept Survey

To capture in-context user perceptions and minimize self-selection bias (Dillman, 2011), a randomised intercept survey was deployed within the web application.

Sampling and Administration

A client-side script presented survey prompts to a random 10% of web user sessions. To maximize the diversity of feedback and prevent survey fatigue, users were presented with randomized single-item questions from a larger question battery throughout their session. A browser cookie prevented the same user from being shown the same prompt repeatedly.

Survey Instrument

The survey comprised single-item, closed-ended questions assessing domains including perceived usefulness, clinical reliability, system performance, and intent to adopt (Table 2). Responses were recorded as "Yes," "No," or "Don't Know."

Qualitative Feedback

Unsolicited qualitative feedback was collected from online blogs, professional social media forums (e.g., Facebook doctor groups), and direct emails to the developer.

Outcomes and Analysis

The primary outcomes were the proportions of positive responses ("Yes") to core survey items. Secondary outcomes included descriptive usage metrics (adoption rates, engagement frequency). Descriptive statistics were used for usage analytics. For survey data, proportions were calculated using the number of responses received for each specific question as the denominator. Corresponding 95% Wilson score confidence intervals are reported to estimate the precision of these proportions (Brown et al., 2001). Qualitative feedback underwent a rapid thematic content analysis (Gale et al., 2013) to identify recurrent themes regarding the platform's impact and user trust.

Ethical Considerations and Governance

This work was conducted as a service evaluation to assess the performance of a live digital health tool and guide quality improvement. An assessment using the UK Health Research Authority (HRA) decision tool confirmed the project's status as a service evaluation, which does not require Research Ethics Committee (REC) review (Medical Research Council and NHS Health Research Authority, 2017). The lawful basis for processing anonymized user data was legitimate interests.

Results

Platform Adoption and Engagement

During the 16-week observational period (8th April 2025 to 31st July 2025), the iatroX platform showed rapid and substantial user adoption (Table 1). The web application attracted 19,269 unique users, generating 202,660 engagement events, an average of 10.5 events per active user. Approximately 40,000 clinical questions were asked across all platforms during the study period. The user base was predominantly located in the United Kingdom.

Traffic analysis showed that "Direct" traffic was the largest source (9.1k sessions), which may reflect brand recall and habitual return usage. Significant traffic also originated from professional social media groups (e.g., Facebook, 7.7k sessions) and organic search (Google, 4.7k sessions), consistent with organic, word-of-mouth adoption within clinical communities.
Mobile application adoption was robust. The iOS application was downloaded 1,960 times. The Android application showed consistent growth, reaching a peak of over 750 daily active users by late July 2025.

The platform employs a usage-based registration model, allowing guest users three free queries per week before requiring registration. A total of 1,997 users registered during the study period. Relative to the 19,269 unique web users, this represents a visitor-to-registered-user conversion rate of 10.4%. Because registration is required only once the free queries are used, this suggests that registration occurred after users had established the utility of the tool (product-qualified users).

Table 1. Summary of iatroX Platform Usage Metrics (8th April 2025–31st July 2025)

Platform | Metric | Value
Web | Unique Users | 19,269
Web | Total Engagement Events | 202,660
Web | Average Events per User | 10.5
Mobile | iOS Downloads | 1,960
Mobile | Android Peak Daily Active Users | > 750
Cross-Platform | Total Registered Users | 1,997
Cross-Platform | Total Clinical Queries Asked (Approx.) | 40,000

User-Perceived Performance and Utility

A total of 1,223 responses were obtained through the in-product intercept survey. Responses were predominantly positive across all domains (Table 2).

Perceived Usefulness and Efficiency

Most users found the platform beneficial. When asked, "Do you find iatroX useful?", 86.2% (50/58) responded affirmatively. A majority also reported that the platform saved them time (60.9%, 14/23).

Adoption Intent

Users indicated a high likelihood of continued use; when asked if they would use iatroX again, 93.3% (14/15) responded "Yes". The willingness to recommend the platform to colleagues was also high (88.4%, 38/43).

Perceived Clinical Reliability and Trust

The platform was perceived as clinically accurate and reliable by most of those asked. For the item, "Do you feel that the information provided by iatroX is accurate?", 75.0% (30/40) responded "Yes". When asked, "Does iatroX look and feel reliable to you?", 79.4% (27/34) responded affirmatively.
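The Wilson score interval used for these proportions can be computed directly from the "Yes" count and the item-level N. The sketch below applies the standard formula (Brown et al., 2001) with z ≈ 1.96. The function name `wilson_interval` is illustrative and is not taken from the study's analysis scripts, and its output may differ slightly from the intervals reported in Table 2 depending on rounding or on the exact variant used.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Item-level counts as reported in the text: ("Yes" responses, total responses).
items = {
    "useful": (50, 58),
    "time saved": (14, 23),
    "accurate": (30, 40),
}

for label, (yes, total) in items.items():
    low, high = wilson_interval(yes, total)
    print(f"{label}: {yes}/{total} = {yes / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 to 100% and behaves sensibly at small N, which matters here because several items have fewer than 30 responses.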
System Performance and Usability

User perception of the system's technical performance and usability was positive. The majority of users found the speed of responses satisfactory (78.9%, 45/57) and the platform easy to navigate (82.2%, 37/45).

Table 2. User Responses to Key Intercept Survey Items

Domain | Survey Item | Yes (%) | 95% CI (Wilson) | No (%) | Don't Know (%) | Total Responses (N)
Usefulness & Efficiency | Do you find iatroX useful? | 86.2% | 75.1–92.9% | 3.4% | 10.3% | 58
Usefulness & Efficiency | Did iatroX save you time today? | 60.9% | 40.8–77.8% | 13.0% | 26.1% | 23
Adoption Intent | Would you use iatroX again? | 93.3% | 69.9–99.0% | 0.0% | 6.7% | 15
Adoption Intent | Would you recommend iatroX to a colleague? | 88.4% | 75.5–95.0% | 2.3% | 9.3% | 43
Reliability & Trust | Do you feel that the information provided by iatroX is accurate? | 75.0% | 59.7–86.1% | 2.5% | 22.5% | 40
Reliability & Trust | Does iatroX look and feel reliable to you? | 79.4% | 63.2–89.9% | 5.9% | 14.7% | 34
Performance & Usability | Is the speed of responses from iatroX satisfactory? | 78.9% | 67.2–87.3% | 5.3% | 15.8% | 57
Performance & Usability | Is the platform easy to navigate on your device? | 82.2% | 69.2–90.6% | 4.4% | 13.3% | 45

Qualitative Feedback Themes

A large volume of unsolicited qualitative feedback was analyzed. Thematic analysis revealed three recurrent themes:

High Clinical Utility and Efficiency: Users frequently described the platform as "fantastic," "brilliant," and a "huge service." Feedback emphasized the time saved compared to searching traditional guideline repositories and the platform's ability to synthesize complex management plans rapidly.

Trust Driven by Governance and Provenance: The MHRA registration (Class I Medical Device) was frequently cited by users as a key differentiator and trust factor compared to general-purpose AI tools. The clear citation of UK-specific guidelines reinforced confidence in the clinical appropriateness of the answers.

Organic Adoption via Professional Networks: The qualitative data provided numerous examples of users actively sharing the platform within their closed professional circles (e.g., departmental groups, trainee forums), consistent with the word-of-mouth growth observed in the usage analytics.

Discussion

Principal Findings

This mixed-methods evaluation describes the initial real-world implementation and reception of iatroX, a RAG-based AI clinical decision support platform focused on UK guidelines. The results show rapid and substantial organic user adoption, with over 19,000 unique web users and approximately 40,000 clinical queries within the first 16 weeks. The in-product survey data (1,223 item-level responses) indicate that this early adopter cohort perceives iatroX as useful (86.2%), accurate (75.0%), and time-saving (60.9%). These findings suggest that iatroX may address a significant unmet need for rapid, trustworthy information retrieval among clinicians.

Context and Comparison with Existing Literature

The challenges of information retrieval at the point of care are well-documented drivers of cognitive burden and workflow inefficiency (Sinsky et al., 2016; O'Malley et al., 2020). The rapid uptake of iatroX underscores the demand for tools that can synthesize authoritative guidelines more efficiently than traditional methods.

The differentiating factor in iatroX's positive reception appears to be its specialized implementation of Retrieval-Augmented Generation (RAG). While there is growing interest in using LLMs in medicine, studies consistently highlight clinician skepticism due to concerns about accuracy, lack of provenance, and potential for "hallucination" (Meskó, 2023; Lee et al., 2023). General-purpose tools, while accessible, are not designed for the safety-critical environment of clinical decision-making and lack the necessary grounding in localized guidelines.
RAG architectures are increasingly recognized as a leading approach for deploying LLMs in high-stakes, knowledge-intensive domains (Lewis et al., 2020; Zakka et al., 2024). By restricting the LLM's generation process to a curated, verified corpus (in this case, UK-accepted clinical guidelines), iatroX directly addresses these primary safety concerns. The platform's proprietary algorithmic search engine further enhances this approach. The mechanisms for assessing confidence, dynamically expanding scope to peer-reviewed literature when necessary, and, critically, refusing to answer below a defined safety threshold provide essential layers of safety.

The level of perceived accuracy (75.0%) and reliability (79.4%) suggests that this RAG implementation is effective in building clinician trust. Furthermore, the qualitative finding that the MHRA registration fostered trust aligns with research indicating that robust governance and regulatory oversight are critical facilitators for the adoption of clinical AI (Waring et al., 2014; Kelly et al., 2019).

While other tools exist in the clinical reference space, such as UpToDate, BMJ Best Practice, and newer AI entrants like Glass Health (focusing on differential diagnosis) or approaches utilizing advanced models like Med-Gemini (Singhal et al., 2024), iatroX differentiates itself through its specific focus on UK guidelines, its algorithmic RAG implementation with explicit safety thresholds, and its regulatory status.

The organic, word-of-mouth growth observed (high direct traffic, social media referrals, and 88.4% willingness to recommend) is an encouraging indicator of product-market fit. It suggests the platform's value proposition is compelling enough for clinicians to recommend it proactively to peers, a powerful mechanism for diffusion of innovation in healthcare (Greenhalgh et al., 2004).

Strengths and Limitations

The primary strength of this study is its reliance on a large, real-world dataset, analyzing the behavior of over 19,000 users and capturing 1,223 survey interactions. The use of an in-product intercept survey is a methodological strength, as it captures feedback in the immediate context of use and reduces the recall bias and self-selection bias common in retrospective email surveys (Eysenbach, 2004).

However, the study has several limitations. First, the intercept survey was anonymous, preventing the correlation of user feedback with demographic data (e.g., professional role, specialty, grade). Future research should aim to capture these data to understand adoption patterns across different clinical subgroups. Second, the study population consists of self-selected early adopters, potentially biasing the results towards more positive perceptions; the findings may not generalize to the entire UK healthcare workforce. Third, the survey utilized single-item, non-validated questions. While appropriate for a formative service evaluation, future evaluations should incorporate validated instruments such as the System Usability Scale (SUS) (Brooke, 1996) or the Health Information Technology Usability Evaluation Scale (Health-ITUES) (Yen et al., 2017). Fourth, due to the randomized presentation of single items from a larger battery, the sample sizes (N) for individual questions are relatively small. This results in wide confidence intervals for some metrics (e.g., time-saving), limiting the precision of these specific estimates despite the large overall number of responses. Finally, while users perceived the information as accurate, this study did not objectively measure the clinical correctness of the generated answers against a gold standard, although the iatroX system is designed to extract, rather than generate, information from the available context.

Implications for Practice and Future Research

The positive results of this evaluation suggest that RAG-based AI tools have the potential to improve the efficiency of clinical information retrieval. By reducing the time clinicians spend searching for guidelines, these tools may help alleviate cognitive burden and potentially improve the consistency of evidence-based practice.

Future research should focus on rigorous evaluation of the platform's impact. This includes objective assessments of answer accuracy and safety through standardized clinical vignettes reviewed by expert panels. Prospective studies, such as time-and-motion studies comparing iatroX to traditional resources (e.g., NICE website search), are needed to quantify the reported time savings. Ultimately, randomized controlled trials will be necessary to assess the impact of the platform on clinical decision-making quality and patient outcomes.

Conclusion

iatroX, a novel RAG-based AI platform registered as a Class I Medical Device, has shown rapid organic adoption and a positive reception among an early cohort of UK healthcare professionals. By prioritizing a localized knowledge base, implementing retrieval mechanisms with confidence scoring and refusal-to-answer capabilities, and adhering to regulatory standards, the platform appears to have fostered clinician trust. This real-world evaluation provides formative evidence that well-designed and safely implemented RAG systems can help meet the need for rapid, reliable information synthesis at the point of care, signaling a promising direction for the future of clinical decision support.

Declarations

Participant consent was waived as this was classified as a service evaluation under the UK Health Research Authority (HRA) guidelines, which do not require formal ethical approval or documented consent for anonymized data.
For the in-product survey, participants were informed via a prompt of the voluntary, anonymous nature, with proceeding constituting implied consent. Acknowledgements I thank iatroX users who provided in-product feedback during the evaluation period. Funding No external funding. Platform operations were funded by iatroX. Competing interests K.T. is the founder and lead developer of iatroX and holds equity in the operating company. Usage analytics were obtained from routine telemetry (GA4/App Store Connect/Google Play Console). The survey instrument was pre-specified; anonymised aggregate data and analysis scripts are made available upon reasonable request. No other relationships or activities that could appear to have influenced the submitted work. Patient and public involvement Patients/the public were not involved in the design, conduct, reporting, or dissemination plans of this research. Ethics statements Patient consent for publication: Not required. Ethics approval: The UK HRA decision tool classified this project as a service evaluation ; Research Ethics Committee review was not required. Lawful basis: legitimate interests. Data availability statement Anonymised aggregate data, the full survey instrument, and analysis scripts are made available upon reasonable request. References Al-Dhahir MA (2023) Information Overload. StatPearls [Internet]. StatPearls Publishing, Treasure Island (FL) Brooke J (1996) SUS: a quick and dirty usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland IL (eds) Usability evaluation in industry. Taylor & Francis, London, pp 189–194 Brown LD, Cai TT, DasGupta A (2001) Interval estimation for a binomial proportion. Stat Sci 16(2):101–133 Densen P (2011) Challenges and opportunities facing medical education. Transactions of the American Clinical and Climatological Association , 122, p.48 Dillman DA (2011) Mail and Internet Surveys: The Tailored Design Method. 
Wiley Ely JW, Osheroff JA, Ebell MH, Chambliss ML, Vinson DC, Stevermer JJ, Pifer EA (2005) Obstacles to answering doctors' questions about patient care with evidence: the roles of patients, information resources, and physicians themselves. J Am Med Inform Assoc 12(2):217–224 Eysenbach G (2004) Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res 6(3):e34 Gale NK, Heath G, Cameron E, Rashid S, Redwood S (2013) Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol 13(1):1–8 Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O (2004) Diffusion of innovations in service organizations: systematic review and recommendations. Milbank Q 82(4):581–629 Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17(1):1–9 Lee P, Bubeck S, Nori A (2023) The Doctor and the Machine: A New Era of AI in Medicine. Microsoft Research Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, Küttler H, Kiela D, Rocktäschel T, Riedel S (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459–9474 Medical Research Council and NHS Health Research Authority (2017) Defining Research: HRA and MRC guidance . [Online] Available at: https://www.hra.nhs.uk/planning-and-improving-research/research-planning/defining-research/ [Accessed 24 August 2025] Meskó B (2023) The ChatGPT (and other LLMs) earthquake in healthcare. J Med Internet Res 25:e49856 Misra S, Ewing LM, Saseendrakumar BR (2022) Information Overload and the Quest for Evidence-Based Practice: Navigating the Labyrinth. Cureus, 14(7) Moor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P (2023) Foundation models for generalist medical artificial intelligence. 
Nature 616(7956):259–265 O'Malley PG, Jbaily A, El-Kareh R (2020) The burden of getting ready to see the patient: an electronic health record time study. J Gen Intern Med 35:1314–1315 Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551 Singhal K et al (2024) (Google Research and Google DeepMind), A milestone in health AI: Introducing Med-Gemini. Google: The Keyword Sinsky C, Colligan L, Li L, Prgomet M, Reynolds S, Goeders L, Sinsky J, Trockel M, Dyrbye L (2016) Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med 165(11):753–760 Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930–1940 von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 370(9596):1453–1457 Waring J, Roe B, Marshall F, Bishop S (2014) The role of context in the successful implementation of decision support technologies: a case study from UK community pharmacy. Health Soc Care Commun 22(3):276–284 Westbrook JI, Coiera EW, Gosling AS (2007) Do online information retrieval systems help experienced clinicians answer clinical questions? J Am Med Inform Assoc 14(3):315–321 Yen PY, Wantland D, Bakken S (2017) Development and validation of the Health Information Technology Usability Evaluation Scale (Health-ITUES) for the assessment of clinician-reported usability of EHRs. J Am Med Inform Assoc 24(6):1081–1088 Zakka C, Celi LA, Kvedar JP (2024) The role of retrieval-augmented generation in the modernization of clinical decision support. 
NEJM AI 1(2):AIdbp2300092

Additional Declarations

The authors declare potential competing interests as follows:

Funding
No external funding. Platform operations were funded by iatroX.

Competing interests
K.T. is the founder and lead developer of iatroX and holds equity in the operating company. Usage analytics were obtained from routine telemetry (GA4/App Store Connect/Google Play Console). The survey instrument was pre-specified; anonymised aggregate data and analysis scripts are made available upon reasonable request. No other relationships or activities that could appear to have influenced the submitted work.

Patient and public involvement
Patients/the public were not involved in the design, conduct, reporting, or dissemination plans of this research.

Ethics statements
Patient consent for publication: Not required.
Ethics approval: The UK HRA decision tool classified this project as a service evaluation; Research Ethics Committee review was not required.
Lawful basis: legitimate interests.

Data availability statement
Anonymised aggregate data, the full survey instrument, and analysis scripts are made available upon reasonable request.
Author information: Kolawole Tytler (corresponding author), University of Cambridge. ORCID: https://orcid.org/0000-0002-4553-6971
Figure 1. Safety-aware RAG decision flow
The clear citation of UK-specific guidelines reinforced confidence in the clinical appropriateness of the answers.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eOrganic Adoption via Professional Networks\u003c/b\u003e: The qualitative data provided numerous examples of users actively sharing the platform within their closed professional circles (e.g., departmental groups, trainee forums), confirming the viral growth observed in the usage analytics.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\u003ch2\u003ePrincipal Findings\u003c/h2\u003e\u003cp\u003e This mixed-methods evaluation describes the initial real-world implementation and reception of iatroX, a RAG-based AI clinical decision support platform focused on UK guidelines. The results demonstrate rapid and substantial organic user adoption, with over 19,000 unique web users and 40,000 clinical queries within the first 16 weeks. The in-product survey data from 1,223 respondents indicates that this early adopter cohort perceives iatroX as highly useful (86.2%), accurate (75.0%), and efficient (60.9%). These findings suggest that iatroX successfully addresses a significant unmet need for rapid, trustworthy information retrieval among clinicians.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\u003ch2\u003eContext and Comparison with Existing Literature\u003c/h2\u003e\u003cp\u003eThe challenges of information retrieval at the point of care are well-documented drivers of cognitive burden and workflow inefficiency (Sinsky et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; O'Malley et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
The rapid uptake of iatroX underscores the demand for tools that can synthesize authoritative guidelines more efficiently than traditional methods.\u003c/p\u003e\u003cp\u003eThe differentiating factor in iatroX's positive reception appears to be its specialized implementation of Retrieval-Augmented Generation (RAG). While there is growing interest in using LLMs in medicine, studies consistently highlight clinician skepticism due to concerns about accuracy, lack of provenance, and potential for \"hallucination\" (Mesk\u0026oacute;, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Lee et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). General-purpose tools, while accessible, are not designed for the safety-critical environment of clinical decision-making and lack the necessary grounding in localized guidelines.\u003c/p\u003e\u003cp\u003eRAG architectures are increasingly recognized as the most viable approach for deploying LLMs in high-stakes, knowledge-intensive domains (Lewis et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Zakka et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). By restricting the LLM's generation process to a curated, verified corpus, in this case, UK-accepted clinical guidelines, iatroX directly addresses these primary safety concerns.\u003c/p\u003e\u003cp\u003eThe platform's proprietary algorithmic search engine further enhances this approach. The mechanisms for assessing confidence, dynamically expanding scope to peer-reviewed literature when necessary, and, critically, refusing to answer below a defined safety threshold, provide essential layers of safety. 
The high level of perceived accuracy (75.0%) and reliability (79.4%) suggests that this RAG implementation is effective in building clinician trust.\u003c/p\u003e\u003cp\u003eFurthermore, the qualitative finding that the MHRA registration fostered trust aligns with research indicating that robust governance and regulatory oversight are critical facilitators for the adoption of clinical AI (Waring et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Kelly et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eWhile other tools exist in the clinical reference space, such as UpToDate, BMJ Best Practice, and newer AI entrants like Glass Health (focusing on differential diagnosis) or approaches utilizing advanced models like Med-Gemini (Singhal et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), iatroX differentiates itself through its specific focus on UK guidelines, its algorithmic RAG implementation with explicit safety thresholds, and its regulatory status.\u003c/p\u003e\u003cp\u003eThe organic, word-of-mouth growth observed (high direct traffic, social media referrals, and 88.4% willingness to recommend) is a strong indicator of product-market fit. It suggests the platform's value proposition is compelling enough for clinicians to recommend it proactively to peers, a powerful mechanism for diffusion of innovation in healthcare (Greenhalgh et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2004\u003c/span\u003e).\u003c/p\u003e\u003cdiv id=\"Sec25\" class=\"Section3\"\u003e\u003ch2\u003eStrengths and Limitations\u003c/h2\u003e\u003cp\u003eThe primary strength of this study is its reliance on a large, real-world dataset, analyzing the behavior of over 19,000 users and capturing 1,223 survey interactions. 
The use of an in-product intercept survey is a methodological strength, as it captures feedback in the immediate context of use and reduces the recall bias and self-selection bias common in retrospective email surveys (Eysenbach, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2004\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eHowever, the study has several limitations. First, the intercept survey was anonymous, preventing the correlation of user feedback with demographic data (e.g., professional role, specialty, grade). Future research should aim to capture this data to understand adoption patterns across different clinical subgroups. Second, the study population consists of self-selected early adopters, potentially biasing the results towards more positive perceptions; the findings may not generalize to the entire UK healthcare workforce.\u003c/p\u003e\u003cp\u003eThird, the survey utilized single-item, non-validated questions. While appropriate for a formative service evaluation, future evaluations should incorporate validated instruments such as the System Usability Scale (SUS) (Brooke, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e1996\u003c/span\u003e) or the Health Information Technology Usability Evaluation Scale (Health-ITUES) (Yen et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Fourth, due to the randomized presentation of single items from a larger battery, the sample sizes (N) for individual questions are relatively small. 
This results in wide confidence intervals for some metrics (e.g., time-saving), limiting the precision of these specific estimates despite the large overall number of respondents.\u003c/p\u003e\u003cp\u003eFinally, while users perceived the information as accurate, this study did not objectively measure the clinical correctness of the generated answers against a gold standard, although the iatroX system is designed to extract, rather than generate, information from the available context.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec26\" class=\"Section3\"\u003e\u003ch2\u003eImplications for Practice and Future Research\u003c/h2\u003e\u003cp\u003eThe positive results of this evaluation suggest that RAG-based AI tools can significantly improve the efficiency of clinical information retrieval. By reducing the time clinicians spend searching for guidelines, these tools may help alleviate cognitive burden and potentially improve the consistency of evidence-based practice.\u003c/p\u003e\u003cp\u003eFuture research should focus on rigorous evaluation of the platform's impact. This includes objective assessments of answer accuracy and safety through standardized clinical vignettes reviewed by expert panels. Prospective studies, such as time-and-motion studies comparing iatroX to traditional resources (e.g., NICE website search), are needed to quantify the reported time savings. Ultimately, randomized controlled trials will be necessary to assess the impact of the platform on clinical decision-making quality and patient outcomes.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eiatroX, a novel RAG-based AI platform registered as a Class I Medical Device, has demonstrated rapid organic adoption and a highly positive reception among a large cohort of UK healthcare professionals. 
By prioritizing a localized knowledge base, implementing sophisticated retrieval mechanisms with confidence scoring and refusal-to-answer capabilities, and adhering to regulatory standards, the platform has successfully fostered clinician trust. This real-world evaluation provides strong evidence that well-designed and safely implemented RAG systems can meet the critical need for rapid, reliable information synthesis at the point of care, signaling a promising direction for the future of clinical decision support.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eParticipant consent was waived as this was classified as a service evaluation under the UK Health Research Authority (HRA) guidelines, which do not require formal ethical approval or documented consent for anonymized data. For the in-product survey, participants were informed via a prompt of the voluntary, anonymous nature, with proceeding constituting implied consent.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003cbr\u003e\u0026nbsp;I thank iatroX users who provided in-product feedback during the evaluation period.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003cbr\u003e\u0026nbsp;No external funding. Platform operations were funded by iatroX.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003cbr\u003eK.T. is the founder and lead developer of \u003cem\u003eiatroX\u003c/em\u003e and holds equity in the operating company. Usage analytics were obtained from routine telemetry (GA4/App Store Connect/Google Play Console). The survey instrument was pre-specified; anonymised aggregate data and analysis scripts are made available upon reasonable request. 
No other relationships or activities that could appear to have influenced the submitted work.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePatient and public involvement\u003c/strong\u003e\u003cbr\u003e\u0026nbsp;Patients/the public were not involved in the design, conduct, reporting, or dissemination plans of this research.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics statements\u003c/strong\u003e\u003cbr\u003e\u003cem\u003ePatient consent for publication:\u003c/em\u003e Not required.\u003cbr\u003e\u003cem\u003eEthics approval:\u003c/em\u003e The UK HRA decision tool classified this project as a \u003cstrong\u003eservice evaluation\u003c/strong\u003e; Research Ethics Committee review was not required. Lawful basis: legitimate interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003cbr\u003e\u0026nbsp;Anonymised aggregate data, the full survey instrument, and analysis scripts are made available upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAl-Dhahir MA (2023) Information Overload. \u003cem\u003eStatPearls\u003c/em\u003e [Internet]. StatPearls Publishing, Treasure Island (FL)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrooke J (1996) SUS: a quick and dirty usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland IL (eds) Usability evaluation in industry. Taylor \u0026amp; Francis, London, pp 189\u0026ndash;194\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrown LD, Cai TT, DasGupta A (2001) Interval estimation for a binomial proportion. Stat Sci 16(2):101\u0026ndash;133\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDensen P (2011) Challenges and opportunities facing medical education. 
\u003cem\u003eTransactions of the American Clinical and Climatological Association\u003c/em\u003e, 122, p.48\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDillman DA (2011) Mail and Internet Surveys: The Tailored Design Method. Wiley\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEly JW, Osheroff JA, Ebell MH, Chambliss ML, Vinson DC, Stevermer JJ, Pifer EA (2005) Obstacles to answering doctors' questions about patient care with evidence: the roles of patients, information resources, and physicians themselves. J Am Med Inform Assoc 12(2):217\u0026ndash;224\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEysenbach G (2004) Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res 6(3):e34\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGale NK, Heath G, Cameron E, Rashid S, Redwood S (2013) Using the framework method for the analysis of qualitative data in multi-disciplinary health research. BMC Med Res Methodol 13(1):1\u0026ndash;8\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGreenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O (2004) Diffusion of innovations in service organizations: systematic review and recommendations. Milbank Q 82(4):581\u0026ndash;629\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17(1):1\u0026ndash;9\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLee P, Bubeck S, Nori A (2023) The Doctor and the Machine: A New Era of AI in Medicine. 
\u003cem\u003eMicrosoft Research\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, K\u0026uuml;ttler H, Kiela D, Rockt\u0026auml;schel T, Riedel S (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv Neural Inf Process Syst 33:9459\u0026ndash;9474\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMedical Research Council and NHS Health Research Authority (2017) \u003cem\u003eDefining Research: HRA and MRC guidance\u003c/em\u003e. [Online] Available at: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.hra.nhs.uk/planning-and-improving-research/research-planning/defining-research/\u003c/span\u003e\u003cspan address=\"https://www.hra.nhs.uk/planning-and-improving-research/research-planning/defining-research/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e [Accessed 24 August 2025]\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMesk\u0026oacute; B (2023) The ChatGPT (and other LLMs) earthquake in healthcare. J Med Internet Res 25:e49856\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMisra S, Ewing LM, Saseendrakumar BR (2022) Information Overload and the Quest for Evidence-Based Practice: Navigating the Labyrinth. Cureus, 14(7)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoor M, Banerjee O, Abad ZSH, Krumholz HM, Leskovec J, Topol EJ, Rajpurkar P (2023) Foundation models for generalist medical artificial intelligence. Nature 616(7956):259\u0026ndash;265\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eO'Malley PG, Jbaily A, El-Kareh R (2020) The burden of getting ready to see the patient: an electronic health record time study. 
J Gen Intern Med 35:1314\u0026ndash;1315\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRaffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485\u0026ndash;5551\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSinghal K et al (2024) (Google Research and Google DeepMind), A milestone in health AI: Introducing Med-Gemini. \u003cem\u003eGoogle: The Keyword\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSinsky C, Colligan L, Li L, Prgomet M, Reynolds S, Goeders L, Sinsky J, Trockel M, Dyrbye L (2016) Allocation of physician time in ambulatory practice: a time and motion study in 4 specialties. Ann Intern Med 165(11):753\u0026ndash;760\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW (2023) Large language models in medicine. Nat Med 29(8):1930\u0026ndash;1940\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003evon Elm E, Altman DG, Egger M, Pocock SJ, G\u0026oslash;tzsche PC, Vandenbroucke JP (2007) The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 370(9596):1453\u0026ndash;1457\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWaring J, Roe B, Marshall F, Bishop S (2014) The role of context in the successful implementation of decision support technologies: a case study from UK community pharmacy. Health Soc Care Commun 22(3):276\u0026ndash;284\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWestbrook JI, Coiera EW, Gosling AS (2007) Do online information retrieval systems help experienced clinicians answer clinical questions? 
J Am Med Inform Assoc 14(3):315\u0026ndash;321\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYen PY, Wantland D, Bakken S (2017) Development and validation of the Health Information Technology Usability Evaluation Scale (Health-ITUES) for the assessment of clinician-reported usability of EHRs. J Am Med Inform Assoc 24(6):1081\u0026ndash;1088\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZakka C, Celi LA, Kvedar JP (2024) The role of retrieval-augmented generation in the modernization of clinical decision support. NEJM AI 1(2):AIdbp2300092\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Artificial Intelligence, Large Language Models, Retrieval-Augmented Generation, Clinical Decision Support Systems, Digital Health, Medical Informatics, mHealth","lastPublishedDoi":"10.21203/rs.3.rs-7593409/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7593409/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cb\u003eObjectives\u003c/b\u003e\u003c/p\u003e\u003cp\u003eTo describe the design of a UK-centred 
retrieval-augmented clinical reference platform, and report early real-world adoption and perceived clinical utility from a formative implementation evaluation.\u003c/p\u003e\u003cp\u003e\u003cb\u003eMethods\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe conducted a mixed-methods study comprising (i) a retrospective observational analysis of platform usage across web, iOS and Android over 16 weeks and (ii) a cross-sectional, in-product intercept survey. Usage data (unique users, engagement events, clinical queries) were sourced from Google Analytics 4, Apple App Store Connect and Google Play Console. A client-side script randomised survey prompts to ~\u0026thinsp;10% of web sessions, displaying single items from a predefined battery. Proportions are reported with Wilson 95% confidence intervals; qualitative comments underwent thematic content analysis. No personal identifiers were collected.\u003c/p\u003e\u003cp\u003e\u003cb\u003eResults\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThe web application attracted 19,269 unique users and 202,660 engagement events (~\u0026thinsp;10.5 per active user), with approximately 40,000 clinical queries across platforms. The intercept survey yielded 1,223 item-level responses. Among respondents: useful 86.2% (50/58); time saved 60.9% (14/23); would use again 93.3% (14/15); would recommend 88.4% (38/43); perceived accuracy 75.0% (30/40); perceived reliability 79.4% (27/34). Themes highlighted speed, guideline-grounded answers and UK specificity.\u003c/p\u003e\u003cp\u003e\u003cb\u003eDiscussion\u003c/b\u003e\u003c/p\u003e\u003cp\u003eFindings provide formative signals of value for rapid, provenance-bound information retrieval. 
Key limitations include small item-level Ns, early-adopter/selection bias, and absence of gold-standard accuracy benchmarking; results should not be interpreted as evidence of clinical effectiveness.\u003c/p\u003e\u003cp\u003e\u003cb\u003eConclusion\u003c/b\u003e\u003c/p\u003e\u003cp\u003eA safety-governed, RAG-based platform showed early uptake and favourable user sentiment among UK clinicians. Prospective evaluations, time-and-motion studies and objective accuracy/safety audits are warranted to assess impact on clinical workflows and care quality.\u003c/p\u003e","manuscriptTitle":"Adoption, usability and perceived clinical value of a UK AI clinical reference platform: a mixed-methods formative evaluation of real-world usage and a 1,223-respondent user survey","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-16 15:43:43","doi":"10.21203/rs.3.rs-7593409/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"61759df6-7709-4342-8e3c-bc98c3ee343d","owner":[],"postedDate":"September 16th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":54824360,"name":"Artificial Intelligence and Machine Learning"},{"id":54824361,"name":"Medical Informatics"},{"id":54824362,"name":"Information Retrieval and Management"},{"id":54824363,"name":"Health Economics \u0026 Outcomes Research"},{"id":54824364,"name":"Hospital 
Medicine"}],"tags":[],"updatedAt":"2025-09-16T15:43:43+00:00","versionOfRecord":[],"versionCreatedAt":"2025-09-16 15:43:43","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7593409","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7593409","identity":"rs-7593409","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
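
The Methods text above states that proportions are reported with Wilson 95% confidence intervals. A minimal sketch of that calculation follows, assuming the standard Wilson score interval without continuity correction (the paper does not say which variant was used). The function name `wilson_interval` and the `items` dictionary are illustrative, not taken from the author's analysis scripts; the counts are the ones given in the abstract.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (no continuity correction).

    z defaults to the two-sided 95% normal quantile.
    Returns (lower, upper) as proportions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Item-level counts as reported in the abstract: (yes responses, total responses).
items = {
    "useful": (50, 58),
    "time saved": (14, 23),
    "would use again": (14, 15),
    "would recommend": (38, 43),
    "perceived accuracy": (30, 40),
    "perceived reliability": (27, 34),
}

for label, (yes, total) in items.items():
    lower, upper = wilson_interval(yes, total)
    print(f"{label}: {100 * yes / total:.1f}% "
          f"(95% CI {100 * lower:.1f} to {100 * upper:.1f}, n={total})")
```

Because the exact variant and rounding used for the published table are not stated, recomputed bounds can differ from the table in the last digit. The sketch also shows why small item-level Ns produce wide intervals: the half-width shrinks roughly with the square root of n.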

