Assessing the Fitness-for-Purpose of Published Breath Analysis Data: A Quality Assessment Framework for Diabetes Biomarker Research

Research Square preprint, Research Article
Lin Guo, Wei Zhang, Yinchu Wang, Zilong Liu, Xingchuang Xiong
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7455257/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted 5

Abstract

Background
Exhaled breath analysis is a promising field for non-invasive diabetes diagnostics, but its clinical translation is hindered by contradictory findings across studies. We argue that this inconsistency stems from significant methodological heterogeneity and the lack of appropriate criteria for screening published data for secondary analysis.
Existing tools, such as QUADAS-2, assess the quality of clinical study design but are not equipped to evaluate the technical comparability and fitness-for-purpose of the quantitative data itself. This study aimed to develop and validate a novel, data-driven framework to systematically assess the fitness-for-purpose of published data, thereby addressing this critical gap.

Methods
We developed the National Institute of Metrology - Diabetes Breath Assessment (NIM-DBA) framework, a multi-domain quality and fitness-for-purpose assessment tool. Its theoretical basis is derived from the stringent specifications for Standard Reference Data in metrology and aligned with international ISO data quality standards. A systematic literature search was conducted in PubMed, Scopus, Embase, and Web of Science (up to April 2025) to identify studies reporting quantitative data on breath volatile organic compounds (VOCs) in diabetic patients. Using breath acetone as a case study, we applied the NIM-DBA framework to the resulting literature pool. A parallel assessment using the QUADAS-2 tool was also performed on the same pool to compare the data-driven (NIM-DBA) and hypothesis-driven (QUADAS-2) evaluation paradigms.

Results
The systematic search identified an initial pool of 38 eligible studies. Application of the multi-stage NIM-DBA screening process filtered this heterogeneous pool down to a core subset of only six studies (15.8%) that met all criteria for high data quality and fitness-for-purpose. In contrast, the parallel QUADAS-2 assessment of the same 38 studies revealed widespread high or unclear risk of bias, particularly in the domains of Patient Selection (79% high risk) and Index Test reporting (82% unclear risk).
The six studies that passed the NIM-DBA framework demonstrated a highly consistent biological pattern (elevated breath acetone concentrations in diabetic patients) and shared common methodological best practices, such as standardized alveolar gas collection and the use of high-sensitivity analytical instruments.

Conclusion
The prevalent contradictory conclusions in breath analysis literature are likely attributable to differences in methodological rigor rather than biomarker instability. The proposed NIM-DBA framework is an effective tool for systematically managing data heterogeneity, filtering literature for secondary analysis, and identifying methodologically robust studies. This data-driven approach provides a necessary complement to classic clinical evaluation tools, offering a new perspective on research quality assessment and providing valuable guidance for future study design in the field.

Keywords: Breath Analysis; Diabetes Mellitus; Volatile Organic Compounds (VOCs); Acetone; Data Quality; Fitness-for-Purpose; Systematic Review; Assessment Framework; Biomarker

Background
In the field of non-invasive diagnostics, exhaled breath analysis has emerged as a highly promising technique. This is underscored by its recognition by the World Economic Forum as one of the top ten emerging technologies of 2021 [1]. The scientific basis for this technique is that human breath is a complex product of metabolic processes. As Sharma et al. noted in a 2023 review, a single exhalation contains hundreds of volatile organic compounds (VOCs) capable of providing a real-time snapshot of an individual's metabolic state [2]. Indeed, a recent census of the "human volatilome" identified nearly 1,500 distinct VOCs in the breath of healthy individuals alone [3].
Among its many potential clinical applications, the technique holds particular importance for managing diabetes mellitus, a major global health challenge: data cited by Mahnoor et al. in a 2024 review indicate that 529 million people worldwide were living with diabetes in 2021 [4]. Consequently, developing convenient, painless monitoring tools based on exhaled VOCs to supplement, or even replace, traditional invasive blood glucose testing has become a key research direction in the field [5].

However, despite the field's long history and immense potential, a persistent problem has hindered its clinical translation: the conclusions of different studies are often inconsistent or even contradictory. For instance, in a 2021 review, Dixit et al. highlighted that the efficacy of breath acetone as a standalone biomarker is "uncertain," as its correlation with blood glucose levels has been variously reported as positive, negative, or entirely absent [6]. This lack of consensus directly impedes clinical translation. As Miekisch et al. noted in a 2024 commentary, despite numerous studies, no breath test has yet entered clinical practice as a standard diagnostic procedure [7].

We argue that these inconsistent conclusions stem not solely from intrinsic biological complexity but largely from the absence of appropriate data screening criteria for secondary analysis. As Ma et al. stated in a 2023 review, a major obstacle to the clinical translation of exhaled breath diagnostics is the "lack of standardized operating procedures," which spans the entire process from breath sampling and storage to analysis [8]. This lack of standards has led to significant methodological heterogeneity in existing research.
Taking breath acetone detection as an example, a 2021 review by Obeidat systematically covered as many as seven mainstream detection technologies, from Gas Chromatography-Mass Spectrometry (GC-MS) to various sensors, and detailed the large differences in sensitivity, selectivity, and operating conditions among them [9]. This makes data from different studies difficult to compare directly. As Liu et al. summarized in a 2024 review, the diversity in sampling methods, analytical instruments, and data processing approaches makes the acetone concentrations obtained in different studies "hardly comparable" [10].

Even meta-analysis, often considered the highest level of evidence, cannot resolve this methodological heterogeneity, fundamentally because its own data screening criteria are typically too broad. For instance, in a 2021 meta-analysis of breath tests for diagnosing diabetes, the inclusion criteria of Wang et al. merely required that studies be "diagnostic accuracy studies" and "provide sufficient data to construct a 2x2 table" [11]. This approach, based solely on study type and data completeness, largely overlooks how the data were generated, that is, crucial methodological details such as whether a study used high-precision mass spectrometry or interference-prone sensors, and whether the breath samples were standardized alveolar air or mixed expired air. As Mathew et al. emphasized in a 2015 review, analytical techniques (e.g., mass spectrometry, spectroscopy, sensor arrays) differ fundamentally in sensitivity, selectivity, and robustness against interferents such as humidity [12]. Directly pooling and statistically analyzing data with such methodological heterogeneity therefore casts doubt on the reliability of the results. Indeed, the meta-analysis by Wang et al. could only draw the cautious conclusion that breath testing has "a moderate diagnostic accuracy" for diabetes, and the authors specifically noted that their findings were severely limited by "significant heterogeneity" [11]. This demonstrates that existing secondary analysis methods, because of their coarse-grained screening criteria, fail to address the root problem; instead, they risk incorporating a large volume of flawed data, ultimately yielding ambiguous conclusions. A reasonable inference, therefore, is that if a data quality and applicability assessment framework that examines methodological details were used to select a high-quality, homogeneous subset of studies, a secondary analysis of this subset would very likely yield highly consistent conclusions, thereby validating the clinical value of breath analysis technology.

The reality is that no recognized, standardized framework specifically designed to assess the quality of secondary data currently exists in the field of breath analysis. The vast majority of existing systematic reviews concentrate on areas such as cancer screening, and even in these most intensively studied domains, the assessment tools employed have fundamental limitations. For example, several recent, high-quality meta-analyses on diseases such as lung and breast cancer have all used QUADAS-2 to assess the risk of bias in their included studies [13–15]. While QUADAS-2 is the gold standard for evaluating the quality of diagnostic accuracy studies, its core philosophy is to assess the rigor of a study's clinical design rather than the technical comparability of its underlying data.
In other words, its assessment domains focus on macroscopic aspects of clinical design, such as patient selection, flow, and timing, and lack specific items for evaluating the technical details unique to breath analysis that determine data quality (e.g., breath sampling methods, sample preconcentration techniques, and instrument calibration). This scarcity of systematic methodological assessment tools is even more pronounced in the specific area of diabetes. A telling example is a 2025 publication, one of the very few systematic reviews specifically targeting breath ketone analysis for diabetes, which still had to adopt a "modified" version of the generic QUADAS-2 tool for its quality assessment [16]. This shows that existing clinical assessment tools fall short when researchers attempt to probe the technical reliability of the data. What the field lacks, therefore, is not a refutation of QUADAS-2 but a complementary systematic tool built from a different perspective: a framework designed specifically for the objective assessment of the fitness-for-purpose of published quantitative data, rooted in a more "metrological" mindset.

To address this challenge and ensure that our proposed framework is both methodologically sound and innovative, we first conducted a broad, systematic survey of the paradigms used to create data quality assessment frameworks. We analyzed 38 representative publications from diverse academic fields (Table 1). Table 1 lists only the authors, citation numbers, and broad categories of the assessed data; a detailed summary of the development rationale for each framework is provided in Supplementary Table S1. Our survey revealed that new data quality assessment frameworks are typically created following one of five established academic paradigms. The framework proposed in this study was constructed by adhering to and combining these established paradigms.
Specifically, our framework integrates two of these mainstream approaches: "Category 2: Extension and Contextualization of Existing Authoritative Frameworks" and "Category 4: Use-Case Driven Construction." This dual approach ensures that our method is not created from scratch but is grounded in scientifically recognized methodologies, while also guaranteeing its precise applicability to the specific challenge we address. Building on this foundation, the distinctiveness and authority of our framework lie not in refining existing clinical assessment tools but in proposing a different paradigm. It introduces the stringent technical review specifications for Standard Reference Data from the field of metrology and aligns closely with international data quality standards such as ISO 8000 and ISO/IEC 25012. This gives its assessment scale a higher degree of rigor and objectivity.

Table 1 Overview of Data Quality Assessment Frameworks Categorized by Development Methodology

Reference | Broad Category of Data Evaluated

Category 1: Induction and Synthesis from Systematic Literature Reviews
Schwabe et al. [17] | Medical data for model training
Fadahunsi et al. [18] | Electronic health information
Patone & Zhang [19] | Social media data
Fadahunsi et al. [20] | Digital health technology data
Declerck et al. [21] | Health data (for secondary use)
Schmidt et al. [22] | Observational health research data
Zhang et al. [23] | Internet of Things (IoT) data
Ijab et al. [24] | Big data (in the public sector)
Daikeler et al. [25] | Digital social science research data
Cichy & Rass [26] | General business data

Category 2: Extension and Contextualization of Existing Authoritative Frameworks
Bhana et al. [27] | Public safety data
Zou & Berger [28] | Real-World Data (RWD) (in healthcare)
Shabani et al. [29] | Healthcare data (newborn indicators)
Okwaraji et al. [30] | Healthcare data (low birth weight and preterm birth)
Gyrard et al. [31] | Healthcare data (cancer-related)
Jin et al. [32] | Medical diagnostic study data
Healy et al. [33] | Health and social care data
Laberge & Shachak [34] | Sociodemographic data
Comero et al. [35] | Environmental monitoring data

Category 3: Qualitative Methods & Expert Consensus
Ng et al. [36] | Health and biomedical datasets

Category 4: Use-Case Driven Construction
Tute et al. [37] | Clinical data (pediatric intensive care)
Kookal et al. [38] | Dental electronic health record data
Kuusisto et al. [39] | Healthcare data (palliative care)
Widad et al. [40] | Big data
McCord et al. [41] | Ecological data
Blacketer et al. [42] | Observational medical data
Hillert et al. [43] | Healthcare data (multiple sclerosis)
Corrales et al. [44] | Datasets for classification tasks
Jiang et al. [45] | Real-World Data (RWD) (in healthcare)
Wirsching et al. [46] | Observational epidemiological study data
Nguyen et al. [47] | Big data (in education)
Deady et al. [48] | Healthcare data (vaccine adverse events)
Naik et al. [49] | Nursing quality indicator data
Smith et al. [50] | Administrative data
Iverson et al. [51] | Healthcare data (metabolic charts)

Category 5: Deduction and Construction based on Specific Theories
Krishna et al. [52] | Road infrastructure data
Larburu et al. [53] | Healthcare data (telemedicine)

This paper makes two core contributions. First, taking a data-driven perspective and building on the stringent review criteria for Standard Reference Data in metrology, we have created a quality assessment framework for diabetes biomarker research: the National Institute of Metrology - Diabetes Breath Assessment (NIM-DBA) Framework. We will elaborate on the construction logic and specific contents of this multi-dimensional framework.
Second, using the highly debated topic of "breath acetone as a biomarker for diabetes" as a case study, we will conduct a comparative analysis of NIM-DBA and QUADAS-2 to demonstrate how NIM-DBA provides insights from a distinct, complementary research perspective.

Methods

To systematically assess the fitness-for-purpose of published quantitative data on breath VOCs in diabetic patients, we designed and implemented a comprehensive evaluation framework (Figure 1). The workflow is an end-to-end process with three primary stages: (1) Data Acquisition, (2) the NIM-DBA Data Assessment Process, and (3) the Application of Assessment Results. Figure 1 illustrates the complete architecture of this framework, detailing the components of each stage and highlighting the structure and basic workflow of the core NIM-DBA process.

Data Acquisition and Standardization

Literature Search and AI-Assisted Screening
To systematically collect studies reporting quantitative data on breath VOCs in diabetic patients, a comprehensive search of four major academic databases was conducted on April 12, 2025: PubMed/MEDLINE, Scopus, Embase, and the Web of Science (WoS) Core Collection. The search was limited to articles published between January 1, 2005, and April 12, 2025. The search strategy was constructed around three core concepts: (1) the disease/population (diabetes), (2) the detection method (breath analysis), and (3) the analytes (volatile organic compounds). Because search syntax, subject headings (e.g., MeSH terms), and built-in filters differ across databases, we developed search strategies that were logically consistent but adapted to the specifics of each platform. For instance, PubMed/MEDLINE and Embase offer a direct filter to limit studies to "humans," which we applied. For databases lacking such a filter (e.g., Scopus, Web of Science), non-human studies were excluded during the subsequent screening phase.
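The three-concept strategy described above combines synonyms with OR inside each concept and joins the concepts with AND, so that a record must match all three. A minimal Python sketch of that structure follows. The terms in `CONCEPT_BLOCKS` are hypothetical placeholders (the actual terms and database-specific syntax are reported in Table S2), and the generic phrase quoting and `*` truncation shown here would need to be adapted to each platform's field tags and filters.

```python
# Hypothetical synonym lists; the real search terms are reported in Table S2.
CONCEPT_BLOCKS = {
    "population": ["diabetes", "diabetic", "diabetes mellitus"],
    "method": ["breath", "exhaled breath", "breath analysis"],
    "analyte": ["volatile organic compound*", "VOC*", "volatilome"],
}


def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(blocks):
    """Require every concept to match by joining the OR-groups with AND."""
    return " AND ".join(or_block(terms) for terms in blocks.values())


if __name__ == "__main__":
    print(build_query(CONCEPT_BLOCKS))
```

Date limits and the "humans" filter are deliberately left out of the string, since the text notes that these were applied through platform-specific filters or during later screening.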
The detailed search terms, Boolean logic, and results for each step are provided in Table S2. All retrieved records were imported into the reference management software Zotero, and retrieval followed by deduplication in Zotero yielded 330 unique articles. The deduplication results for each database are detailed in Table S3.

To manage this large volume of literature efficiently, we replaced traditional title and abstract screening with an AI-assisted, two-step screening strategy. This approach was chosen because relevant quantitative data may be present even when a study's title and abstract do not indicate it. For example, a study framed as research on chronic kidney disease might still contain valuable quantitative VOC data for a diabetic cohort and a control group within the body of the paper [58]. The first step of our AI strategy was a broad preliminary screening: we used a Large Language Model (Google Gemini 2.0) to screen all 330 articles and flag those that could potentially contain "quantitative human breath VOC data." The screening criteria at this stage were intentionally broad to minimize the risk of missing relevant studies; the detailed prompt is provided in Supplementary File 3 - AI Prompt (Step 1 - Broad Screening).txt. Articles that passed this initial screen were then subjected to a more stringent AI-assisted step of precise extraction and deep screening, the prompt for which is provided in Supplementary File 4 - AI Prompt (Step 2 - Deep Extraction).txt. The objective of this second step was to systematically identify all potentially relevant quantitative data points and their exact positions within the source PDF files. Pinpointing the location of the data is crucial for ensuring traceability and facilitating subsequent manual verification.
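Deduplication in this study was performed in Zotero. As a rough illustration of the underlying principle only (not of Zotero's own matching logic), the Python sketch below treats two records as duplicates when they share a normalized DOI or, when no DOI is available, a normalized title. The record fields `doi` and `title` are assumptions about how exported records might look.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def dedup_key(record: dict) -> str:
    """Prefer the DOI (without any resolver prefix); fall back to the title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
    return "title:" + normalize_title(record.get("title", ""))


def deduplicate(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = {}
    for rec in records:
        seen.setdefault(dedup_key(rec), rec)
    return list(seen.values())
```

A record exported without a DOI will not be matched against a copy that has one, so this key-based pass is a simplification; dedicated reference managers typically compare more fields.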
Only primary quantitative data confirmed to have been originally generated by the respective studies were ultimately included in our dataset.

Manual Verification, Extraction, and Standardization
While the AI-assisted process can efficiently locate potential data points, it cannot fully replicate the expert judgment required to understand complex academic contexts and discriminate between data sources. Therefore, to ensure the reliability and fitness-for-purpose of the data finally included, the 39 articles shortlisted by the AI underwent manual verification and data extraction. This procedure was performed independently by two researchers, and the results were cross-validated; any discrepancies were resolved through discussion. The process comprised three core tasks:
1. Verification of Data Presence: Using the location information provided by the AI, the researchers returned to the original articles to verify that quantitative breath-test VOC data existed at the indicated positions.
2. Discrimination of Data Source: For all confirmed quantitative data, we determined their origin, clearly distinguishing between primary data generated by the study itself and secondary data merely cited from other literature. Only articles containing primary quantitative data were deemed eligible for the next step.
3. Data Extraction and Standardization: The researchers extracted the data that passed the above verification steps from the text, tables, figures, or supplementary materials of the source articles and concurrently entered them into a pre-designed, standardized data template (Supplementary File 2, Sheet 1).
The complete results of this screening process, including the flow of studies at each stage, are presented in the Results section.

The Core Evaluation Framework: NIM-DBA
The NIM-DBA framework is a novel methodological tool designed specifically for the objective assessment of published literature.
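The three manual tasks above imply a record that carries each value together with its unit, its group, and its location in the source (for traceability), plus a flag separating primary from cited data. The Python sketch below shows one possible shape for such a record and a helper that lists where two independent extractions disagree, so that those items can be resolved by discussion. All field names are hypothetical; the template actually used is Supplementary File 2 (Sheet 1).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataPoint:
    """One extracted quantitative value (hypothetical field names)."""
    study_id: str
    voc: str              # e.g. "acetone"
    group: str            # e.g. "diabetic" or "healthy control"
    value: float
    unit: str             # as reported, before any unit harmonization
    n: Optional[int]      # group size, if reported
    location: str         # e.g. "Table 2" or "Figure 3", kept for traceability
    is_primary: bool      # True if generated by the study itself, not cited


Key = Tuple[str, str, str]


def discrepancies(reviewer_a: List[DataPoint], reviewer_b: List[DataPoint]) -> List[Key]:
    """List (study, VOC, group) keys where the two independent extractions differ."""
    a = {(p.study_id, p.voc, p.group): p for p in reviewer_a}
    b = {(p.study_id, p.voc, p.group): p for p in reviewer_b}
    return [k for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)]


def primary_only(points: List[DataPoint]) -> List[DataPoint]:
    """Keep only primary data, mirroring task 2 (discrimination of data source)."""
    return [p for p in points if p.is_primary]
```

Because the dataclass is frozen, two records compare equal only when every field matches, so a differing value, unit, or location is flagged as a discrepancy.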
Conceptualization and Theoretical Basis of the Framework
The data quality and fitness-for-purpose assessment framework developed in this study is named the NIM-DBA (National Institute of Metrology - Diabetes Breath Assessment) Framework. Its theoretical cornerstone is the Technical Specification for Standard Reference Data Review, an unpublished work instruction developed by the Center for Metrology Scientific Data and Energy Metrology of the National Institute of Metrology, China (NIM-NMDC). We selected this specification because it represents the highest standard for data quality in the field of metrology, a status reinforced by its official accreditation from both the China National Accreditation Service for Conformity Assessment (CNAS) and China Metrology Accreditation (CMA). Grounding our framework in this authoritative document allowed us to build in the most rigorous principles of data quality from the outset. Within the metrological system, Standard Reference Data (SRD) are not ordinary scientific data; they are datasets of the highest accuracy and credibility that have undergone the most stringent evaluation and are intended for calibrating measurement systems, evaluating measurement methods, and assigning values to materials. Consequently, the Technical Specification imposes requirements for data provenance, production processes, completeness, consistency, and uncertainty assessment that are far more stringent than those applied to conventional research data. By adapting this rigorous review philosophy from metrology to the assessment of data from secondary literature, our framework is grounded in a higher standard of data credibility and traceability. Our evaluation scale is therefore not merely a subjective checklist but a systematic tool for scrutinizing and screening conventional research data through the lens of Standard Reference Data. This is the core feature and advantage of our methodology.
To ensure the framework's applicability, we purposefully adapted the original specification: we retained its systematic evaluation structure while removing assessment content irrelevant to our research (such as physical measurement uncertainty), thereby focusing the evaluation on the fitness-for-purpose of quantitative human breath VOC data in diabetes biomarker research. To ensure the scientific validity and international recognition of each evaluation domain, we referred to international standards such as ISO 8000 (Data Quality), ISO/IEC 25012 (Data Quality Model), and ISO/IEC 25024 (Measurement of Data Quality) [54, 55] to develop precise, operational definitions and assessment criteria for each data attribute (e.g., study subjects, sampling methods). The final assessment framework is thus a systematic tool that integrates the stringent requirements of metrology with international data quality theory. It is customized for the specific purpose of this study, ensuring a structured evaluation process and scientifically sound assessment criteria, as illustrated in Figure 2. The detailed criteria for each core domain are presented in the Results section.

Framework Implementation and Methodological Feature Extraction
The methodological evaluation process in this study consisted of two main steps. First, we applied the NIM-DBA framework, as defined in the preceding section, to screen item by item all articles that passed the initial selection process described in the "Data Acquisition and Standardization" section. Any article that failed any of the key criteria of the NIM-DBA framework was excluded from the final subset of studies with high fitness-for-purpose. The subset of articles that passed all NIM-DBA screening steps then advanced to an exploratory phase of methodological feature extraction.
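The exclusion rule above is conjunctive: a single failed key criterion removes a study from the high fitness-for-purpose subset. A minimal Python sketch of that gate follows. The three predicates are simplified, hypothetical stand-ins loosely modelled on criteria 2.1 (core elements), 3.2 (sample size), and 4.1 (dispersion) of Table 3 in the Results section; they are not the full decision procedures, and the dictionary keys are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Study = Dict[str, object]
Criterion = Tuple[str, Callable[[Study], bool]]

# Simplified, hypothetical checks loosely modelled on three Table 3 items.
CRITERIA: List[Criterion] = [
    ("2.1 core elements",
     lambda s: all(s.get(k) not in (None, "") for k in ("voc", "value", "unit"))),
    ("3.2 sample size",
     lambda s: isinstance(s.get("n"), int) and s["n"] > 0),
    ("4.1 dispersion",
     lambda s: s.get("dispersion") is not None),
]


def screen(study: Study) -> Tuple[bool, List[str]]:
    """Pass only if every criterion passes; also report which ones failed."""
    failed = [name for name, check in CRITERIA if not check(study)]
    return (not failed, failed)


def high_fitness_subset(studies: List[Study]) -> List[Study]:
    """Keep the studies that clear every criterion (conjunctive gate)."""
    return [s for s in studies if screen(s)[0]]
```

Returning the names of the failed criteria, rather than only a boolean, keeps a record of why each study was excluded, which is useful when reporting the flow of studies.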
The purpose of this stage was not to follow a pre-defined medical information extraction protocol, but to systematically and comprehensively characterize the shared methodological profile of these "high-quality data" studies, enabling a subsequent analysis of the factors underlying their success. To this end, we systematically extracted a series of key methodological features from these articles: study groups and sample size (n), total sample size (N), age, inclusion criteria, medication status, pre-sampling conditions, type of gas sampled, collection and storage equipment, analytical instrument(s) used, sample preprocessing techniques, statistical test methods, and the reported p-values. These extracted features serve as the data foundation for our subsequent cross-study comparative analysis and for distilling "best practice" principles. These procedures correspond to the consecutive stages of the "NIM-DBA Data Assessment Process" and the "Application of Assessment Results" depicted in Figure 1.

Comparative Analysis: Characterizing the Clinical Bias Landscape using QUADAS-2
The NIM-DBA framework proposed in this study represents a data-driven paradigm, the core of which is to assess whether the data themselves are fit for secondary analysis. To demonstrate more clearly the necessity of this data-driven paradigm and to reveal the challenges faced by traditional evaluation methods in this field, we conducted a parallel analysis of the study pool (N=38) to serve as a comparative reference. This analysis used the internationally recognized Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. QUADAS-2 is a classic evaluation framework grounded in the principles of Evidence-Based Medicine (EBM) and driven by rigorous clinical hypotheses.
To delineate the distinct evaluative dimensions of the two frameworks, and to emphasize that NIM-DBA is not a replacement for QUADAS-2 but a complementary tool that assesses research data from another perspective, their fundamental differences are summarized in Table 2.

Table 2. Comparison of the Core Principles of the NIM-DBA and QUADAS-2 Frameworks
Aspect | NIM-DBA | QUADAS-2
Paradigm | Data-Driven | Clinical Hypothesis-Driven
Focus | The Data Itself | The Process of Testing a Hypothesis
Core Question | "Is the data fit for purpose?" | "Is the conclusion credible?"
Assessment Dimensions | Technical attributes of the data: format, units, precision, completeness, instrument information, etc. | Methodology of the clinical study design: patient selection, blinding, appropriateness of the reference standard, etc.

It must be emphasized that the purpose of this QUADAS-2 analysis was not to screen or exclude articles; it served as a demonstrative tool. Through this assessment, we aimed to characterize the overall landscape of clinical study design quality and the distribution of bias risk in the current field of diabetes breath analysis. In the subsequent Discussion section, this landscape will be contrasted with the results of our NIM-DBA screening to highlight the distinct value of the data-driven approach. To perform this comparative analysis systematically, we first defined a specific review question, the logical starting point of any QUADAS-2 assessment: "In people with suspected diabetes, what is the accuracy of a breath test (index test) for diagnosing diabetes (outcome) compared to standard glucose testing (reference standard)?" Based on this review question, we assessed the articles using the QUADAS-2 signaling question checklist adapted by Hanna et al. (2019) and previously used by Wang et al. (2021) in their systematic review.
The assessment covered both major domains of the tool: risk of bias and applicability concerns [11]. The detailed evaluation rules are provided in Supplementary Table S4.

Results

This section presents two sets of results. First, we introduce the primary outcome of our research: the complete structure and content of the NIM-DBA framework. Second, we report the findings from applying this framework in a case study that assessed the literature on breath acetone as a biomarker for diabetes.

The NIM-DBA Framework

The primary result of our framework development process is the NIM-DBA framework itself. NIM-DBA is designed as a general assessment framework for all types of VOC data in diabetes breath testing, so its criteria are not tied to any single compound. When the assessment focuses on a specific VOC, such as acetone, the semantic settings within NIM-DBA must be adjusted accordingly. For clarity and brevity, the version of the framework presented in Table 3 has been streamlined: the "Source Credibility" domain includes only the criteria for journal articles and omits those for other publication types, such as conference proceedings or books. This simplification keeps the table to a manageable size while leaving the conceptual structure of the framework intact. The complete structure of the framework, encompassing its four domains and their specific assessment criteria, is detailed below.

Table 3 The Data Quality Assessment Framework and Detailed Criteria
Criterion ID | Data Quality Characteristic | Assessment Procedure

1.1 | Source credibility
Step 1: Journal and Database Review. Question: Is the journal peer-reviewed and indexed in reputable international or national databases? Decision: No → Proceed to Step 2; Yes → Proceed to Step 3.
Step 2: Predatory Journal Check. Question: Are there clear indicators of a "predatory journal"? Decision: Yes → [Assessment Outcome: Not Credible]; No → Proceed to Step 3.
Step 3: Author and Institutional Background Review. Question: Are the authors' affiliations and professional backgrounds relevant and reputable? Decision: No → [Assessment Outcome: Not Credible]; Yes → Proceed to Step 4.
Step 4: Conflict of Interest and Risk of Bias Assessment. Question: Are funding sources/conflicts of interest disclosed and considered? Decision: No, and significant risk of bias exists → [Assessment Outcome: Not Credible]; Yes, or No but with low risk of bias → [Assessment Outcome: Credible].

2.1 | Data values completeness
Step 1: Core Element Check. Question: Are all core elements (VOC name, numerical value, unit) present? Decision: No → [Assessment Outcome: Core Incomplete]; Yes → Proceed to Step 2.
Step 2: Attribute Information Check. Question: Do the data have clear attribute information (e.g., diabetic patients, healthy control group)? Decision: No → [Assessment Outcome: Attribute Missing]; Yes → [Assessment Outcome: Complete].

2.2 | Semantic consistency
Step 1: Source Verification. Question: Do the data originate from human breath analysis? Decision: No (e.g., from cell/animal experiments) → [Assessment Outcome: Semantically Irrelevant]; Yes → Proceed to Step 2.
Step 2: Condition Verification. Question: Are the data explicitly from diabetic patients or a relevant control group? Decision: No → [Assessment Outcome: Semantically Irrelevant]; Yes → Proceed to Step 3.
Step 3: VOC Type Verification. Question: Are the data for the semantically specified VOCs? Decision: No → [Assessment Outcome: Semantically Irrelevant]; Yes → [Assessment Outcome: Semantically Consistent].

2.3 | Data format consistency
Step 1: Unit Consistency Assessment. Question: Are all units natively uniform or mutually convertible? Decision: No → [Assessment Outcome: Inconsistent Format (Units cannot be unified)]; Yes → Proceed to Step 2.
Step 2: Statistical Representation Consistency Assessment. Question: Are there major inconsistencies in statistical representation? Decision: Yes → [Assessment Outcome: Inconsistent Format (Major conflict in statistical representation)]; No → [Assessment Outcome: Consistent Format].

3.1 | Data accuracy assurance
Step 1: Clarity of Analytical Method Assessment. Question: Does the article specify the analytical method used? Decision: No (e.g., not mentioned or vaguely described) → [Assessment Outcome: Not Credible]; Yes (e.g., GC-MS, SIFT-MS) → [Assessment Outcome: Credible].

3.2 | Traceability of data values
Step 1: Sample Size Threshold. Question: Is a clear sample size (N) reported for the data? Decision: No → [Assessment Outcome: Not Credible]; Yes → Proceed to Step 2.
Step 2: Contextual Threshold. Question: Is the context from which the data were extracted clear and unambiguous? Decision: No (e.g., ambiguous context) → [Assessment Outcome: Not Credible]; Yes (e.g., from a numbered figure/table) → [Assessment Outcome: Credible].

4.1 | Data accuracy range
Presence of Uncertainty Information Check. Question: Is any measure of dispersion provided for the data points? Decision: No (no dispersion information provided) → [Assessment Outcome: Filtered Out (Missing dispersion information)]; Yes (any of SD, SEM, range, etc., is provided) → [Rating: Pass].

4.2 | Values credibility
Step 1: Identification and Filtering of Potential Outliers. Procedure: Compare the quantitative data reported in the literature against known physiological ranges. Criteria for outliers: Scenario 1, the value far exceeds the accepted maximum; Scenario 2, the value is far below the accepted minimum; Scenario 3, an obvious statistical/typographical error. Action: Data points identified as outliers are filtered out.
Step 2: Assessment of Statistical Trend Strength. Objective: To evaluate the differentiation between the patient and control groups. Rating criteria: p < 0.05 → [Rating: Credible (Strong statistical support)]; 0.05 ≤ p < 0.10 → [Rating: Credible (Suggests a statistical trend)]; p ≥ 0.10 → [Rating: Not Credible (Lacks statistical support)]. Process: Obtain or estimate the p-value according to the following priority. (1) Direct path: use the p-value if it is directly reported. (2) Estimation path: if it is not reported, estimate it as follows: (2a) obtain a measure of central tendency; (2b) obtain a measure of dispersion; (2c) estimate the p-value and assign the rating (via a two-sample t-test).

4.3 | Semantic data accuracy
Comprehensive evaluation.

Case Study: Assessing the Literature on Breath Acetone

To validate the practical value and analytical power of our evaluation framework, we selected breath acetone as the target analyte for this case study, for three reasons. First, acetone is the longest-studied and most widely recognized biomarker in breath analysis for diabetes. Second, precisely because of this large volume of research, the quantitative results reported in the existing literature are fraught with contradictions and inconsistencies, which provides an ideal scenario for testing how well our framework filters heterogeneous data and clarifies existing controversies. Finally, focusing the framework's initial application on a single, crucial VOC allows a clearer and more in-depth demonstration of its evaluation process and analytical logic. This section therefore details the process and findings of applying our framework to systematically assess the quality and fitness-for-purpose of quantitative acetone data from the literature. The semantic context for this assessment was defined as: "the concentration (in ppbV) of acetone in the breath test results of individuals with diabetes."
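Criterion 2.3 (Step 1) asks whether all reported units are natively uniform or mutually convertible, and the case-study semantics fix the target unit as ppbV. The Python sketch below shows one way such a check could be automated; it is not part of the published framework. It assumes ideal-gas behaviour at 25 °C and 101.325 kPa (molar volume 24.45 L/mol) and acetone's molar mass of 58.08 g/mol; the unit labels, constants and function names are illustrative assumptions. Units it does not recognize, such as the mmol/L used for blood ketones, are treated as impossible to unify.

```python
"""Hypothetical sketch of the unit check in NIM-DBA Criterion 2.3 (Step 1).

Assumptions (not stated in the framework): ideal-gas behaviour at 25 degC and
101.325 kPa, and acetone's molar mass of 58.08 g/mol.
"""

ACETONE_MOLAR_MASS = 58.08  # g/mol
MOLAR_VOLUME_25C = 24.45    # L/mol, ideal gas at 25 degC and 1 atm (assumed)

# Volume mixing ratios differ from ppbV only by a power of ten.
MIXING_RATIO_FACTORS = {"ppmV": 1e3, "ppbV": 1.0, "pptV": 1e-3}


def to_ppbv(value, unit, molar_mass=ACETONE_MOLAR_MASS, molar_volume=MOLAR_VOLUME_25C):
    """Convert a breath concentration to ppbV; return None if it cannot be converted."""
    if unit in MIXING_RATIO_FACTORS:
        return value * MIXING_RATIO_FACTORS[unit]
    if unit in ("ug/m3", "ng/L"):  # 1 ng/L equals 1 ug/m3
        # mass concentration -> mole fraction (x 1e9): C * Vm / M
        return value * molar_volume / molar_mass
    if unit == "nmol/L":
        # nmol of analyte per litre of gas; one litre holds 1 / Vm mol of gas
        return value * molar_volume
    return None  # e.g. blood-ketone units such as mmol/L cannot be unified


def units_can_be_unified(units):
    """Criterion 2.3, Step 1: are all reported units uniform or convertible to ppbV?"""
    return all(to_ppbv(1.0, u) is not None for u in units)


if __name__ == "__main__":
    print(to_ppbv(2.0, "ppmV"))
    print(to_ppbv(1000.0, "ug/m3"))
    print(units_can_be_unified(["ppbV", "ppmV", "nmol/L"]))  # True
    print(units_can_be_unified(["ppbV", "mmol/L"]))          # False
```

Because breath is sampled near body temperature, an analyst might instead pass the molar volume at 37 °C (about 25.45 L/mol); the reference conditions used for harmonization are not stated in the text, so this choice should be reported alongside any converted values.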
Literature Screening and Sample Establishment

The systematic literature screening and selection process followed the methodology detailed in the Methods section. The initial search across the four specified databases yielded 712 records. After duplicate removal, 330 unique articles remained for screening. These articles were then subjected to our two-step AI-assisted screening, followed by rigorous manual verification. Through this multi-stage funnel, studies were excluded primarily because they did not report primary quantitative data or did not include a relevant diabetic cohort. Ultimately, 38 studies met all inclusion criteria (a full list is provided in Supplementary File 1 under the heading "References for Supplementary Material"). These articles constitute the literature pool for the subsequent NIM-DBA quality assessment and case study analysis. A complete visual breakdown of the study selection process, detailing the number of records at each stage and the specific reasons for exclusion, is presented in the PRISMA flow diagram in Fig. 3.

Fitness-for-Purpose Screening with the NIM-DBA Framework

Figure 4 illustrates the number and proportion of articles that passed or were screened out at each step of the NIM-DBA assessment for the 38 studies included via the PRISMA process. The detailed results of this evaluation are available in Table S5. Because the literature search was conducted across four authoritative academic databases, all 38 studies passed "Source Credibility" (Criterion 1.1), and, consistent with inclusion criteria that already required primary quantitative data from a diabetic cohort, all 38 also passed "Data Values Completeness" (Criterion 2.1). In the subsequent assessment steps, non-compliant articles were progressively screened out.
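Criterion 4.2 (Step 2) prescribes that, when a study reports no p-value, one is estimated from the group means, dispersion measures and sample sizes via a two-sample t-test and then mapped to the three rating bands in Table 3. The Python sketch below is one possible implementation under stated assumptions, not the authors' code. It assumes dispersion has already been expressed as a standard deviation (an SEM is converted as SD = SEM × √N; ranges or IQRs would need a separate rule that the text does not specify), and it defaults to Welch's unequal-variance t-test because the text does not name the t-test variant (`equal_var=True` gives Student's pooled test). The t-distribution tail is computed through the regularized incomplete beta function so that only the standard library is needed. All function names are hypothetical.

```python
"""Hypothetical sketch of NIM-DBA Criterion 4.2, Step 2 (estimation path)."""
import math


def sd_from_sem(sem, n):
    """Convert a standard error of the mean into a standard deviation."""
    return sem * math.sqrt(n)


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero

    def guard(v):
        return v if abs(v) > tiny else tiny

    c = 1.0
    d = 1.0 / guard(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_test_from_summary(m1, sd1, n1, m2, sd2, n2, equal_var=False):
    """Return (t, df, p): Welch's test by default, Student's pooled test if equal_var."""
    if equal_var:
        pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
        df = n1 + n2 - 2
    else:
        v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
        se = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)


def rate_values_credibility(p):
    """Map a p-value onto the Criterion 4.2 rating bands of Table 3."""
    if p < 0.05:
        return "Credible (Strong statistical support)"
    if p < 0.10:
        return "Credible (Suggests a statistical trend)"
    return "Not Credible (Lacks statistical support)"


if __name__ == "__main__":
    # Illustrative numbers only; not taken from any included study.
    t, df, p = t_test_from_summary(900.0, 400.0, 20, 600.0, 250.0, 20)
    print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.4f} -> {rate_values_credibility(p)}")
```

Where SciPy is available, `scipy.stats.ttest_ind_from_stats` performs the same summary-statistics test; the hand-written version only keeps the sketch dependency-free. Either way, the estimate inherits the normality and independence assumptions of the t-test and cannot substitute for re-analysis of the raw data.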
In the "Semantic Consistency" (Criterion 2.2) assessment, nine articles were excluded because their data did not explicitly originate from acetone measurements in human breath tests related to diabetes. Next, in the "Data Format Consistency" (Criterion 2.3) assessment of the remaining 29 articles, another nine were filtered out because of severe inconsistencies in, or an inability to unify, their statistical representations or units. The 20 articles that passed the format consistency assessment all passed the evaluations for "Data Accuracy Assurance" (Criterion 3.1) and "Traceability of Data Values" (Criterion 3.2). In the "Data Accuracy Range" (Criterion 4.1) step, one article was filtered out because it reported no measure of dispersion, leaving 19. For "Values Credibility" (Criterion 4.2), 11 articles were filtered out because a statistical analysis of their quantitative data showed no significant difference between the diabetic and control groups (a lenient screening threshold of p < 0.10 was used here), leaving eight. In the final "Comprehensive Assessment" (Criterion 4.3) stage, we conducted a deeper, holistic evaluation of these remaining studies and excluded two further articles.

One of our data mining principles is to capture data from all diabetes-related contexts. From this perspective, Study #8, which was not primarily a diabetes study, passed the preceding steps, demonstrating the framework's ability to capture latent data from research that is not directly related to diabetes. However, a deeper analysis revealed that Study #8 included three sample groups. Although a standalone comparison of its diabetic and control groups showed a statistical difference, the Kruskal-Wallis test performed by the original authors across all three groups was not significant. Giving priority to the overall multi-group test, we concluded that the statistical distinction of its data was not robust and deemed the study non-compliant. The other excluded article, Study #22, had an extremely small sample size (N = 4).
We believe that quantitative data from studies with small sample sizes can still be of high quality, depending on the research objectives. The "Comprehensive Assessment" therefore examined other aspects of Study #22. After determining that critical methodological information, such as participant age, medication status, and sampling conditions, was entirely missing, we concluded that Study #22 was not a high-quality study suitable for secondary analysis and excluded it. Following this series of rigorous, multi-stage screening procedures, only six articles fully passed all assessments of the framework. Their quantitative data were considered to be of high quality and high fitness-for-purpose, and they form the basis of our subsequent cross-study comparative analysis.

The Landscape of Bias Risk in Clinical Study Design

As shown in Fig. 5, our QUADAS-2 assessment of all 38 articles reveals a complex picture of clinical study design quality in this field. This evaluation is based on a classic hypothesis-driven paradigm, which aims to judge whether a study's clinical conclusions are subject to bias. The assessment reveals two noteworthy systemic issues. First, in the "Risk of Bias" assessment (left panel), the most prominent problem lies in the "Patient Selection" domain, where 79% of studies were rated as high risk. This primarily reflects a prevalent reliance on case-control designs, which are convenient to implement but are widely recognized as prone to overestimating diagnostic accuracy. Second, the assessment revealed a widespread lack of reporting transparency. In the "Index Test" domain, 82% of studies were rated as "Unclear" because of missing information, making it impossible to determine whether their testing process and result interpretation were biased.
This figure demonstrates that, from the traditional QUADAS-2 perspective, which is oriented towards validating clinical hypotheses, the current literature pool shows a high prevalence of design flaws and reporting deficiencies that could render the studies' clinical conclusions unreliable.

Common Methodological Features of High-Fitness-for-Purpose Studies

To investigate systematically the core elements that make a study "high fitness-for-purpose", we conducted a detailed comparative analysis of the methodological features of the six articles that ultimately passed our evaluation framework (Goerl et al. [56], Ghimenti et al. [57], Chien et al. [58], Lekha and Suchetha [59], Sha et al. [60], and Li et al. [61]). The details of this analysis are provided in Supplementary File 2 (Sheet 3), and the quantitative data distributions are visualized in Fig. 6.

A core screening criterion of our framework is "Values Credibility" (Criterion 4.2), which requires the study's data to show some potential for statistical distinction between diabetic and control groups (using a lenient threshold of p < 0.10). The purpose of this section is therefore not simply to reiterate that "these studies all show a difference", but to determine whether a common and reliable biological pattern exists among the studies that passed our rigorous methodological screening and demonstrated this statistical trend. Among these high-quality studies, the analysis reveals a clear pattern: where the data allow statistical differentiation, the trend consistently indicates that breath acetone concentration is higher in diabetic patients than in healthy controls. This pattern was observed across studies employing different technological pathways; for example, both Li et al. [61], which used the gold-standard GC-MS, and Chien et al. [58], which used a novel biosensor, reported significantly elevated acetone levels in the diabetic group.
Notably, the inclusion of Goerl et al. [56] illustrates the objectivity of our evaluation framework. This study also passed all of our methodological screening criteria, yet its conclusion was distinctive: the diabetic group showed lower variability in acetone concentration rather than a higher mean level. Our framework did not exclude this study for its non-conforming conclusion; instead, it prompted us to trace back its specific methodology. This analysis revealed that the participants in Goerl et al. [56] were end-stage renal disease (ESRD) patients on hemodialysis, a special population whose metabolic and clearance mechanisms differ distinctly from those of typical diabetic patients. The "exception" of Goerl et al. [56] therefore does not weaken our overall finding. On the contrary, it shows that our framework can identify methodologically rigorous studies and that, when their conclusions differ, the discrepancy can be rationally explained by the specific study design, in this case the choice of population.

A further analysis of the commonalities among these high-quality studies reveals a shared set of core methodological principles. In terms of analytical techniques, gas chromatography-mass spectrometry (GC-MS) combined with appropriate sample pre-processing (such as SPME or TD) represents one reliable pathway to high-quality quantitative data (Goerl et al. [56], Ghimenti et al. [57], Li et al. [61]). The other three studies (Chien et al. [58], Lekha and Suchetha [59], Sha et al. [60]) collectively highlight novel sensor technologies as a critical development direction for rapid, non-invasive detection. For sample collection, precise collection of alveolar gas and strict control of pre-sampling conditions, such as fasting, were identified as key prerequisites for data credibility.
In summary, our framework isolated a high-quality subset of studies. Within this subset, we not only found a highly consistent biological pattern regarding breath acetone in diabetes but, more importantly, showed that a study's methodological design is the fundamental factor determining the reliability and fitness-for-purpose of its conclusions. The "best practice" principles identified through this process can provide valuable guidance for the design of future research in this field.

Discussion

The core objective of this study was to address contradictory conclusions in secondary analyses of the literature on breath VOCs in diabetes, a problem arising from the methodological heterogeneity of primary studies. To this end, we proposed and validated a novel data quality and fitness-for-purpose evaluation framework rooted in metrological science. Our results reveal the necessity and effectiveness of applying this framework: of the 38 relevant articles initially identified, only six (15.8%) fully met our quality and fitness-for-purpose standards. This stark disparity is a core finding in itself, quantitatively confirming the substantial methodological heterogeneity in the design, execution, and reporting of current breath analysis research. This heterogeneity is arguably the fundamental reason for the long-standing lack of clinical consensus. The contribution of this study is therefore not simply a conclusion on the efficacy of a specific biomarker such as acetone, but a methodological tool that can systematically resolve such controversies. The value of this framework is threefold. First, it transforms the subjective question "Are the data reliable?" into a series of objective, quantifiable evaluation criteria, providing an operational standard for data inclusion and exclusion in secondary analyses.
Second, by enabling a comparative analysis of the high-quality studies that pass the assessment, the framework systematically reveals the common methodological principles, the "best practices", that underpin reliable conclusions. Finally, its evaluation criteria can serve as a prospective design guide for future researchers, helping to improve the data quality and comparability of the entire field at the source.

Two Paradigms, Two Landscapes: A Comparison with Classic Clinical Evaluation

The most significant finding of this study is the emergence of two very different quality landscapes when the same literature pool was assessed with the data-driven NIM-DBA framework and with the classic hypothesis-driven QUADAS-2 tool. The QUADAS-2 assessment revealed widespread methodological issues: a literature pool rife with design flaws and reporting deficiencies that could render the studies' clinical conclusions unreliable (Fig. 5). In stark contrast, the NIM-DBA framework identified a "core subset" of six studies with high fitness-for-purpose within the same pool (Fig. 4). This divergence substantiates our core thesis: a framework designed to assess the credibility of a study's conclusion (QUADAS-2) operates on a completely different logic from a framework designed to assess the usability of the data themselves (NIM-DBA). QUADAS-2 answers the question "Can we trust the authors' conclusions?", whereas NIM-DBA answers "Can we confidently use the authors' data for secondary analysis?" As secondary research and data reuse become increasingly important, the latter question is gaining significance. The detailed, study-by-study assessment data for these two landscapes are presented in Supplementary Table S5 (NIM-DBA) and Table S6 (QUADAS-2); they provide direct evidentiary support for the arguments in this section and for the value of the proposed data-driven paradigm.
Methodological Rigor as the Cornerstone of Conclusion Consistency

Although our framework does not attempt to define a universal "gold standard", it isolated a high-quality subset of studies and revealed an important pattern: the studies that provide the most reliable and fit-for-purpose data tend to share a similar set of more rigorous core methodological principles. Our analysis found that when studies strictly adhered to certain "best practices", such as precise collection of alveolar gas, pre-sampling preparation like subject fasting, and the use of high-sensitivity analytical techniques like GC-MS, their conclusions regarding elevated breath acetone in diabetic patients were highly consistent. This finding suggests that the prevalent "contradictory" conclusions in the literature likely stem not from intrinsic instability of the biomarker itself, but from large differences in methodological rigor across studies. The evaluation framework proposed here provides an objective yardstick for identifying and quantifying this degree of rigor.

Beyond Quality Judgment: The Critical Role of "Fitness-for-Purpose"

The distinctive value of this framework lies in moving beyond a simple binary "good/bad" quality judgment by introducing the core concept of "fitness-for-purpose". This was fully embodied in our analysis of Goerl et al. [56]. That study was methodologically rigorous, but the specificity of its study population (end-stage renal disease patients) meant that its data were not fit for the purpose of directly answering the question about acetone levels in the general diabetic population. Our framework did not incorrectly label it as "low quality"; instead, it accurately identified the boundaries of its applicability. The framework is therefore not only a retrospective evaluation tool but also a prospective "guide for assessing research applicability".
It helps secondary data analysts (e.g., meta-analysts) quickly screen a large body of literature for datasets that are not only of high quality but whose study design also matches their target semantics, thereby avoiding erroneous inferences caused by mismatched designs and enhancing the reliability of secondary research. Likewise, the concept of "fitness-for-purpose" must be applied to the evaluation of different technological pathways, especially the rapidly evolving field of novel sensor technologies. Among the six high-quality studies we identified, three (Chien et al. [58], Lekha and Suchetha [59], and Sha et al. [60]) employed novel non-GC-MS sensor technologies, which represent a critical future direction for real-time, non-invasive detection. Our framework affirmed the methodological rigor of these studies' designs. From the perspective of secondary quantitative analysis, however, the fitness-for-purpose of data from these emerging technologies presents new challenges. Compared with gold-standard methods such as GC-MS, the quantitative accuracy, specificity (i.e., susceptibility to interference from other gases), and transparency of calibration methods are key to assessing the applicability of sensor data, and this information is not always fully reported. For instance, some studies may focus more on reporting the accuracy of classification models (as in Lekha and Suchetha [59]) than on providing standardized concentration values with uncertainty information suitable for direct comparison. The value of the NIM-DBA framework here therefore lies not only in identifying rigorous sensor-based studies but also in helping secondary analysts judge whether the data produced by these studies are fit, in both form and precision, for their own quantitative meta-analysis objectives.
Implications for Future Research: A Dual-Purpose Guide

The framework and findings of this study can serve as a dual-purpose guide for future research. For secondary researchers, it provides an operational, retrospective screening tool. For primary researchers, the "best practice" principles distilled from this study can serve as a prospective design guide. To ensure that their research data, whether from traditional GC-MS or novel sensors, can be widely and reliably reused and compared by the academic community, future investigators should strive from the outset to adhere to the core principles revealed by this framework (e.g., clear population definitions, standardized sampling procedures, and transparent data reporting). This will help to raise the research quality of the entire breath analysis field and accelerate the translation from laboratory discoveries to clinical applications.

Strengths and Limitations

The primary strength of this study lies in its methodological innovation: it is the first to combine the stringent requirements for standard reference data from the field of metrology with international ISO data quality standards for the systematic evaluation of secondary literature. Second, through a direct comparison with the classic QUADAS-2 framework, this study is the first to reveal empirically the fundamental differences and complementary value of the data-driven and hypothesis-driven evaluation paradigms. Furthermore, our workflow of AI-assisted screening with manual verification offers an efficient and rigorous approach to handling large-scale literature reviews.

However, this study also has limitations. First, our case study focused solely on a single biomarker, breath acetone, and the framework's applicability to other VOCs awaits further validation. Second, some aspects of the data evaluation, such as the estimation of p-values from summary statistics, still relied on statistical assumptions and cannot fully replace an analysis of the original raw data.
Third, and most importantly, although our framework identified a high-quality subset of studies, the number of articles (n = 6) is insufficient to draw a definitive conclusion on the core clinical question of whether breath acetone is a reliable biomarker for diabetes. This reflects, on the one hand, the absolute scarcity of high-quality studies in the existing literature and, on the other, suggests that the framework's screening criteria may have room for further optimization to include more valuable data while maintaining rigor.

Conclusion

The core objective of this study was not to re-validate the efficacy of breath acetone as a biomarker for diabetes, but to answer a deeper methodological question: among the numerous published studies, what common design and execution elements enable some to yield statistically significant positive conclusions? By applying our custom-built framework, which is grounded in metrological principles, we filtered a heterogeneous pool of 38 articles down to a high-quality, high-fitness-for-purpose subset of just six studies. An in-depth analysis of this subset suggests that the significant differences observed between diabetic and control groups in these studies are likely not coincidental, but stem from the studies' strong convergence on key methodological principles. These "best practice" principles, such as precise collection of alveolar gas, strict control of pre-sampling conditions like fasting, and the use of high-sensitivity analytical techniques like GC-MS, appear to be important prerequisites for ensuring data quality and the reliability of conclusions. A key conclusion of this study is therefore that the "contradictions" prevalent in the literature are likely rooted in differences in methodological rigor.
Our framework, acting as an effective "filter", demonstrates its value by providing a systematic process for assessing and managing the challenges posed by data heterogeneity, thereby identifying the studies that reached reliable conclusions precisely because they adhered to these "best practices". This provides a valuable prospective guide for future research design in the field and emphasizes the important role of standardizing and optimizing research protocols from the outset to accelerate the field's translation from the laboratory to clinical application. Looking ahead, the data quality and fitness-for-purpose framework proposed in this study has broad application prospects. On the one hand, future work can extend the framework to the evaluation of other diseases and VOC biomarkers. On the other hand, its standardized process provides a solid theoretical foundation for developing automated AI tools for literature quality assessment, which may improve the efficiency and objectivity of such assessments and thereby accelerate progress across the breath diagnostics field.
Abbreviations
AC: Assessment Criterion
AI: Artificial Intelligence
CMA: China Metrology Accreditation
CNAS: China National Accreditation Service for Conformity Assessment
EBM: Evidence-Based Medicine
e-nose: Electronic Nose
ESRD: End-Stage Renal Disease
GC-MS: Gas Chromatography-Mass Spectrometry
IQR: Interquartile Range
ISO: International Organization for Standardization
LLM: Large Language Model
MeSH: Medical Subject Headings
NIM-DBA: National Institute of Metrology - Diabetes Breath Assessment
NIM-NMDC: National Institute of Metrology, China - National Center for Metrology Scientific Data and Energy Metrology
NR: Not Reported
ppbV: Parts Per Billion by Volume
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies 2
RoB: Risk of Bias
RWD: Real-World Data
SD: Standard Deviation
SE / SEM: Standard Error of the Mean
SIFT-MS: Selected Ion Flow Tube Mass Spectrometry
SOP: Standard Operating Procedure
SPME: Solid-Phase Microextraction
SRD: Standard Reference Data
T1DM: Type 1 Diabetes Mellitus
T2DM: Type 2 Diabetes Mellitus
TD: Thermal Desorption
VOCs: Volatile Organic Compounds
WoS: Web of Science

Declarations

Ethics approval and consent to participate
Not applicable. This study is a systematic review of previously published literature and does not involve any new human participants, data, or tissue.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported by the Science & Technology Fundamental Resources Investigation Program (Grant No. 2022FY101200).

Authors' contributions
L.G. and X.X. conceptualized the study. L.G. developed the methodology, designed the AI-assisted workflow, performed the data analysis, and wrote the original draft. W.Z. and Y.W. conducted the literature screening and performed the manual data verification and extraction. Z.L. assisted with data visualization and software implementation. X.X. acquired funding, provided supervision, and reviewed and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements
The authors would like to acknowledge the National Institute of Metrology, China (NIM), for providing the research platform and resources that made this study possible. We are also sincerely grateful to Dr. Bin Wang and Dr. Heng Zhou for their insightful and inspiring discussions on medicine, artificial intelligence models, and data science, which were instrumental to the development of this work.

References
Lee B, Lee J-O, Lee J, Park I, Lee D-S. Breath gas sensors for diabetes and lung cancer diagnosis. J Sens Sci Technol. 2023;32:1–9. https://doi.org/10.46670/JSST.2023.32.1.1.
Sharma A, Kumar R, Varadwaj P. Smelling the disease: diagnostic potential of breath analysis. Mol Diagn Ther. 2023;27:321–47. https://doi.org/10.1007/s40291-023-00640-7.
Drabińska N, Flynn C, Ratcliffe N, Belluomo I, Myridakis A, Gould O, et al. A literature survey of all volatiles from healthy human breath and bodily fluids: the human volatilome. J Breath Res. 2021;15. https://doi.org/10.1088/1752-7163/abf1d0.
Mahnoor M, Shah AA, Inam A. Acetone detection using various techniques for diagnosis of diabetes mellitus from human exhaled breath: a review. AIP Conf. Proc. American Institute of Physics; 2024. https://doi.org/10.1063/5.0214527.
Haripriya P, Rangarajan M, Pandya HJ. Breath VOC analysis and machine learning approaches for disease screening: a review. J Breath Res. 2023;17. https://doi.org/10.1088/1752-7163/acb283.
Dixit K, Fardindoost S, Ravishankara A, Tasnim N, Hoorfar M. Exhaled breath analysis for diabetes diagnosis and monitoring: relevance, challenges and possibilities. Biosensors (Basel). 2021;11:476. https://doi.org/10.3390/bios11120476.
Miekisch W, Sukul P, Schubert JK. Diagnostic potential of breath analysis – focus on the dynamics of volatile organic compounds. TrAC, Trends Anal Chem. 2024;180. https://doi.org/10.1016/j.trac.2024.117977.
Ma P, Li J, Chen Y, Zhou Montano BA, Luo H, Zhang D, et al. Non-invasive exhaled breath diagnostic and monitoring technologies. Microwave Opt Technol Lett. 2023;65:1475–88. https://doi.org/10.1002/mop.33133.
Obeidat Y. The most common methods for breath acetone concentration detection: a review. IEEE Sensors J. 2021;21:14540–58. https://doi.org/10.1109/JSEN.2021.3074610.
Liu H, Liu W, Sun C, Huang W, Cui X. A review of non-invasive blood glucose monitoring through breath acetone and body surface. Sens Actuators, A. 2024;374. https://doi.org/10.1016/j.sna.2024.115500.
Wang W, Zhou W, Wang S, Huang J, Le Y, Nie S, et al. Accuracy of breath test for diabetes mellitus diagnosis: a systematic review and meta-analysis. BMJ Open Diabetes Res Care. 2021;9. https://doi.org/10.1136/bmjdrc-2021-002174.
Mathew TL, Pownraj P, Abdulla S, Pullithadathil B. Technologies for clinical diagnosis using expired human breath analysis. Diagnostics. 2015;5:27–60. https://doi.org/10.3390/diagnostics5010027.
Fan X, Zhong R, Liang H, Zhong Q, Huang H, He J, et al. Exhaled VOC detection in lung cancer screening: a comprehensive meta-analysis. BMC Cancer. 2024;24:775. https://doi.org/10.1186/s12885-024-12537-7.
Scheepers MHMC, Al-Difaie Z, Brandts L, Peeters A, van Grinsven B, Bouvy ND. Diagnostic performance of electronic noses in cancer diagnoses using exhaled breath: a systematic review and meta-analysis. JAMA Netw Open. 2022;5:e2219372. https://doi.org/10.1001/jamanetworkopen.2022.19372.
Alamilla-Valenzuela A, Erazo-Lema JS, Hernández-Hernández BS, Vega-Escalante B de J, Sarabia-Aguayo VV, Aguirre-Cervantes EL, et al. Exhaled volatile organic compounds: effective in detecting breast cancer? Gac Mex Oncol. 2023;22:122–9. https://doi.org/10.24875/j.gamo.22000099.
Marfatia K, Ni J, Preda V, Nasiri N. Is breath best? A systematic review on the accuracy and utility of nanotechnology based breath analysis of ketones in type 1 diabetes. Biosensors (Basel). 2025;15:62. https://doi.org/10.3390/bios15010062.
Schwabe D, Becker K, Seyferth M, Klaß A, Schaeffter T. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. npj Digital Med. 2024;7:1–30. https://doi.org/10.1038/s41746-024-01196-4.
Fadahunsi KP, Akinlua JT, O’Connor S, Wark PA, Gallagher J, Carroll C, et al. Protocol for a systematic review and qualitative synthesis of information quality frameworks in eHealth. BMJ Open. 2019;9:e024722. https://doi.org/10.1136/bmjopen-2018-024722.
Patone M, Zhang L-C. On two existing approaches to statistical analysis of social media data. Int Stat Rev. 2021;89:54–71. https://doi.org/10.1111/insr.12404.
Fadahunsi KP, O’Connor S, Akinlua JT, Wark PA, Gallagher J, Carroll C, et al. Information quality frameworks for digital health technologies: systematic review. J Med Internet Res. 2021;23:e23479. https://doi.org/10.2196/23479.
Declerck J, Kalra D, Vander Stichele R, Coorevits P. Frameworks, dimensions, definitions of aspects, and assessment methods for the appraisal of quality of health data for secondary use: comprehensive overview of reviews. JMIR Med Inform. 2024;12:e51560. https://doi.org/10.2196/51560.
Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021;21:63. https://doi.org/10.1186/s12874-021-01252-7.
Zhang L, Jeong D, Lee S. Data quality management in the internet of things. Sensors (Basel). 2021;21:5834. https://doi.org/10.3390/s21175834.
Ijab MT, Surin ESM, Nayan NM. Conceptualizing big data quality framework from a systematic literature review perspective. Malays J Comput Sci. 2019;:25–37. https://doi.org/10.22452/mjcs.sp2019no1.2. Daikeler J, Froehling L, Sen I, Birkenmaier L, Gummer T, Schwalbach J, et al. Assessing data quality in the age of digital social research: a systematic review. Social Sci Comput Rev. 2024. https://doi.org/10.1177/08944393241245395. Cichy C, Rass S. An overview of data quality frameworks. IEEE Access. 2019;7:24634–48. https://doi.org/10.1109/ACCESS.2019.2899751. Bhana B, Flowerday S, Satt A. Using participatory crowdsourcing in South Africa to create a safer living environment. Int J Distrib Sens Netw. 2013;:907196. https://doi.org/10.1155/2013/907196. Zou KH, Berger ML. Real-world data and real-world evidence in healthcare in the United States and Europe union. Bioeng-basel. 2024;11:784. https://doi.org/10.3390/bioengineering11080784. Shabani J, Salim N, Bohne C, Day LT, Kumalija C, Makuwani AM, et al. Neonatal indicator data in Tanzania district health information system: evaluation of availability and quality of selected newborn indicators, 2015-2022. BMC Pediatr. 2025;23:658. https://doi.org/10.1186/s12887-025-05417-x. Okwaraji YB, Bradley E, Ohuma EO, Yargawa J, Suarez-Idueta L, Requejo J, et al. National routine data for low birthweight and preterm births: systematic data quality assessment for united nations member states (2000-2020). Bjog-Int J Obstet Gynaecol. 2024;131:917–28. https://doi.org/10.1111/1471-0528.17699. Gyrard A, Abedian S, Gribbon P, Manias G, van Nuland R, Zatloukal K, et al. Lessons learned from european health data projects with cancer use cases: implementation of health standards and internet of things semantic interoperability. J Med Internet Res. 2025;27:e66273. https://doi.org/10.2196/66273. Jin M-J, Li E-M, Xu L-Y. Diagnostic accuracy of breath tests based on volatile organic compounds for cancer: a systematic review and meta-analysis. 
Clinical Biochemistry. 2025;136:110898. https://doi.org/10.1016/j.clinbiochem.2025.110898. Healy A, Duggan C, Foley B, Flynn R, Huss T. Development of a data quality framework for health and social care - a strategic approach to assess and improve the quality of health data and information in Ireland. J Epidemiol Community Health. 2019;73:A102–A102. https://doi.org/10.1136/jech-2019-SSMabstracts.218. Laberge M, Shachak A. Developing a tool to assess the quality of socio-demographic data in community health centres. Appl Clin Inf. 2013;4:1–11. https://doi.org/10.4338/ACI-2012-10-CR-0041. Comero S, Dalla Costa S, Cusinato A, Korytar P, Kephalopoulos S, Bopp S, et al. A conceptual data quality framework for IPCHEM - the european commission information platform for chemical monitoring. TrAC, Trends Anal Chem. 2020;127:115879. https://doi.org/10.1016/j.trac.2020.115879. Ng MY, Youssef A, Miner AS, Sarellano D, Long J, Larson DB, et al. Perceptions of data set experts on important characteristics of health data sets ready for machine learning. JAMA Netw Open. 2023;6:e2345892. https://doi.org/10.1001/jamanetworkopen.2023.45892. Tute E, Mast M, Wulff A. Targeted data quality analysis for a clinical decision support system for SIRS detection in critically ill pediatric patients. Methods Inf Med. 2023;62:e1–9. https://doi.org/10.1055/s-0042-1760238. Kookal KK, Walji MF, Brandon R, Kivanc F, Mertz E, Kottek A, et al. Systematically assessing the quality of dental electronic health record data for an investigation into oral health care disparities. J Public Health Dent. 2024;84:242–50. https://doi.org/10.1111/jphd.12618. Kuusisto A, Saranto K, Korhonen P, Haavisto E. Quality of information transferred to palliative care. J Clin Nurs. 2023;32:3421–33. https://doi.org/10.1111/jocn.16453. Widad E, Saida E, Gahi Y. Quality anomaly detection using predictive techniques: an extensive big data quality framework for reliable data analysis. IEEE Access. 2023;11:103306–18. 
https://doi.org/10.1109/ACCESS.2023.3317354. McCord SE, Webb NP, Van Zee JW, Burnett SH, Christensen EM, Courtright EM, et al. Provoking a cultural shift in data quality. Bioscience. 2021;71:647–57. https://doi.org/10.1093/biosci/biab020. Blacketer C, Defalco FJ, Ryan PB, Rijnbeek PR. Increasing trust in real-world evidence through evaluation of observational data quality. J Am Med Inf Assoc. 2021;28:2251–7. https://doi.org/10.1093/jamia/ocab132. Hillert J, Butzkueven H, Magyari M, Wergeland S, Moore N, Soilu-Hanninen M, et al. Harmonized data quality indicators maintain data quality in long-term safety studies using multiple sclerosis registries/data sources: experience from the CLARION study. Clin Epidemiol. 2024;16:717–32. https://doi.org/10.2147/CLEP.S480525. Camilo Corrales D, Ledezma A, Carlos Corrales J. From theory to practice: a data quality framework for classification tasks. Symmetry-basel. 2018;10:248. https://doi.org/10.3390/sym10070248. Jiang G, Dhruva SS, Chen J, Schulz WL, Doshi AA, Noseworthy PA, et al. Feasibility of capturing real-world data from health information technology systems at multiple centers to assess cardiac ablation device outcomes: a fit-for-purpose informatics analysis report. J Am Med Inf Assoc. 2021;28:2241–50. https://doi.org/10.1093/jamia/ocab117. Wirsching J, Graßmann S, Eichelmann F, Harms LM, Schenk M, Barth E, et al. Development and reliability assessment of a new quality appraisal tool for cross-sectional studies using biomarker data (BIOCROSS). BMC Med Res Methodol. 2018;18:122. https://doi.org/10.1186/s12874-018-0583-x. Nguyen T, Nguyen H-T, Nguyen-Hoang T-A. Data quality management in big data: strategies, tools, and educational implications. J Parallel Distrib Comput. 2025;200:105067. https://doi.org/10.1016/j.jpdc.2025.105067. Deady M, Duncan R, Jones LD, Sang A, Goodness B, Pandey A, et al. 
Data quality and timeliness analysis for post-vaccination adverse event cases reported through healthcare data exchange to FDA BEST pilot platform. Front Public Health. 2024;12:1379973. https://doi.org/10.3389/fpubh.2024.1379973. Naik S, Voong S, Bamford M, Smith K, Joyce A, Grinspun D. Assessment of the nursing quality indicators for reporting and evaluation (NQuIRE) database using a data quality index. J Am Med Inf Assoc. 2020;27:776–82. https://doi.org/10.1093/jamia/ocaa031. Smith M, Lix LM, Azimaee M, Enns JE, Orr J, Hong S, et al. Assessing the quality of administrative data for research: a framework from the manitoba centre for health policy. J Am Med Inf Assoc. 2018;25:224–9. https://doi.org/10.1093/jamia/ocx078. Iverson R, Taljaard M, Geraghty MT, Pugliese M, Tingley K, Coyle D, et al. Assessing the quality and value of metabolic chart data for capturing core outcomes for pediatric medium-chain acyl-CoA dehydrogenase (MCAD) deficiency. BMC Pediatr. 2024;24:37. https://doi.org/10.1186/s12887-023-04393-4. Krishna CM, Ruikar K, Jha KN. Determinants of data quality dimensions for assessing highway infrastructure data using semiotic framework. Buildings. 2023;13:944. https://doi.org/10.3390/buildings13040944. Larburu N, Bults RGA, Van Sinderen MJ, Widya I, Hermens HJ. An ontology for telemedicine systems resiliency to technological context variations in pervasive healthcare. IEEE J Transl Eng Health Med. 2015;3:2900110. https://doi.org/10.1109/JTEHM.2015.2458870. Iso/iec 25012:2008. Iso. https://www.iso.org/standard/35736.html. Accessed 20 Aug 2025. Iso/iec 25024:2015. Iso. https://www.iso.org/standard/35749.html. Accessed 20 Aug 2025. Goerl T, Kischkel S, Sawacki A, Fuchs P, Miekisch W, Schubert JK. Volatile breath biomarkers for patient monitoring during haemodialysis. J Breath Res. 2013;7:17116. https://doi.org/10.1088/1752-7155/7/1/017116. Ghimenti S, Tabucchi S, Lomonaco T, Di Francesco F, Fuoco R, Onor M, et al. 
Monitoring breath during oral glucose tolerance tests. J Breath Res. 2013;7:17115. https://doi.org/10.1088/1752-7155/7/1/017115. Chien P-J, Suzuki T, Tsujii M, Ye M, Minami I, Toda K, et al. Biochemical gas sensors (biosniffers) using forward and reverse reactions of secondary alcohol dehydrogenase for breath isopropanol and acetone as potential volatile biomarkers of diabetes mellitus. Anal Chem. 2017;89:12261–8. https://doi.org/10.1021/acs.analchem.7b03191. Lekha S, Suchetha MS. Real-time non-invasive detection and classification of diabetes using modified convolution neural network. IEEE J Biomed Health Inform. 2018;22:1630–6. https://doi.org/10.1109/JBHI.2017.2757510. Sha MS, Maurya MR, Shafath S, Cabibihan J-J, Al-Ali A, Malik RA, et al. Breath analysis for the in vivo detection of diabetic ketoacidosis. ACS Omega. 2022;7:4257–66. https://doi.org/10.1021/acsomega.1c05948. Li W, Liu Y, Lu X, Huang Y, Liu Y, Cheng S, et al. A cross-sectional study of breath acetone based on diabetic metabolic disorders. J Breath Res. 2015;9. https://doi.org/10.1088/1752-7155/9/1/016005. Footnotes 1 The number of studies in this final sample (n=38) is coincidentally identical to the 38 representative publications analyzed for the framework's paradigmatic development (see Table 1). These two sets of literature are distinct and should not be confused. 2 Note on the deep screening prompt (Supplementary File 4): The prompt intentionally instructs the AI to 'Include' documents containing either primary (original) or secondary (cited) data. This was a strategic choice. Although studies containing only secondary data were excluded from the final analysis of the present study, identifying them is highly valuable for future work. For instance, these references can be used to trace and discover additional original studies (a process known as citation snowballing). The prompt was therefore designed in its current form to serve this broader research objective. 
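The inclusion rule described in note 2 separates two decisions: what the deep-screening prompt marks as 'Include', and what actually enters the final analysis. A minimal Python sketch of that routing is shown below. It assumes each screened record has already been labelled as containing primary data, secondary (cited) data, both, or neither; the `ScreenedRecord` type, its field names, the `Route` labels, and the `route_record`/`route_records` helpers are all hypothetical illustrations and are not taken from the authors' prompt or pipeline.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ANALYSIS = "analysis"        # has primary (original) data: eligible for the final analysis
    SNOWBALLING = "snowballing"  # secondary (cited) data only: kept to trace original studies
    EXCLUDED = "excluded"        # neither primary nor secondary data


@dataclass(frozen=True)
class ScreenedRecord:
    # Hypothetical output of a deep-screening step (not the authors' schema).
    record_id: str
    has_primary_data: bool    # reports original measurements
    has_secondary_data: bool  # cites values measured in other studies


def route_record(rec: ScreenedRecord) -> Route:
    """Route one record. A prompt-level 'Include' covers either data type,
    but only records with primary data go on to the final analysis."""
    if rec.has_primary_data:
        return Route.ANALYSIS
    if rec.has_secondary_data:
        return Route.SNOWBALLING
    return Route.EXCLUDED


def route_records(records: list[ScreenedRecord]) -> dict[Route, list[str]]:
    """Group record IDs by route, preserving input order within each group."""
    groups: dict[Route, list[str]] = {route: [] for route in Route}
    for rec in records:
        groups[route_record(rec)].append(rec.record_id)
    return groups


if __name__ == "__main__":
    demo = [
        ScreenedRecord("A", has_primary_data=True, has_secondary_data=True),
        ScreenedRecord("B", has_primary_data=False, has_secondary_data=True),
        ScreenedRecord("C", has_primary_data=False, has_secondary_data=False),
    ]
    for route, ids in route_records(demo).items():
        print(route.value, ids)
```

Keeping the secondary-only group as its own output, rather than discarding it, is what lets it seed a later round of citation snowballing without contaminating the dataset used for analysis.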
Additional Declarations
No competing interests reported.

Supplementary Files
SupplementaryFile1.docx
Additional file 1: Supplementary Tables and Detailed NIM-DBA Assessment Results. This file contains supplementary tables referenced in the main text, including large tables that do not fit in the main manuscript body. It also provides the detailed, item-by-item assessment results for all 38 included studies using the NIM-DBA framework.
(Format: .docx)
SupplementaryFile2.xlsx
Additional file 2: Extracted Quantitative Data and Detailed QUADAS-2 Assessment. This spreadsheet file contains three sheets: (1) the complete set of quantitative breath acetone data extracted from the 38 included studies; (2) the detailed QUADAS-2 assessment results for all 38 studies, including the rationale for each judgment; and (3) a summary of additional methodological information for the six studies that passed the final NIM-DBA evaluation.
(Format: .xlsx)
SupplementaryFile3AIPromptStep1BroadScreening.txt
Additional file 3: AI Prompt for Broad Literature Screening. This text file contains the exact prompt provided to the Large Language Model for the initial, broad screening of the literature to identify potentially relevant articles.
(Format: .txt)
SupplementaryFile4AIPromptStep2DeepExtraction.txt
Additional file 4: AI Prompt for Deep Data Extraction. This text file contains the exact prompt provided to the Large Language Model for the second-step, deep extraction and screening of quantitative data from the shortlisted literature.
(Format: .txt)

Reviewers invited by journal: 25 Sep 2025
Editor assigned by journal: 23 Sep 2025
Editor invited by journal: 28 Aug 2025
Submission checks completed at journal: 27 Aug 2025
First submitted to journal: 27 Aug 2025

© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-7455257","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":525343461,"identity":"461e75d8-4ac5-4678-88b3-0050eacc80ac","order_by":0,"name":"Lin Guo","email":"","orcid":"","institution":"National Institute of Metrology China","correspondingAuthor":false,"prefix":"","firstName":"Lin","middleName":"","lastName":"Guo","suffix":""},{"id":525343462,"identity":"2577edb3-85ce-475d-b60c-88b3fa661d29","order_by":1,"name":"Wei Zhang","email":"","orcid":"","institution":"National Institute of Metrology
China","correspondingAuthor":false,"prefix":"","firstName":"Wei","middleName":"","lastName":"Zhang","suffix":""},{"id":525343465,"identity":"85f21f16-e520-44e6-aebc-d24eae46c261","order_by":2,"name":"Yinchu Wang","email":"","orcid":"","institution":"National Institute of Metrology China","correspondingAuthor":false,"prefix":"","firstName":"Yinchu","middleName":"","lastName":"Wang","suffix":""},{"id":525343466,"identity":"bb792496-d221-4aec-9fed-f17d3b8ab1e2","order_by":3,"name":"Zilong Liu","email":"","orcid":"","institution":"National Institute of Metrology China","correspondingAuthor":false,"prefix":"","firstName":"Zilong","middleName":"","lastName":"Liu","suffix":""},{"id":525343467,"identity":"b132d126-e198-490f-91b5-44694d77b26c","order_by":4,"name":"Xingchuang Xiong","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAy0lEQVRIiWNgGAWjYDACZgiVwM/A2ECiFskGorVAQYLBAWKVGhxnPvbwa5tdnvHt5tYNDDV2DPyzCdgm2cyWbixzJrnY7M7BthsMx5IZJO4QsI+fmcdMWqLiQOK2G4lALWwHGAwkEvBrYQNrMTiQuHkGSMs/IrSAbJH8ALRlgwRQC2MbEVqAfkmTZjiTnDgD5LDEvmQeiRsEtBicP3xM8mebXWL/jPRnNz58s5Pjn0FACwgw88BYQMU8eBQiAOMPopSNglEwCkbBiAUAQ2ZBGVyviYIAAAAASUVORK5CYII=","orcid":"","institution":"National Institute of Metrology China","correspondingAuthor":true,"prefix":"","firstName":"Xingchuang","middleName":"","lastName":"Xiong","suffix":""}],"badges":[],"createdAt":"2025-08-25 15:38:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7455257/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7455257/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":93029641,"identity":"a28ae022-9acf-452e-85a5-cdcb69e2caa1","added_by":"auto","created_at":"2025-10-08 
09:59:47","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":889800,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.docx","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/1dc5dc3463fe4c36b56927a3.docx"},{"id":93029644,"identity":"9be22a8f-9539-404f-8e07-ec484dc7f837","added_by":"auto","created_at":"2025-10-08 09:59:47","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8666,"visible":true,"origin":"","legend":"","description":"","filename":"138a95f5e64e473dac89ef9f2bccde6c.json","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/e319c51762f24907b600e6f7.json"},{"id":93029681,"identity":"ee70a0e1-cf00-45c1-994b-bc8f9cb85ba7","added_by":"auto","created_at":"2025-10-08 09:59:50","extension":"txt","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":4612,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFile3AIPromptStep1BroadScreening.txt","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/e397901dd858abbf74b0370e.txt"},{"id":93030268,"identity":"0a5ca9cd-2ec8-4459-8ab4-e6a4824853bb","added_by":"auto","created_at":"2025-10-08 10:07:49","extension":"txt","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8753,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFile4AIPromptStep2DeepExtraction.txt","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/fc55b81d65c7eba4a0f87914.txt"},{"id":93030266,"identity":"65c3394c-44af-40c6-ac8d-f3e32efba4a7","added_by":"auto","created_at":"2025-10-08 
10:07:48","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":43845,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c6cd4bb9700336d345a09ef2.docx"},{"id":93029680,"identity":"26ccb861-709b-4070-bcf9-d8d809b282d7","added_by":"auto","created_at":"2025-10-08 09:59:50","extension":"xlsx","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":116449,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/e922a9ce6998beafd0cf56e0.xlsx"},{"id":93029650,"identity":"cd55fabd-944a-45b5-a613-890d1983865f","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"xml","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":194586,"visible":true,"origin":"","legend":"","description":"","filename":"138a95f5e64e473dac89ef9f2bccde6c1enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/a365bd9032296e8be8bad7ee.xml"},{"id":93030262,"identity":"1017b158-de10-46ce-ba80-1302af050ce6","added_by":"auto","created_at":"2025-10-08 10:07:48","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":204668,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/58a926b5cd0fca541b818e00.png"},{"id":93030261,"identity":"a8b634b5-c712-47f0-b793-cfcd7eacf7f6","added_by":"auto","created_at":"2025-10-08 
10:07:47","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":225287,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c8c44553df9bd26c48ef4a07.png"},{"id":93029663,"identity":"0bb4c8d3-3f6c-41fa-8972-12c409774377","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":139298,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c9ac0172c826bcd55fdf748c.png"},{"id":93029656,"identity":"2f810395-bd2d-4cdc-96ac-30aec25e0887","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":58311,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/7f3064ae759060b798c82682.png"},{"id":93030263,"identity":"260fd02d-b3f0-405c-bee7-5e9c2c41ee33","added_by":"auto","created_at":"2025-10-08 10:07:48","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":83010,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/7226ddd565c617c724447236.png"},{"id":93029671,"identity":"2e36c98b-854b-4343-84d8-e38c07255ea8","added_by":"auto","created_at":"2025-10-08 
09:59:48","extension":"png","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":107570,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/0637169b6754b89e138a9a92.png"},{"id":93029653,"identity":"da5d3801-c504-4e6d-ab86-e276df856f96","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":74172,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c5a69d019984be23fa025358.png"},{"id":93029669,"identity":"2ec77833-7671-44e9-8718-7e3dcefbc790","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":142502,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/273b61402f2ae8d1bcdfb136.png"},{"id":93029674,"identity":"b16690c4-d271-4f49-b667-11bcbce9db12","added_by":"auto","created_at":"2025-10-08 09:59:49","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":90011,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/8f6b1c6727fe3916353214a1.png"},{"id":93029665,"identity":"819a84af-0957-4ac5-bcf4-5ff300495d77","added_by":"auto","created_at":"2025-10-08 
09:59:48","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":21378,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/ee37b6de130e76e03cedcba2.png"},{"id":93029673,"identity":"3ea766d9-c5b7-4a8d-aabc-fe6f4ef2521d","added_by":"auto","created_at":"2025-10-08 09:59:49","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":28554,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/47ecee57adeb1ce61eb1c888.png"},{"id":93029648,"identity":"202f3045-eae3-475a-9fad-39ceb90d14bf","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33549,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c0cfacff9fb818d183fae5cd.png"},{"id":93029675,"identity":"97e0e0c6-62b3-4f6d-99dc-addfd7973b3a","added_by":"auto","created_at":"2025-10-08 09:59:49","extension":"xml","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":193130,"visible":true,"origin":"","legend":"","description":"","filename":"138a95f5e64e473dac89ef9f2bccde6c1structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/aa5263a9ceeb51ad5a38f94a.xml"},{"id":93029678,"identity":"91e1d6f9-4a71-43be-bdaf-5f4176b78325","added_by":"auto","created_at":"2025-10-08 
09:59:49","extension":"html","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":206711,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/3d9f8b7d61978a279ec1ced1.html"},{"id":93029642,"identity":"24a77744-821c-4df3-bae7-bd988a8801aa","added_by":"auto","created_at":"2025-10-08 09:59:47","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":240977,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eEvaluation Framework for Research on VOCs in Human Breath Tests for Diabetes. \u003c/strong\u003eThe figure illustrates a three-stage systematic workflow. The process begins with (1) Data Acquisition, following standard literature review protocols. The extracted data then enters (2) the core NIM-DBA Data Assessment Process, a four-domain evaluation of source credibility, compatibility, data generation process, and data value accuracy. Finally, the filtered, high-quality data are used for (3) the Application of Assessment Results to synthesize conclusions and identify best practices. 
Abbreviations: NIM-DBA, National Institute of Metrology's Data assessment framework for Biomarker Analysis; AC, Assessment Criterion.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/f64ba0877ef86a2e33da514a.png"},{"id":93029643,"identity":"ede4bf29-3d36-4f71-b559-8fa1c894cb11","added_by":"auto","created_at":"2025-10-08 09:59:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":143561,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eConceptual mapping of the NIM-DBA framework's derivation.\u003c/strong\u003e The figure illustrates the systematic process of adapting the \u003cem\u003eTechnical Specification for Standard Reference Data Review\u003c/em\u003e into the final NIM-DBA framework. Key attributes from the source specification were selected (green checks) and aligned with general ISO data quality attributes. These were then specialized into specific, actionable criteria for VOCs research, guided by standards like ISO/IEC 25024, and finally organized into the four core assessment domains of the NIM-DBA framework.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c76b37ea4dd6fcf9f398fb78.png"},{"id":93029645,"identity":"eb8e0935-bb10-41d2-a30e-8110e1b2c3e8","added_by":"auto","created_at":"2025-10-08 09:59:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":73906,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePRISMA 2020 flow diagram for the study selection process.\u003c/strong\u003e The diagram illustrates the flow of information through the different phases of the systematic review. It details the number of records identified from databases, the number of duplicates removed, and the number of reports screened and assessed for eligibility. 
The reasons for exclusion at each stage are provided, leading to the final set of 38 studies included in the qualitative synthesis and quality assessment.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/90b0697017b27d90e7f65123.png"},{"id":93029668,"identity":"de5b115b-4d96-48b2-8b9d-ff47ce36d5a0","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":56471,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eStep-wise literature screening using the NIM-DBA framework.\u003c/strong\u003e The stacked horizontal bar chart illustrates the sequential screening process applied to the initial pool of 38 studies. Each bar corresponds to a specific assessment criterion within the NIM-DBA framework. The green portion of a bar represents the number of studies that passed that step and proceeded to the next, while the red portion represents the number of studies that were excluded at that stage. The funneling process shows that after the full, nine-step evaluation, a final subset of six studies was deemed to have high fitness-for-purpose.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/866c395cab30b79d7c62b048.png"},{"id":93030267,"identity":"85c8ec0b-fb37-422c-b256-1e3257582195","added_by":"auto","created_at":"2025-10-08 10:07:49","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":61204,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSummary of risk of bias and applicability concerns based on the QUADAS-2 assessment.\u003c/strong\u003e This figure summarizes the quality assessment results for all 38 studies using the QUADAS-2 tool. 
The left panel illustrates the risk of bias assessment across four domains, and the right panel illustrates the applicability concerns across three domains. Each stacked bar shows the proportion of studies judged to have low (green), high (red), or unclear (yellow) risk or concern for that domain. A detailed study-by-study assessment is available in Supplementary File 2 (Sheet 2).\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/5ef285889f7544d8fda03f9e.png"},{"id":93029667,"identity":"5e48f6e5-9c48-4dca-9278-951f7a90effa","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":97059,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDistribution of quantitative breath acetone data from the six high-fitness-for-purpose studies. \u003c/strong\u003eThe panel plots display the reported breath acetone concentrations (in ppbv) for diabetic and control groups from the six studies that passed all NIM-DBA evaluation criteria. 
Data are presented as reported in the original studies, using various statistical metrics (mean, median) and error bars (SD, SE, range, IQR) as indicated in the legend.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/e81c2e897deecf6f713320b8.png"},{"id":93031571,"identity":"c536dea4-f314-45a2-966d-17e0cdb6b68b","added_by":"auto","created_at":"2025-10-08 10:15:48","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2181762,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/ed47ba61-1bc8-4562-bd8f-5135c3770ad4.pdf"},{"id":93029679,"identity":"6568d9a5-957a-456f-b394-5f8c4168c1fb","added_by":"auto","created_at":"2025-10-08 09:59:50","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":43845,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 1: Supplementary Tables and Detailed NIM-DBA Assessment Results.\u003c/p\u003e\n\u003cp\u003eThis file contains supplementary tables referenced in the main text, including large tables that do not fit in the main manuscript body. 
It also provides the detailed, item-by-item assessment results for all 38 included studies using the NIM-DBA framework.\u003c/p\u003e\n\u003cp\u003e(Format: .docx)\u003c/p\u003e","description":"","filename":"SupplementaryFile1.docx","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/2b115398edf078a396dd9fe1.docx"},{"id":93029658,"identity":"bdd4cb50-1fca-4aad-9d16-8edc511fdce8","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":116449,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 2: Extracted Quantitative Data and Detailed QUADAS-2 Assessment.\u003c/p\u003e\n\u003cp\u003eThis spreadsheet file contains three sheets: (1) the complete set of quantitative breath acetone data extracted from the 38 included studies; (2) the detailed QUADAS-2 assessment results for all 38 studies, including the rationale for each judgment; and (3) a summary of additional methodological information for the six studies that passed the final NIM-DBA evaluation.\u003c/p\u003e\n\u003cp\u003e(Format: .xlsx)\u003c/p\u003e","description":"","filename":"SupplementaryFile2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/e5cb03e08d6596a0421666bb.xlsx"},{"id":93029647,"identity":"049552e8-dc00-49f9-854e-2624b94f7a4f","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"txt","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":4612,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 3: AI Prompt for Broad Literature Screening.\u003c/p\u003e\n\u003cp\u003eThis text file contains the exact prompt provided to the Large Language Model for the initial, broad screening of the literature to identify potentially relevant articles.\u003c/p\u003e\n\u003cp\u003e(Format: 
.txt)\u003c/p\u003e","description":"","filename":"SupplementaryFile3AIPromptStep1BroadScreening.txt","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/1a587f3b80e175407a9f5fc5.txt"},{"id":93029670,"identity":"2f19a1bc-a6ea-4705-a030-be3293a91acb","added_by":"auto","created_at":"2025-10-08 09:59:48","extension":"txt","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":8753,"visible":true,"origin":"","legend":"\u003cp\u003eAdditional file 4: AI Prompt for Deep Data Extraction.\u003c/p\u003e\n\u003cp\u003eThis text file contains the exact prompt provided to the Large Language Model for the second-step, deep extraction and screening of quantitative data from the shortlisted literature.\u003c/p\u003e\n\u003cp\u003e(Format: .txt)\u003c/p\u003e","description":"","filename":"SupplementaryFile4AIPromptStep2DeepExtraction.txt","url":"https://assets-eu.researchsquare.com/files/rs-7455257/v1/c74c176f5f66aa39d1ecd8c5.txt"}],"financialInterests":"No competing interests reported.","formattedTitle":"Assessing the Fitness-for-Purpose of Published Breath Analysis Data: A Quality Assessment Framework for Diabetes Biomarker Research","fulltext":[{"header":"Background","content":"\u003cp\u003eIn the field of non-invasive diagnostics, exhaled breath analysis has emerged as a highly promising technique. This is underscored by its recognition from the World Economic Forum as one of the top ten emerging technologies of 2021 [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. The scientific basis for this technique is that human breath is a complex product of metabolic processes. As Sharma et al. noted in a 2023 review, a single exhalation contains hundreds of volatile organic compounds (VOCs) capable of providing a real-time snapshot of an individual's metabolic state [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
Indeed, a recent census of the \"human volatilome\" identified nearly 1,500 distinct VOCs in the breath of healthy individuals alone [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Among its many potential clinical applications, the technique is particularly relevant to diabetes mellitus, a major global health challenge: data cited by Mahnoor et al. in a 2024 review indicate that 529\u0026nbsp;million people worldwide were living with diabetes in 2021 [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Consequently, developing convenient, painless monitoring tools based on exhaled VOCs to supplement, or even replace, invasive blood glucose testing has become a key research direction in the field [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, despite the field's long history and considerable potential, a persistent problem has hindered its clinical translation: the conclusions of different studies are often inconsistent or even contradictory. For instance, in a 2021 review, Dixit et al. noted that the efficacy of breath acetone as a standalone biomarker is \"uncertain,\" because its correlation with blood glucose levels has been variously reported as positive, negative, or entirely absent [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. This lack of consensus directly impedes translation: as Miekisch et al. noted in a 2024 commentary, despite numerous studies, no breath test has yet entered clinical practice as a standard diagnostic procedure [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eWe argue that these inconsistencies stem not solely from intrinsic biological complexity but largely from the absence of appropriate criteria for screening published data for secondary analysis. As Ma et al.
stated in a 2023 review, a major obstacle to the clinical translation of exhaled breath diagnostics is the \"lack of standardized operating procedures,\" a gap that spans the entire process from breath sampling and storage to analysis [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. This lack of standards has produced substantial methodological heterogeneity across existing studies. For breath acetone alone, a 2021 review by Obeidat covered seven mainstream detection technologies, ranging from Gas Chromatography-Mass Spectrometry (GC-MS) to various sensors, and detailed their large differences in sensitivity, selectivity, and operating conditions [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Such differences make data from different studies difficult to compare directly. As Liu et al. summarized in a 2024 review, the diversity of sampling methods, analytical instruments, and data processing approaches makes the acetone concentrations reported by different studies \"hardly comparable\" [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Even meta-analysis, often regarded as the highest level of evidence, cannot resolve this heterogeneity, fundamentally because its own data screening criteria are typically too broad. For instance, in a 2021 meta-analysis of breath tests for diagnosing diabetes, the inclusion criteria of Wang et al. merely required that studies be \"diagnostic accuracy studies\" and \"provide sufficient data to construct a 2x2 table\" [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].
Screening based solely on study type and data completeness largely overlooks how the data were generated, that is, crucial methodological details such as whether a study used high-precision mass spectrometry or interference-prone sensors, and whether the breath samples were standardized alveolar air or mixed expired air. As Mathew et al. emphasized in a 2015 review, analytical techniques (e.g., mass spectrometry, spectroscopy, sensor arrays) differ fundamentally in sensitivity, selectivity, and robustness against interferents such as humidity [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Pooling and statistically analyzing data with such pronounced methodological heterogeneity therefore casts doubt on the reliability of the results. Indeed, Wang et al. could only draw the cautious conclusion that breath testing has \"a moderate diagnostic accuracy\" for diabetes, and they specifically noted that their findings were severely limited by \"significant heterogeneity\" [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. This example illustrates that existing secondary analysis methods, because of their coarse-grained screening criteria, fail to address the root problem; instead, they risk incorporating a large volume of flawed or incomparable data and ultimately yield ambiguous conclusions.
A reasonable inference, therefore, is that if a framework that assesses data quality and applicability at the level of methodological detail were used to select a high-quality, homogeneous subset of studies, a secondary analysis of this subset would be far more likely to yield consistent conclusions and, with them, a clearer test of the clinical value of breath analysis technology.\u003c/p\u003e\u003cp\u003eWithin breath analysis, however, no recognized, standardized framework currently exists that is specifically designed to assess the quality of data for secondary use. Most existing systematic reviews in this field concentrate on areas such as cancer screening, and even in these most intensively studied domains the assessment tools employed have fundamental limitations. For example, several recent, high-quality meta-analyses on lung and breast cancer all used QUADAS-2 to assess the risk of bias of their included studies [\u003cspan additionalcitationids=\"CR14\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Although QUADAS-2 is the gold standard for evaluating the quality of diagnostic accuracy studies, its core purpose is to assess the rigor of a study's clinical design rather than the technical comparability of its underlying data. Its domains (patient selection, index test, reference standard, and flow and timing) address study-level design, and it contains no specific items for the technical details unique to breath analysis that determine data quality, such as breath sampling method, sample preconcentration technique, and instrument calibration. This scarcity of systematic methodological assessment tools is even more pronounced for diabetes.
A telling example is a 2025 publication, one of the very few systematic reviews specifically targeting breath ketone analysis for diabetes, which still had to adopt a \"modified\" version of the generic QUADAS-2 tool for its quality assessment [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. This adaptation indicates that existing clinical assessment tools fall short when researchers try to examine the technical reliability of the data. What the field lacks, therefore, is not a refutation of QUADAS-2 but a complementary, systematic tool built from a different perspective: a framework designed specifically for the objective assessment of the fitness-for-purpose of published quantitative data, rooted in a more \"metrological\" mindset.\u003c/p\u003e\u003cp\u003eTo ensure that our proposed framework would be both methodologically sound and innovative, we first conducted a broad, systematic survey of the paradigms used to create data quality assessment frameworks, analyzing 38 representative publications from diverse academic fields (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e lists only the authors, citation numbers, and broad categories of the assessed data; a detailed summary of the development rationale for each framework is provided in Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e. The survey showed that new data quality assessment frameworks are typically built following one of five established academic paradigms, and the framework proposed in this study was constructed by adhering to and combining these paradigms.
Specifically, our framework integrates two of these mainstream approaches: \"Category 2: Extension and Contextualization of Existing Authoritative Frameworks\" and \"Category 4: Use-Case Driven Construction.\" This dual approach grounds our method in scientifically recognized methodologies rather than building it from scratch, while tailoring it precisely to the specific challenge we address. Beyond this foundation, the distinctiveness of our framework lies not in refining existing clinical assessment tools but in proposing a different paradigm: it adopts the stringent technical review specifications for Standard Reference Data from the field of metrology and aligns closely with international data quality standards such as ISO 8000 and ISO/IEC 25012, which is intended to give its assessment scale a higher degree of rigor and objectivity.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eOverview of Data Quality Assessment Frameworks Categorized by Development Methodology\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eReference\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eBroad Category of Data Evaluated\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCategory 1: Induction and Synthesis from Systematic Literature Reviews\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"1\" nameend=\"c3\" namest=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSchwabe et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eMedical data for model training\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFadahunsi et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eElectronic health information\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePatone \u0026amp; Zhang [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eSocial media data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFadahunsi et al. [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eDigital health technology data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDeclerck et al. 
[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealth data (for secondary use)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSchmidt et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eObservational health research data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eZhang et al. [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eInternet of Things (IoT) data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eIjab et al. [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eBig data (in the public sector)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDaikeler et al. 
[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eDigital social science research data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCichy \u0026amp; Rass [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eGeneral business data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCategory 2: Extension and Contextualization of Existing Authoritative Frameworks\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"1\" nameend=\"c3\" namest=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBhana et al. [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003ePublic safety data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eZou \u0026amp; Berger [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eReal-World Data (RWD) (in healthcare)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eShabani et al. 
[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (newborn indicators)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOkwaraji et al. [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (low birth weight and preterm birth)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGyrard et al. [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (cancer-related)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eJin et al. [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eMedical diagnostic study data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHealy et al. 
[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealth and social care data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLaberge \u0026amp; Shachak [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eSociodemographic data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eComero et al. [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eEnvironmental monitoring data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCategory 3: Qualitative Methods \u0026amp; Expert Consensus\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"1\" nameend=\"c3\" namest=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNg et al. 
[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealth and biomedical datasets\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCategory 4: Use-Case Driven Construction\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"1\" nameend=\"c3\" namest=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTute et al. [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eClinical data (pediatric intensive care)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eKookal et al. [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eDental electronic health record data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eKuusisto et al. [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (palliative care)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWidad et al. 
[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eBig data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMcCord et al. [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eEcological data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBlacketer et al. [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eObservational medical data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHillert et al. [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (multiple sclerosis)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCorrales et al. [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eDatasets for classification tasks\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eJiang et al. 
[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eReal-World Data (RWD) (in healthcare)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWirsching et al. [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eObservational epidemiological study data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNguyen et al. [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eBig data (in education)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDeady et al. [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (vaccine adverse events)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNaik et al. [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eNursing quality indicator data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSmith et al. 
[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eAdministrative data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eIverson et al. [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (metabolic charts)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eCategory 5: Deduction and Construction based on Specific Theories\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"1\" nameend=\"c3\" namest=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eKrishna et al. [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eRoad infrastructure data\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLarburu et al. 
[\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e\u003cp\u003eHealthcare data (telemedicine)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThis paper makes two core contributions. First, taking a data-driven perspective and building on the stringent review criteria for Standard Reference Data in metrology, we developed a quality assessment framework for diabetes biomarker research, the National Institute of Metrology - Diabetes Breath Assessment Framework (NIM-DBA), and we describe the construction logic and specific contents of this multi-dimensional framework in detail. Second, using the much-debated question of \"breath acetone as a biomarker for diabetes\" as a case study, we compare NIM-DBA with QUADAS-2 to show how NIM-DBA offers insights from a distinct, complementary research perspective.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eTo systematically assess the fitness-for-purpose of published quantitative data on breath VOCs in diabetic patients, we designed and implemented a comprehensive evaluation framework (Figure 1). The end-to-end workflow comprises three primary stages: (1) Data Acquisition, (2) the NIM-DBA Data Assessment Process, and (3) the Application of Assessment Results. 
Figure 1 illustrates the complete architecture of this framework, detailing the components of each stage and highlighting the structure and basic workflow of the core NIM-DBA process.\u003c/p\u003e\n\u003cp\u003eData Acquisition and Standardization\u003c/p\u003e\n\u003cp\u003eLiterature Search and AI-Assisted Screening\u003c/p\u003e\n\u003cp\u003eTo systematically collect studies reporting quantitative data on breath VOCs in diabetic patients, a comprehensive search of four major academic databases was conducted on April 12, 2025: PubMed/MEDLINE, Scopus, Embase, and the Web of Science (WoS) Core Collection. The search was limited to articles published between January 1, 2005, and April 12, 2025. The search strategy was built around three core concepts: 1) the disease/population (diabetes), 2) the detection method (breath analysis), and 3) the analytes (volatile organic compounds). Because search syntax, subject headings (e.g., MeSH terms), and built-in filters differ across databases, we developed search strategies that were logically consistent but adapted to each platform. For instance, PubMed/MEDLINE and Embase offer a direct filter to restrict results to \u0026quot;humans,\u0026quot; which we applied; for databases lacking such a filter (e.g., Scopus, Web of Science), non-human studies were excluded during the subsequent screening phase. The detailed search terms, Boolean logic, and results for each step are provided in Table S2. All retrieved records were then imported into the reference management software Zotero, and deduplication in Zotero yielded 330 unique articles. 
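The sketch below shows the three-concept search logic and the deduplication step in Python. It rests on stated assumptions. The search terms are hypothetical placeholders; the actual database-specific strategies are in Table S2. Keying duplicates on the DOI, with a normalized title as the fallback, is also our assumption. It does not describe how Zotero detects duplicates.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical placeholder terms for the three core concepts; the real
# per-database strategies (syntax, MeSH terms, filters) are in Table S2.
CONCEPTS: Dict[str, List[str]] = {
    'population': ['diabet*'],
    'method': ['breath', 'exhal*'],
    'analyte': ['VOC*', 'volatile*', 'acetone'],
}


def build_query(concepts: Dict[str, List[str]]) -> str:
    # OR the synonyms within each concept, then AND the concept blocks together.
    blocks = ['(' + ' OR '.join(terms) + ')' for terms in concepts.values()]
    return ' AND '.join(blocks)


@dataclass
class Record:
    title: str
    doi: Optional[str] = None
    source_db: str = ''


def dedup_key(rec: Record) -> str:
    # Assumed rule: prefer the DOI; otherwise use a lower-cased, punctuation-free title.
    if rec.doi:
        return 'doi:' + rec.doi.strip().lower()
    return 'title:' + re.sub('[^a-z0-9]+', ' ', rec.title.lower()).strip()


def deduplicate(records: List[Record]) -> List[Record]:
    # Keep the first occurrence of each key, preserving the input order.
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


print(build_query(CONCEPTS))
# (diabet*) AND (breath OR exhal*) AND (VOC* OR volatile* OR acetone)
```

Synonyms are combined with OR inside each concept block, and the blocks are joined with AND. This mirrors the disease, method, and analyte structure described above. The title fallback catches records that lack a DOI or differ only in capitalization and punctuation.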
The deduplication results for each database are detailed in Table S3.\u003c/p\u003e\n\u003cp\u003eTo handle this volume of literature efficiently, we departed from conventional title and abstract screening and instead adopted an AI-assisted, two-step screening strategy. We chose this approach because relevant quantitative data may be present even when a study\u0026apos;s title and abstract give no indication of it; for example, a study framed around chronic kidney disease might still report valuable quantitative VOC data for a diabetic cohort and a control group in the body of the paper [58]. The first step was a broad preliminary screen, in which a Large Language Model (Google Gemini 2.0) was used to scan all 330 articles and rapidly flag those that could potentially contain \u0026quot;quantitative human breath VOC data.\u0026quot; The criteria at this stage were intentionally broad to minimize the risk of missing relevant studies; the detailed prompt is provided in Supplementary File 3 - AI Prompt (Step 1 - Broad Screening).txt. Articles that passed this initial screen were then subjected to a more stringent AI-assisted extraction and deep screening process, whose prompt is provided in Supplementary File 4 - AI Prompt (Step 2 - Deep Extraction).txt.\u003csup\u003e2\u003c/sup\u003e The objective of this second step was to systematically identify all potentially relevant quantitative data points and their exact positions within the source PDF files, because pinpointing each data point\u0026apos;s location ensures traceability and facilitates subsequent manual verification. 
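The two-step screening funnel can be sketched as follows. The sketch is hypothetical. The step functions stand in for the LLM calls, whose actual prompts are in Supplementary Files 3 and 4. The keyword rule and the DataPoint fields are illustrative assumptions, not the authors' implementation. What the sketch shows is the structure:

- Step 1 is permissive.
- Step 2 runs only on articles that pass step 1.
- Every retained value carries its page and location, so it can be traced and checked manually.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataPoint:
    # Hypothetical traceable record: one candidate value and where it appears.
    analyte: str
    value: float
    unit: str
    page: int
    location: str  # e.g. 'Table 2' or 'Figure 3'


@dataclass
class Article:
    article_id: str
    full_text: str


BroadScreen = Callable[[Article], bool]             # step 1: might it hold breath VOC data?
DeepExtract = Callable[[Article], List[DataPoint]]  # step 2: located candidate values


def two_step_screen(articles: List[Article],
                    broad: BroadScreen,
                    deep: DeepExtract) -> Dict[str, List[DataPoint]]:
    # An article is shortlisted only if it passes the broad screen AND the deep
    # extraction returns at least one located data point.
    shortlist: Dict[str, List[DataPoint]] = {}
    for art in articles:
        if not broad(art):
            continue
        points = deep(art)
        if points:
            shortlist[art.article_id] = points
    return shortlist


def keyword_broad(art: Article) -> bool:
    # Crude keyword stand-in for the step-1 LLM prompt; intentionally permissive.
    text = art.full_text.lower()
    return 'breath' in text or 'exhaled' in text
```

In practice, the deep step would be an LLM call that returns values together with their PDF positions. Whatever extractor is used, its shortlist is only a set of candidates for the manual verification that follows.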
Only primary quantitative data confirmed to have been generated by the respective studies were ultimately included in our dataset.\u003c/p\u003e\n\u003cp\u003eManual Verification, Extraction, and Standardization\u003c/p\u003e\n\u003cp\u003eWhile the AI-assisted process can efficiently locate potential data points, it cannot fully replicate the expert judgment required to interpret complex academic contexts and distinguish between data sources. Therefore, to ensure the reliability and fitness-for-purpose of the data ultimately included, the 39 articles shortlisted by the AI underwent manual verification and data extraction. Two researchers performed this procedure independently and cross-validated their results, resolving any discrepancies through discussion. The process comprised three core tasks:\u003c/p\u003e\n\u003cp\u003e1. Verification of Data Presence: Using the location information provided by the AI, the researchers returned to the original articles to verify that quantitative breath VOC data were indeed present at the indicated positions.\u003c/p\u003e\n\u003cp\u003e2. Discrimination of Data Source: For all confirmed quantitative data, we determined their origin, distinguishing primary data generated by the study itself from secondary data cited from other literature. Only articles containing primary quantitative data were deemed eligible for the next step.\u003c/p\u003e\n\u003cp\u003e3. Data Extraction and Standardization: The researchers extracted the data that passed the above verification steps from the text, tables, figures, or supplementary materials of the source articles.
The extracted data were entered concurrently into a pre-designed, standardized data template (Supplementary File 2, Sheet 1). The complete results of this screening process, including the flow of studies at each stage, are presented in the Results section.\u003c/p\u003e\n\u003cp\u003eThe Core Evaluation Framework: NIM-DBA\u003c/p\u003e\n\u003cp\u003eThe NIM-DBA framework is a novel methodological tool designed specifically for the objective assessment of quantitative data reported in the published literature.\u003c/p\u003e\n\u003cp\u003eConceptualization and Theoretical Basis of the Framework\u003c/p\u003e\n\u003cp\u003eThe data quality and fitness-for-purpose assessment framework developed in this study is named the NIM-DBA (National Institute of Metrology - Diabetes Breath Assessment) Framework. Its theoretical cornerstone is the Technical Specification for Standard Reference Data Review, an unpublished work instruction developed by the Center for Metrology Scientific Data and Energy Metrology of the National Institute of Metrology, China (NIM-NMDC). We selected this specification because it represents the highest standard for data quality in metrology, a status reinforced by its accreditation from both the China National Accreditation Service for Conformity Assessment (CNAS) and the China Metrology Accreditation (CMA). Grounding our framework in this authoritative document allowed us to build rigorous data quality principles into it from the outset. Within the metrological system, \u0026quot;Standard Reference Data\u0026quot; (SRD) are not ordinary scientific data; they are datasets of the highest accuracy and credibility that have undergone the most stringent evaluation and are intended for calibrating measurement systems, evaluating measurement methods, and assigning values to materials.
Consequently, this Technical Specification imposes requirements for data provenance, production processes, completeness, consistency, and uncertainty assessment that are far more stringent than those for conventional research data.\u003c/p\u003e\n\u003cp\u003eBy adapting this review philosophy from metrology to the assessment of data reported in the published literature, we ground our framework in a higher standard for data credibility and traceability. As a result, our evaluation scale is not merely a subjective checklist but a systematic tool for scrutinizing and screening conventional research data through the lens of Standard Reference Data. This is the core feature and advantage of our methodology.\u003c/p\u003e\n\u003cp\u003eTo ensure the framework\u0026apos;s applicability, we adapted the original specification deliberately: we retained its systematic evaluation structure while removing assessment content irrelevant to our research (such as physical measurement uncertainty), thereby focusing the evaluation on the fitness-for-purpose of quantitative human breath VOC data in diabetes biomarker research. To ensure the scientific validity and international recognition of each evaluation domain, we drew on international standards such as ISO 8000 (Data Quality), ISO/IEC 25012 (Data Quality Model), and ISO/IEC 25024 (Measurement of Data Quality) [54, 55] to develop precise, operational definitions and assessment criteria for each data attribute (e.g., study subjects, sampling methods). The final assessment framework therefore integrates the stringent requirements of metrology with international data quality theory and is customized for the specific purpose of this study, ensuring a structured evaluation process and scientifically sound assessment criteria, as illustrated in Figure 2.
The detailed criteria for each core domain are presented in the Results section.\u003c/p\u003e\n\u003cp\u003eFramework Implementation and Methodological Feature Extraction\u003c/p\u003e\n\u003cp\u003eThe methodological evaluation consisted of two main steps. First, we applied the NIM-DBA framework, as defined in the preceding section, to screen item by item all articles that passed the initial selection process detailed in the \u0026quot;Data Acquisition and Standardization\u0026quot; section. Any article that failed any of the key NIM-DBA criteria was excluded from the final subset of studies with high fitness-for-purpose.\u003c/p\u003e\n\u003cp\u003eSecond, the articles that passed all NIM-DBA screening steps advanced to an exploratory phase of methodological feature extraction. The purpose of this stage was not to follow a pre-defined medical information extraction protocol, but to characterize systematically and comprehensively the shared methodological profile of these \u0026quot;high-quality data\u0026quot; studies, enabling a subsequent analysis of the factors underlying their high fitness-for-purpose.\u003c/p\u003e\n\u003cp\u003eTo this end, we extracted the following methodological features from each article: study groups and their sample sizes (n), total sample size (N), age, inclusion criteria, medication status, pre-sampling conditions, type of gas sampled, collection and storage equipment, analytical instrument(s), sample preprocessing techniques, statistical test methods, and reported p-values. These features form the data foundation for our subsequent cross-study comparative analysis and for distilling \u0026quot;best practice\u0026quot; principles.
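The exclusion rule described above (an article failing any key NIM-DBA criterion is excluded) makes the screening a sequential, fail-fast gate: each article leaves the pool at the first criterion it fails. A minimal sketch follows; the criterion IDs come from Table 3, but the check functions and toy article fields are hypothetical stand-ins for the reviewers' judgements. As a consistency check, replaying the per-criterion exclusion counts reported for the acetone case study in the Results (9, 9, 1, 11, and 2 articles at Criteria 2.2, 2.3, 4.1, 4.2, and 4.3) reproduces the final subset of six out of 38 studies (15.8%).

```python
# Sketch of sequential, fail-fast screening. Criterion IDs follow Table 3;
# the check functions and article fields below are hypothetical.
from typing import Callable, Dict, List, Tuple

Check = Callable[[dict], bool]


def screen(articles: List[dict], criteria: List[Tuple[str, Check]]):
    survivors = list(articles)
    excluded_at: Dict[str, List[dict]] = {}
    for crit_id, check in criteria:
        kept, dropped = [], []
        for article in survivors:
            (kept if check(article) else dropped).append(article)
        excluded_at[crit_id] = dropped   # failed here; never seen by later criteria
        survivors = kept
    return survivors, excluded_at


def funnel(start: int, exclusions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Number of articles remaining after each criterion.
    remaining, rows = start, []
    for crit_id, n_out in exclusions:
        remaining -= n_out
        rows.append((crit_id, remaining))
    return rows


# Toy articles with hypothetical yes/no attributes.
toy = [{'id': 'A', 'human_breath': True, 'dispersion': True},
       {'id': 'B', 'human_breath': False, 'dispersion': True},
       {'id': 'C', 'human_breath': True, 'dispersion': False}]
kept, dropped = screen(toy, [('2.2', lambda a: a['human_breath']),
                             ('4.1', lambda a: a['dispersion'])])
print([a['id'] for a in kept])  # ['A']

# Exclusion counts reported for the acetone case study.
REPORTED = [('1.1', 0), ('2.1', 0), ('2.2', 9), ('2.3', 9), ('3.1', 0),
            ('3.2', 0), ('4.1', 1), ('4.2', 11), ('4.3', 2)]
rows = funnel(38, REPORTED)
print(rows[-1], round(100 * rows[-1][1] / 38, 1))  # ('4.3', 6) 15.8
```

Recording where each article was excluded, rather than only whether it survived, is what allows a per-step breakdown such as the one shown in Figure 4.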
The screening and feature-extraction steps described above correspond to the consecutive stages labeled \u0026apos;NIM-DBA Data Assessment Process\u0026apos; and \u0026apos;Application of Assessment Results\u0026apos; in Figure 1.\u003c/p\u003e\n\u003cp\u003eComparative Analysis: Characterizing the Clinical Bias Landscape using QUADAS-2\u003c/p\u003e\n\u003cp\u003eThe NIM-DBA framework represents a data-driven paradigm whose core is to assess whether the data themselves are fit for secondary analysis. To demonstrate the need for this paradigm more clearly, and to reveal the challenges faced by traditional evaluation methods in this field, we conducted a parallel analysis of the same study pool (N=38) as a comparative reference, using the internationally recognized \u0026quot;Quality Assessment of Diagnostic Accuracy Studies 2\u0026quot; (QUADAS-2) tool. QUADAS-2 is an established evaluation framework grounded in the principles of evidence-based medicine (EBM) and follows a clinical hypothesis-driven paradigm. Table 2 summarizes the fundamental differences between the two frameworks, delineating their distinct evaluative dimensions and emphasizing that NIM-DBA does not replace QUADAS-2 but complements it by assessing research data from a different perspective.\u003c/p\u003e\n\u003cp\u003eTable 2. 
Comparison of the Core Principles of the NIM-DBA and QUADAS-2 Frameworks\u003c/p\u003e\n\u003cdiv align=\"center\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"576\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eNIM-DBA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eQUADAS-2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eParadigm\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eData-Driven\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eClinical Hypothesis-Driven\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eFocus\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eThe Data Itself\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eThe Process of Testing a Hypothesis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eCore Question\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003e\u0026quot;Is the data fit for purpose?\u0026quot;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003e\u0026quot;Is the conclusion credible?\u0026quot;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 104px;\"\u003e\n \u003cp\u003eAssessment Dimensions\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eTechnical attributes of the data:\u0026nbsp;\u003cbr\u003e\u0026nbsp;format, units, precision, completeness, instrument information, etc.\u003c/p\u003e\n \u003c/td\u003e\n 
 \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eMethodology of the clinical study design:\u0026nbsp;\u003cbr\u003e\u0026nbsp;patient selection, blinding, appropriateness of the reference standard, etc.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eThe QUADAS-2 analysis was not used to screen or exclude articles; it served a descriptive purpose. Through this assessment, we aimed to characterize the overall landscape of clinical study design quality and the distribution of bias risk in the current field of diabetes breath analysis. In the \u0026quot;Discussion\u0026quot; section, this landscape is contrasted with the results of our NIM-DBA screening to highlight the distinct value of the data-driven approach.\u003c/p\u003e\n\u003cp\u003eTo conduct this comparative analysis systematically, we first defined a specific \u0026quot;Review Question,\u0026quot; the logical starting point for any QUADAS-2 assessment: \u0026quot;In people with suspected diabetes, what is the accuracy of a breath test (index test) for diagnosing diabetes (outcome) compared to standard glucose testing (reference standard)?\u0026quot; Based on this review question, we assessed the articles using the QUADAS-2 signaling question checklist adapted by Hanna et al. (2019) and previously used by Wang et al. (2021) in their systematic review. The assessment covered both major domains of the tool, risk of bias and applicability concerns [11]; the detailed evaluation rules are provided in Supplementary Table S4.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eThis section presents two sets of results. First, we introduce the primary outcome of our research: the complete structure and content of the NIM-DBA framework.
Second, we report the findings of applying this framework in a case study assessing the literature on breath acetone as a biomarker for diabetes.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eThe NIM-DBA Framework\u003c/h2\u003e\u003cp\u003eThe primary result of our framework development process is the NIM-DBA framework itself. NIM-DBA is designed as a general assessment framework for all types of VOC data in diabetes breath testing, so its criteria apply to any VOC. When the assessment focuses on a specific VOC, such as acetone, the semantic settings within NIM-DBA must be adjusted accordingly.\u003c/p\u003e\u003cp\u003eFor clarity and brevity, the version of the framework presented in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e has been streamlined: the \"Source Credibility\" assessment domain includes criteria specifically for journal articles, omitting those for other publication types such as conference proceedings or books. This simplification keeps the table manageable while preserving the complete conceptual structure of the framework.
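Because only the semantic settings change when the target VOC changes, they can be treated as a parameter of the assessment. The sketch below encodes the three verification steps of Criterion 2.2 (Semantic Consistency, Table 3) against a configurable context. The class, field names, and dictionary keys are hypothetical, and the isoprene context is included purely to show that the same rules can be reused for a different analyte.

```python
# Hypothetical encoding of Criterion 2.2 with a swappable semantic context.
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticContext:
    voc: str                 # target analyte, e.g. 'acetone'
    unit: str                # target reporting unit (used by unit checks, not here)
    groups: frozenset = frozenset({'diabetic', 'control'})


def semantic_consistency(point: dict, ctx: SemanticContext) -> str:
    # Step 1: source verification (human breath, not cell or animal work).
    if point.get('matrix') != 'human breath':
        return 'Semantically Irrelevant'
    # Step 2: condition verification (diabetic patients or a relevant control group).
    if point.get('group') not in ctx.groups:
        return 'Semantically Irrelevant'
    # Step 3: VOC type verification against the configured analyte.
    if point.get('voc', '').lower() != ctx.voc.lower():
        return 'Semantically Irrelevant'
    return 'Semantically Consistent'


ACETONE = SemanticContext('acetone', 'ppbV')
ISOPRENE = SemanticContext('isoprene', 'ppbV')   # same rules, different target
point = {'matrix': 'human breath', 'group': 'diabetic', 'voc': 'Acetone'}
print(semantic_consistency(point, ACETONE))    # Semantically Consistent
print(semantic_consistency(point, ISOPRENE))   # Semantically Irrelevant
```

Keeping the rules fixed and passing the context in separately mirrors the idea that the criteria are general while the semantic settings are VOC-specific.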
The complete structure of the framework, encompassing its four domains and their specific assessment criteria, is detailed below.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eThe Data Quality Assessment Framework and Detailed Criteria\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCriterion\u003c/p\u003e\u003cp\u003eID\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eData Quality Characteristic\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eAssessment Procedure\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e1.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSource credibility\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Journal and Database Review\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is the journal peer-reviewed and indexed in reputable international or national databases?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; Proceed to Step 2.\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 3.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Predatory Journal Check\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are there clear indicators of 
a \"predatory journal\"?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eYes \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eNo \u0026rarr; Proceed to Step 3.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 3: Author and Institutional Background Review\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are the authors' affiliations and professional backgrounds relevant and reputable?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 4.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 4: Conflict of Interest and Risk of Bias Assessment\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are funding sources/conflicts of interest disclosed and considered?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo, and significant risk of bias exists \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eYes / No, but with low risk of bias \u0026rarr; [Assessment Outcome: Credible]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eData values completeness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Core Element Check\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are all core elements (VOC name, numerical value, unit) present?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Core Incomplete]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 2.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Attribute Information Check\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Does the data have clear attribute information? 
(e.g., diabetic patients, healthy control group)\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Attribute Missing]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; [Assessment Outcome: Complete]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2.2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSemantic consistency\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Source Verification\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Does the data originate from human breath analysis?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo (e.g., from cell/animal experiments) \u0026rarr; [Assessment Outcome: Semantically Irrelevant]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 2.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Condition Verification\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is the data explicitly from diabetic patients or a relevant control group?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Semantically Irrelevant]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 3.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 3: VOC Type Verification\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is the data for the semantically specified VOCs?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Semantically Irrelevant]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; [Assessment Outcome: Semantically Consistent]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e2.3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eData format consistency\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Unit Consistency Assessment\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are all units natively uniform or mutually convertible?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Inconsistent Format (Units cannot be unified)]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 2.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Statistical Representation Consistency Assessment\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Are there major inconsistencies in statistical representation?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eYes \u0026rarr; [Assessment Outcome: Inconsistent Format (Major conflict in statistical representation)]\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Consistent Format]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e3.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eData accuracy assurance\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Clarity of Analytical Method Assessment\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Does the article specify the analytical method used?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo (e.g., not mentioned or vaguely described) \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eYes (e.g., GC-MS, SIFT-MS) \u0026rarr; [Assessment Outcome: Credible]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e3.2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTraceability of data values\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Sample Size Threshold\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is a clear 
sample size (N) reported for the data?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eYes \u0026rarr; Proceed to Step 2.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Contextual Threshold\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is the context from which the data was extracted clear and unambiguous?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo (e.g., ambiguous context) \u0026rarr; [Assessment Outcome: Not Credible]\u003c/p\u003e\u003cp\u003eYes (e.g., from a numbered figure/table) \u0026rarr; [Assessment Outcome: Credible]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e4.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eData accuracy range\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003ePresence of Uncertainty Information Check\u003c/b\u003e\u003c/p\u003e\u003cp\u003eQuestion: Is any measure of dispersion provided for the data points?\u003c/p\u003e\u003cp\u003eDecision:\u003c/p\u003e\u003cp\u003eNo (no dispersion information provided) \u0026rarr; [Assessment Outcome: Filtered Out (Missing dispersion information)]\u003c/p\u003e\u003cp\u003eYes (any of SD, SEM, Range, etc., is provided) \u0026rarr; [Rating: Pass]\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e4.2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eValues credibility\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eStep 1: Identification and Filtering of Potential Outliers\u003c/b\u003e\u003c/p\u003e\u003cp\u003eProcedure: Compare the quantitative data reported in the literature against known physiological ranges.\u003c/p\u003e\u003cp\u003eCriteria for 
Outliers:\u003c/p\u003e\u003cp\u003eScenario 1: Value far exceeds the accepted maximum.\u003c/p\u003e\u003cp\u003eScenario 2: Value is far below the accepted minimum.\u003c/p\u003e\u003cp\u003eScenario 3: Obvious statistical/typographical error.\u003c/p\u003e\u003cp\u003eAction: Data points identified as outliers are filtered out.\u003c/p\u003e\u003cp\u003e\u003cb\u003eStep 2: Assessment of Statistical Trend Strength\u003c/b\u003e\u003c/p\u003e\u003cp\u003eObjective: To evaluate differentiation between the patient and control groups.\u003c/p\u003e\u003cp\u003eRating Criteria:\u003c/p\u003e\u003cp\u003ep\u0026thinsp;\u0026lt;\u0026thinsp;0.05 \u0026rarr; [Rating: Credible (Strong statistical support)]\u003c/p\u003e\u003cp\u003e0.05\u0026thinsp;\u0026le;\u0026thinsp;p\u0026thinsp;\u0026lt;\u0026thinsp;0.10 \u0026rarr; [Rating: Credible (Suggests a statistical trend)]\u003c/p\u003e\u003cp\u003ep\u0026thinsp;\u0026ge;\u0026thinsp;0.10 \u0026rarr; [Rating: Not Credible (Lacks statistical support)]\u003c/p\u003e\u003cp\u003eProcess: Obtain or estimate the p-value according to the following priority.\u003c/p\u003e\u003cp\u003e1. Direct Path: Use the p-value if directly reported.\u003c/p\u003e\u003cp\u003e2. Estimation Path: If not reported, estimate as follows:\u003c/p\u003e\u003cp\u003e2a. Obtain Measure of Central Tendency\u003c/p\u003e\u003cp\u003e2b. Obtain Measure of Dispersion\u003c/p\u003e\u003cp\u003e2c. 
P-value Estimation and Rating (via two-sample t-test)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e4.3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSemantic data accuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u003cb\u003eComprehensive Assessment\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eCase Study: Assessing the Literature on Breath Acetone\u003c/h2\u003e\u003cp\u003eTo validate the practical value and analytical power of our evaluation framework, we selected breath acetone as the target analyte for this case study, for three reasons. First, acetone is the longest-studied and most widely recognized biomarker in breath analysis for diabetes. Second, this large body of research has produced contradictory and inconsistent quantitative findings, which provides an ideal scenario for testing how well our framework filters heterogeneous data and clarifies existing controversies. Finally, focusing the framework's initial application on a single, crucial VOC allows a clearer and more in-depth demonstration of its evaluation process and analytical logic. This section therefore details the process and findings of applying our framework to systematically assess the quality and fitness-for-purpose of quantitative acetone data from the literature.
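Since the case study fixes ppbV as the target unit, Criterion 2.3 Step 1 amounts to asking whether each reported unit can be mapped onto ppbV. A minimal sketch follows. The unit list and both physical constants are assumptions for illustration, not values taken from the paper: an ideal-gas molar volume of 24.45 L/mol (25 degC, 101.325 kPa) and an acetone molar mass of 58.08 g/mol. Studies sampling at other temperatures or pressures would need a different molar volume.

```python
# Sketch of Criterion 2.3, Step 1: can a reported unit be converted to ppbV?
# Assumed constants (not from the paper): ideal-gas molar volume at
# 25 degC and 101.325 kPa, and the molar mass of acetone.
from typing import Optional

MOLAR_VOLUME_L_PER_MOL = 24.45
ACETONE_G_PER_MOL = 58.08

# Factor that multiplies a value in the given unit to give ppbV.
TO_PPBV = {
    'ppbV': 1.0,
    'ppmV': 1000.0,
    'pptV': 0.001,
    'nmol/L': MOLAR_VOLUME_L_PER_MOL,                      # amount concentration
    'ug/m3': MOLAR_VOLUME_L_PER_MOL / ACETONE_G_PER_MOL,   # acetone mass concentration
}


def to_ppbv(value: float, unit: str) -> Optional[float]:
    # None means the unit cannot be unified, i.e. Inconsistent Format.
    factor = TO_PPBV.get(unit)
    return None if factor is None else value * factor


print(to_ppbv(2.0, 'ppmV'))             # 2000.0
print(round(to_ppbv(1.0, 'ug/m3'), 3))  # 0.421
print(to_ppbv(1.0, 'a.u.'))             # None
```

Because each conversion is a pure scaling, the same factor also applies to a reported SD, SEM, or range, so dispersion information survives unit standardization unchanged in relative terms.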
The semantic context for this assessment was defined as: \"the concentration (in ppbV) of acetone in the breath test results of individuals with diabetes.\"\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eLiterature Screening and Sample Establishment\u003c/h2\u003e\u003cp\u003eThe systematic literature screening and selection process was conducted according to the methodology detailed in the Methods section. The initial search across the four specified databases yielded a total of 712 records. After the removal of duplicates, 330 unique articles remained for screening. These articles were then subjected to our two-step AI-assisted screening followed by a rigorous manual verification process. Through this multi-stage funnel, studies were excluded primarily for reasons such as not reporting primary quantitative data or not including a relevant diabetic cohort.\u003c/p\u003e\u003cp\u003eUltimately, this process resulted in a final sample of 38\u003csup\u003e1\u003c/sup\u003e studies that met all inclusion criteria (a full list is provided in Supplementary File 1 under the heading \"References for Supplementary Material\"). These articles constitute the literature pool for the subsequent NIM-DBA quality assessment and the case study analysis. 
A complete visual breakdown of the study selection process, detailing the number of records at each stage and the specific reasons for exclusion, is presented in the PRISMA flow diagram in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eFitness-for-Purpose Screening with the NIM-DBA Framework\u003c/h2\u003e\u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the number and proportion of articles that passed or were screened out at each step of the NIM-DBA assessment of the 38 studies included via the PRISMA process; the detailed results are available in Table S5. All 38 studies passed the initial criteria of \"Source Credibility\" (Criterion 1.1), as all had been retrieved from four authoritative academic databases, and \"Data Values Completeness\" (Criterion 2.1). Non-compliant articles were then progressively screened out in the subsequent steps. In the \"Semantic Consistency\" (Criterion 2.2) assessment, nine articles were excluded because their data did not explicitly originate from breath acetone measurements in a diabetes-related human study. Next, in the \"Data Format Consistency\" (Criterion 2.3) assessment of the remaining 29 articles, another nine were filtered out because of severe inconsistencies or units or statistical representations that could not be unified. The 20 articles that passed the format consistency assessment all passed \"Data Accuracy Assurance\" (Criterion 3.1) and \"Traceability of Data Values\" (Criterion 3.2). In the \"Data Accuracy Range\" (Criterion 4.1) step, one article was filtered out because it reported no measure of dispersion.
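When a study does not report a p-value, Table 3 prescribes an estimation path for Criterion 4.2: obtain each group's central tendency and dispersion (steps 2a and 2b), then estimate the p-value with a two-sample t-test and rate it (step 2c). The sketch below illustrates that path using only the Python standard library, computing the Student t tail probability through the regularized incomplete beta function. It assumes means and standard deviations are available; choosing Welch's unequal-variance form rather than the pooled-variance form, and converting a reported SEM to an SD by multiplying by the square root of n, are our assumptions, since the paper does not specify these details. Where SciPy is available, scipy.stats.ttest_ind_from_stats with equal_var=False should give the same p-value.

```python
# Sketch of Criterion 4.2: estimate a two-sided p-value from reported group
# summaries and rate it with the Table 3 thresholds. Assumptions: Welch's
# unequal-variance t-test, means with SDs, and n of at least 2 per group.
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    # P(|T| >= |t|) for Student's t distribution with df degrees of freedom.
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def sd_from_sem(sem, n):
    return sem * math.sqrt(n)


def welch_p(m1, sd1, n1, m2, sd2, n2):
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_two_sided_p(t, df)


def rate(p):
    if p < 0.05:
        return 'Credible (strong statistical support)'
    if p < 0.10:
        return 'Credible (suggests a statistical trend)'
    return 'Not Credible (lacks statistical support)'


print(round(t_two_sided_p(1.0, 2), 4))  # 0.4226
# Illustrative inputs only (not data from any included study).
print(rate(welch_p(2.0, 1.0, 10, 1.0, 1.0, 10)))  # Credible (strong statistical support)
```

The estimate is only as good as the summary statistics behind it; for data reported as medians and ranges, a t-test on summary values would not be appropriate without further assumptions.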
For \"Values Credibility\" (Criterion 4.2), 11 articles were filtered out because statistical analysis of their quantitative data showed no meaningful difference between the diabetic and control groups, even under the lenient screening threshold of p\u0026thinsp;\u0026lt;\u0026thinsp;0.10.\u003c/p\u003e\u003cp\u003eIn the final \"Comprehensive Assessment\" (Criterion 4.3) stage, we conducted a deeper, holistic evaluation of the remaining studies and excluded two further articles. One of our data mining principles is to capture data from all diabetes-related contexts. Study #8, which was not primarily a diabetes study, passed the preceding steps, illustrating the framework's ability to capture latent data from indirectly related research. However, closer analysis revealed that Study #8 included three sample groups. Although a standalone comparison of its diabetic and control groups showed a statistical difference, the Kruskal-Wallis test performed by the original authors across all three groups was not significant. Giving priority to the overall multi-group test, we concluded that the statistical distinction of its data was not robust and deemed the study non-compliant. The other excluded article, Study #22, had an extremely small sample size (N\u0026thinsp;=\u0026thinsp;4). Because we consider that quantitative data from small studies can still be of high quality, depending on the research objectives, the \"Comprehensive Assessment\" examined other aspects of Study #22.
Because critical methodological information, including participant age, medication status, and sampling conditions, was also missing, we concluded that Study #22 was not suitable for secondary analysis and excluded it.\u003c/p\u003e\u003cp\u003eAfter this multi-stage screening, only six articles (15.8% of 38) passed all assessments of the framework. Their quantitative data were considered to be of high quality and high fitness-for-purpose and formed the basis for our subsequent cross-study comparative analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eThe Landscape of Bias Risk in Clinical Study Design\u003c/h2\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, our QUADAS-2 assessment of all 38 articles reveals a complex picture of clinical study design quality in this field. This evaluation follows the classic hypothesis-driven paradigm, which judges whether a study's clinical conclusions are subject to bias, and it reveals two noteworthy systemic issues.\u003c/p\u003e\u003cp\u003eFirst, in the \"Risk of Bias\" assessment (left panel), the most prominent problem lies in the \"Patient Selection\" domain, where 79% of studies were rated as high risk. This mainly reflects a prevalent reliance on case-control designs, which are convenient to implement but known to risk overestimating diagnostic accuracy. Second, the assessment revealed a widespread lack of reporting transparency: in the \"Index Test\" domain, 82% of studies were rated as \"Unclear\" because of missing information, making it impossible to determine whether their testing process and result interpretation were biased.
Taken together, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e shows that, from the traditional QUADAS-2 perspective, which is oriented towards validating clinical hypotheses, the current literature pool has a high prevalence of design flaws and reporting deficiencies that could make the studies' clinical conclusions unreliable.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eCommon Methodological Features of High-Fitness-for-Purpose Studies\u003c/h2\u003e\u003cp\u003eTo identify the core elements of a \"high fitness-for-purpose\" study, we conducted a detailed comparative analysis of the methodological features of the six articles that passed our evaluation framework (Goerl et al. [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], Ghimenti et al. [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e], Chien et al. [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e], Lekha and Suchetha [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], Sha et al. [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e], and Li et al. [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]). Details of this analysis are provided in Supplementary File 2 (Sheet 3), and the quantitative data distributions are visualized in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. A core screening criterion of our framework is \"Values Credibility\" (Criterion 4.2). It requires a study's data to show at least a trend towards a statistical difference between diabetic and control groups, using a lenient threshold of p\u0026thinsp;\u0026lt;\u0026thinsp;0.10. 
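\u003c/p\u003e\u003cp\u003eThe text does not specify how p-values were obtained when a study reported only summary statistics. The following minimal sketch assumes that situation. It uses Welch's two-sample t-test to estimate a two-sided p-value from each group's mean, standard deviation and size, then compares that value with the lenient 0.10 threshold of Criterion 4.2. The choice of Welch's test is our assumption, not necessarily the authors' method, and the function names and example values are hypothetical. Only the Python standard library is used: the Student t tail probability is computed from the regularized incomplete beta function, which is evaluated by its continued fraction.\u003c/p\u003e

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=1000, eps=3e-14):
    # Continued fraction for the incomplete beta function, evaluated with the
    # modified Lentz method (as in the classic Numerical Recipes betacf routine).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    # I_x(a, b) for 0 <= x <= 1.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fastest below this point; use symmetry above it.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    # Two-sided tail probability P(|T| >= |t|) for Student's t with df degrees of freedom.
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def passes_values_credibility(mean1, sd1, n1, mean2, sd2, n2, alpha=0.10):
    # Hypothetical screening rule: keep the study if the estimated p-value is below alpha.
    t, df = welch_t(mean1, sd1, n1, mean2, sd2, n2)
    p = t_two_sided_p(t, df)
    return p < alpha, p


# Hypothetical summary statistics (diabetic vs control group), for illustration only
keep, p = passes_values_credibility(1.8, 0.9, 20, 1.2, 0.6, 20)
print(keep, round(p, 3))
```

\u003cp\u003eWhere SciPy is available, scipy.stats.ttest_ind_from_stats with equal_var=False performs the same Welch test. Any p-value estimated this way inherits assumptions about the underlying distributions, which is one reason such estimates cannot fully replace re-analysis of raw data.\u003c/p\u003e\u003cp\u003e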
Because every included study had to meet this criterion, the purpose of this section is not simply to restate that these studies all show a difference. Instead, it examines whether a consistent and reliable biological pattern exists among the studies that passed our methodological screening and showed this statistical trend, in particular a consistent direction of the difference.\u003c/p\u003e\u003cp\u003eAmong these high-quality studies, the analysis reveals a clear pattern: where the data distinguish the two groups statistically, breath acetone concentration is consistently higher in diabetic patients than in healthy controls. This pattern held across studies using different analytical approaches. For example, both Li et al. [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e], which used gold-standard GC-MS, and Chien et al. [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e], which used a novel biosensor, reported significantly elevated acetone levels in the diabetic group.\u003c/p\u003e\u003cp\u003eNotably, the inclusion of Goerl et al. [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e] illustrates how the framework handles divergent findings. This study also passed most of our methodological screening criteria, yet its conclusion differed: the diabetic group showed lower variability in acetone concentration rather than a higher mean level. Our framework did not exclude this study for its non-conforming conclusion; instead, it prompted us to examine its methodology more closely. This revealed that the participants in Goerl et al. [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e] were end-stage renal disease (ESRD) patients on hemodialysis. The metabolism and clearance mechanisms of this population differ markedly from those of typical diabetic patients. Therefore, the \"exception\" of Goerl et al.
[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e] does not weaken our overall finding. On the contrary, it shows that our framework can identify methodologically rigorous studies. When such studies reach different conclusions, the discrepancy can be rationally explained by the specific study design, in this case the choice of population.\u003c/p\u003e\u003cp\u003eFurther analysis of what these high-quality studies have in common reveals a shared set of core methodological principles. In terms of analytical techniques, gas chromatography-mass spectrometry (GC-MS) combined with suitable sample preparation, such as solid-phase microextraction (SPME) or thermal desorption (TD), is one reliable pathway to high-quality quantitative data (Goerl et al. [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], Ghimenti et al. [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e], Li et al. [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]). The other three studies (Chien et al. [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e], Lekha and Suchetha [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], Sha et al. [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]) show that novel sensor technologies are an important direction for rapid, non-invasive detection. For sample collection, the precise collection of alveolar gas and strict control of pre-sampling conditions such as fasting emerged as key prerequisites for credible data.\u003c/p\u003e\u003cp\u003eIn summary, our framework isolated a high-quality subset of studies. 
Within this subset, we found a highly consistent pattern for breath acetone in diabetes. More importantly, the subset provides evidence that a study's methodological design is a key determinant of the reliability and fitness-for-purpose of its conclusions. The \"best practice\" principles identified through this process can guide the design of future research in this field.\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe core objective of this study was to address contradictory conclusions in secondary analyses of the literature on breath VOCs in diabetes, a problem arising from the methodological heterogeneity of primary studies. To this end, we proposed and validated a novel data quality and fitness-for-purpose evaluation framework rooted in metrological science. Our results show why such a framework is needed: of the 38 relevant articles initially identified, only six (15.8%) fully met our quality and fitness-for-purpose standards. This disparity is itself a core finding, because it quantifies the substantial methodological heterogeneity in the design, execution, and reporting of current breath analysis research. We argue that this heterogeneity is a principal reason for the long-standing lack of clinical consensus.\u003c/p\u003e\u003cp\u003eAccordingly, the contribution of this study is not a verdict on the efficacy of a specific biomarker such as acetone, but a methodological tool for systematically resolving such controversies. The value of this framework is threefold. First, it transforms the subjective question \"Are the data reliable?\" into a series of objective, quantifiable evaluation criteria, providing an operational standard for data inclusion and exclusion in secondary analyses. 
Second, by enabling a comparative analysis of the high-quality studies that pass the assessment, the framework systematically reveals the common methodological principles (the \"best practices\") that underpin reliable conclusions. Finally, its evaluation criteria can serve as a prospective design guide, helping future researchers improve data quality and comparability across the field at the point where data are generated.\u003c/p\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003eTwo Paradigms, Two Landscapes: A Comparison with Classic Clinical Evaluation\u003c/h2\u003e\u003cp\u003eThe most significant finding of this study is that the same literature pool yields two markedly different quality landscapes when assessed by the data-driven NIM-DBA framework and by the classic hypothesis-driven QUADAS-2 tool. The QUADAS-2 assessment revealed widespread methodological issues: a literature pool rife with design flaws and reporting deficiencies that could make the studies' clinical conclusions unreliable (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). In stark contrast, the NIM-DBA framework identified a \"core subset\" of six studies with high fitness-for-purpose within the same pool (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThis divergence supports our core thesis. A framework designed to assess the credibility of a study's conclusions (QUADAS-2) follows a fundamentally different logic from one designed to assess the usability of the data themselves (NIM-DBA). QUADAS-2 answers the question, \"Can we trust the authors' conclusions?\" whereas NIM-DBA answers, \"Can we confidently use the authors' data for secondary analysis?\" As secondary research and data reuse become increasingly important, the latter question is gaining significance. 
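\u003c/p\u003e\u003cp\u003eOne way to make the contrast between the two landscapes concrete is to cross-tabulate, study by study, the NIM-DBA outcome against a QUADAS-2 domain judgment, using the per-study assessments in Supplementary Tables S5 and S6. The sketch below is hypothetical: the function name, study identifiers and judgments are illustrative and do not reproduce the actual tables.\u003c/p\u003e

```python
from collections import Counter


def cross_tabulate(nim_dba_passed, quadas_risk):
    # nim_dba_passed: study id -> True if the study passed every NIM-DBA stage
    # quadas_risk: study id -> QUADAS-2 judgment for one domain
    #   (e.g. Patient Selection): 'Low', 'High' or 'Unclear'
    table = Counter()
    for study_id, passed in nim_dba_passed.items():
        row = 'NIM-DBA pass' if passed else 'NIM-DBA fail'
        table[(row, quadas_risk[study_id])] += 1
    return table


# Hypothetical judgments for five studies, for illustration only
passed = {'S1': True, 'S2': True, 'S3': False, 'S4': False, 'S5': False}
patient_selection = {'S1': 'High', 'S2': 'Low', 'S3': 'High', 'S4': 'Unclear', 'S5': 'High'}
for (row, risk), count in sorted(cross_tabulate(passed, patient_selection).items()):
    print(f'{row:13} {risk:8} {count}')
```

\u003cp\u003eCells such as (NIM-DBA pass, High) count studies whose data are usable even though their clinical design is at risk of bias. These cells capture the divergence discussed in this section.\u003c/p\u003e\u003cp\u003e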
The study-by-study assessments underlying these two landscapes are presented in Supplementary Table S5 (NIM-DBA) and Table S6 (QUADAS-2). These tables provide the direct evidence for the arguments in this section and for the value of the proposed data-driven paradigm.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003eMethodological Rigor as the Cornerstone of Conclusion Consistency\u003c/h2\u003e\u003cp\u003eOur framework does not attempt to define a universal \"gold standard.\" Nevertheless, it isolated a high-quality subset of studies and revealed an important pattern: the studies that provide the most reliable, fit-for-purpose data tend to share a set of rigorous core methodological principles.\u003c/p\u003e\u003cp\u003eOur analysis found that studies which strictly adhered to certain \"best practices\" reached highly consistent conclusions regarding elevated breath acetone in diabetic patients. These practices included the precise collection of alveolar gas, pre-sampling preparation such as subject fasting, and the use of high-sensitivity analytical techniques such as GC-MS. This finding suggests that the \"contradictory\" conclusions prevalent in the literature stem less from intrinsic instability of the biomarker than from large differences in methodological rigor across studies. The evaluation framework proposed here provides an objective yardstick for identifying and quantifying this rigor.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003eBeyond Quality Judgment: The Critical Role of \"Fitness-for-Purpose\"\u003c/h2\u003e\u003cp\u003eA distinctive feature of this framework is that it goes beyond a binary \"good/bad\" quality judgment by introducing the core concept of \"fitness-for-purpose.\" This concept is illustrated by our analysis of Goerl et al.
[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. That study was methodologically rigorous. However, its specific study population (end-stage renal disease patients on hemodialysis) meant that its data were not fit for directly answering questions about acetone levels in the general diabetic population. Rather than labeling it \"low quality,\" our framework identified the limits of its applicability. The framework is therefore not only a retrospective quality evaluation tool but also a guide for assessing the applicability of research data. It helps secondary data analysts (e.g., meta-analysts) quickly screen a large body of literature for datasets that are of high quality and whose study design also matches their research question. This helps analysts avoid erroneous inferences caused by mismatched designs and improves the reliability of secondary research.\u003c/p\u003e\u003cp\u003eLikewise, the concept of \"fitness-for-purpose\" should be applied when evaluating different technological pathways, especially rapidly evolving novel sensor technologies. Among the six high-quality studies we identified, three (Chien et al. [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e], Lekha and Suchetha [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e], and Sha et al. [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]) employed novel non-GC-MS sensor technologies, which represent an important future direction for real-time, non-invasive detection. Our framework affirmed the methodological rigor of these studies' designs. However, for secondary quantitative analysis, assessing the fitness-for-purpose of data from these emerging technologies poses new challenges. 
Compared with gold-standard methods such as GC-MS, the applicability of sensor data depends critically on quantitative accuracy, specificity (i.e., freedom from interference by other gases), and transparent calibration, and this information is not always fully reported. For instance, some studies focus on reporting the accuracy of classification models (as in Lekha and Suchetha [\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e]) rather than standardized concentration values with uncertainty information suitable for direct comparison. The value of the NIM-DBA framework here therefore lies in two things. It identifies rigorous sensor-based studies, and it also helps secondary analysts judge whether the data these studies produce are fit, in both form and precision, for their own quantitative meta-analysis objectives.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003eImplications for Future Research: A Dual-Purpose Guide\u003c/h2\u003e\u003cp\u003eThe framework and findings of this study can serve as a dual-purpose guide for future research. For secondary researchers, it provides an operational, retrospective screening tool. For primary researchers, the \"best practice\" principles distilled from this study can serve as a prospective design guide. Investigators who want their data, whether from traditional GC-MS or novel sensors, to be reliably reused and compared by the research community should adhere from the outset to the core principles identified by this framework (e.g., clear population definitions, standardized sampling procedures, and transparent data reporting). 
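\u003c/p\u003e\u003cp\u003eThe reporting principles above can also be read as a checklist. The following minimal sketch encodes a few of them as checks on a hypothetical record of what a primary study reports. The class, field and function names are illustrative assumptions. The checks paraphrase the principles named in this section (population definition, sampling and pre-sampling conditions, concentration values with units and dispersion, calibration) and do not reproduce the actual NIM-DBA criteria. A secondary analyst supplies the target population of the question. A rigorous study of a different population, such as ESRD patients on hemodialysis, is then flagged as not fit for that particular purpose rather than as low quality.\u003c/p\u003e

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ReportedBreathData:
    # Hypothetical summary of what a primary study reports for one group.
    population: str                       # e.g. 'T2DM' or 'ESRD on hemodialysis'
    unit: Optional[str] = None            # concentration unit, e.g. 'ppbv'
    mean: Optional[float] = None
    sd: Optional[float] = None            # dispersion, needed for meta-analysis
    n: Optional[int] = None
    sampling_mode: Optional[str] = None   # e.g. 'alveolar' or 'mixed expiratory'
    fasting_reported: bool = False
    calibration_reported: bool = False


def fitness_gaps(record: ReportedBreathData, target_populations: Set[str]) -> List[str]:
    # List the reasons a record is not fit for a given quantitative question.
    # An empty list means only that these illustrative checks found no gap.
    gaps = []
    if record.population not in target_populations:
        gaps.append('population does not match the target question')
    if record.mean is None or record.unit is None:
        gaps.append('no concentration value with unit')
    if record.sd is None:
        gaps.append('no dispersion or uncertainty')
    if record.n is None:
        gaps.append('sample size not reported')
    if record.sampling_mode is None:
        gaps.append('sampling mode not reported')
    if not record.fasting_reported:
        gaps.append('pre-sampling conditions (e.g. fasting) not reported')
    if not record.calibration_reported:
        gaps.append('calibration not reported')
    return gaps


# Hypothetical records, not taken from any cited study
diabetes_question = {'T1DM', 'T2DM'}
complete = ReportedBreathData('T2DM', 'ppbv', 1200.0, 400.0, 30, 'alveolar', True, True)
esrd = ReportedBreathData('ESRD on hemodialysis', 'ppbv', 900.0, 300.0, 20, 'alveolar', True, True)
print(fitness_gaps(complete, diabetes_question))  # []
print(fitness_gaps(esrd, diabetes_question))      # ['population does not match the target question']
```

\u003cp\u003e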
Such adherence would raise research quality across the breath analysis field and accelerate the translation of laboratory discoveries into clinical applications.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\u003ch2\u003eStrengths and Limitations\u003c/h2\u003e\u003cp\u003eThe primary strength of this study lies in its methodological innovation. To our knowledge, it is the first to combine the stringent requirements for standard reference data from metrology with international ISO data quality standards for the systematic evaluation of published literature for secondary analysis. Second, through a direct comparison with the classic QUADAS-2 tool, this study empirically demonstrates the fundamental differences between, and complementary value of, data-driven and hypothesis-driven evaluation paradigms. Furthermore, our \"AI-assisted, manually verified\" workflow offers an efficient and rigorous approach to large-scale literature review.\u003c/p\u003e\u003cp\u003eHowever, this study also has limitations. First, our case study focused solely on a single biomarker, breath acetone, and the framework's applicability to other VOCs awaits further validation. Second, some aspects of the data evaluation, such as the estimation of p-values, relied on statistical assumptions and cannot fully substitute for analysis of the original raw data. Third, and most importantly, although our framework identified a high-quality subset of studies, the number of articles (n\u0026thinsp;=\u0026thinsp;6) is too small to answer definitively the core clinical question of whether breath acetone is a reliable biomarker for diabetes. 
On the one hand, this reflects the scarcity of high-quality studies in the existing literature. On the other, it suggests that the framework's screening criteria could be further refined to admit more usable data while maintaining rigor.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe core objective of this study was not to re-validate the efficacy of breath acetone as a biomarker for diabetes. Rather, it was to answer a deeper methodological question: among the many published studies, what common design and execution elements enable some to yield statistically significant positive conclusions?\u003c/p\u003e\u003cp\u003eBy applying our custom-built framework, which is grounded in metrological principles, we narrowed a heterogeneous pool of 38 articles down to a high-quality, high-fitness-for-purpose subset of just six studies. An in-depth analysis of this subset suggests that the significant differences observed between diabetic and control groups in these studies are unlikely to be coincidental. They may instead reflect the studies' strong convergence on key methodological principles. These \"best practice\" principles include the precise collection of alveolar gas, strict control of pre-sampling conditions like fasting, and the use of high-sensitivity analytical techniques like GC-MS. They appear to be important prerequisites for data quality and reliable conclusions.\u003c/p\u003e\u003cp\u003eTherefore, a key conclusion of this study is that the \"contradictions\" prevalent in the literature are likely rooted in differences in methodological rigor. 
Acting as an effective \"filter,\" our framework provides a systematic process for assessing and managing the challenges posed by data heterogeneity. In doing so, it identifies studies whose reliable conclusions appear to be linked to their adherence to these \"best practices.\" It also offers a prospective guide for future research design in the field, underscoring the importance of standardizing and optimizing research protocols from the outset to accelerate the field's translation from the laboratory to clinical application.\u003c/p\u003e\u003cp\u003eLooking ahead, the data quality and fitness-for-purpose framework proposed in this study has broad potential applications. First, future work can extend the framework to other diseases and VOC biomarkers. Second, its standardized process provides a foundation for developing automated AI tools for literature quality assessment. Such tools may improve the efficiency and objectivity of these assessments and thereby accelerate progress in breath diagnostics.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eAC: Assessment Criterion\u003c/p\u003e\n\u003cp\u003eAI: Artificial Intelligence\u003c/p\u003e\n\u003cp\u003eCMA: China Metrology Accreditation\u003c/p\u003e\n\u003cp\u003eCNAS: China National Accreditation Service for Conformity Assessment\u003c/p\u003e\n\u003cp\u003eEBM: 
Evidence-Based Medicine\u003c/p\u003e\n\u003cp\u003ee-nose: Electronic Nose\u003c/p\u003e\n\u003cp\u003eESRD: End-Stage Renal Disease\u003c/p\u003e\n\u003cp\u003eGC-MS: Gas Chromatography-Mass Spectrometry\u003c/p\u003e\n\u003cp\u003eIQR: Interquartile Range\u003c/p\u003e\n\u003cp\u003eISO: International Organization for Standardization\u003c/p\u003e\n\u003cp\u003eLLM: Large Language Model\u003c/p\u003e\n\u003cp\u003eMeSH: Medical Subject Headings\u003c/p\u003e\n\u003cp\u003eNIM-DBA: National Institute of Metrology - Diabetes Breath Assessment\u003c/p\u003e\n\u003cp\u003eNIM-NMDC: National Institute of Metrology, China - National Center for Metrology Scientific Data and Energy Metrology\u003c/p\u003e\n\u003cp\u003eNR: Not Reported\u003c/p\u003e\n\u003cp\u003eppbv: Parts Per Billion by Volume\u003c/p\u003e\n\u003cp\u003ePRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses\u003c/p\u003e\n\u003cp\u003eQUADAS-2: Quality Assessment of Diagnostic Accuracy Studies 2\u003c/p\u003e\n\u003cp\u003eRoB: Risk of Bias\u003c/p\u003e\n\u003cp\u003eRWD: Real-World Data\u003c/p\u003e\n\u003cp\u003eSD: Standard Deviation\u003c/p\u003e\n\u003cp\u003eSE / SEM: Standard Error of the Mean\u003c/p\u003e\n\u003cp\u003eSIFT-MS: Selected Ion Flow Tube Mass Spectrometry\u003c/p\u003e\n\u003cp\u003eSOP: Standard Operating Procedure\u003c/p\u003e\n\u003cp\u003eSPME: Solid-Phase Microextraction\u003c/p\u003e\n\u003cp\u003eSRD: Standard Reference Data\u003c/p\u003e\n\u003cp\u003eT1DM: Type 1 Diabetes Mellitus\u003c/p\u003e\n\u003cp\u003eT2DM: Type 2 Diabetes Mellitus\u003c/p\u003e\n\u003cp\u003eTD: Thermal Desorption\u003c/p\u003e\n\u003cp\u003eVOCs: Volatile Organic Compounds\u003c/p\u003e\n\u003cp\u003eWoS: 
Web of Science\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eEthics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eNot applicable. This study is a systematic review of previously published literature and does not involve any new human participants, data, or tissue.\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eAvailability of data and materials\u003c/p\u003e\n\u003cp\u003eAll data generated or analyzed during this study are included in this published article and its supplementary information files.\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Science \u0026amp; Technology Fundamental Resources Investigation Program (Grant No. 2022FY101200).\u003c/p\u003e\n\u003cp\u003eAuthors' contributions\u003c/p\u003e\n\u003cp\u003eL.G. and X.X. conceptualized the study. L.G. developed the methodology, designed the AI-assisted workflow, performed the data analysis, and wrote the original draft. W.Z. and Y.W. conducted the literature screening and performed the manual data verification and extraction. Z.L. assisted with data visualization and software implementation. X.X. acquired funding, provided supervision, and reviewed and edited the manuscript. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eThe authors would like to acknowledge the National Institute of Metrology, China (NIM), for providing the research platform and resources that made this study possible. We are also sincerely grateful to Dr. Bin Wang and Dr.
Heng Zhou for their insightful and inspiring discussions on medicine, artificial intelligence models, and data science, which were instrumental to the development of this work.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eLee B, Lee J-O, Lee J, Park I, Lee D-S. Breath gas sensors for diabetes and lung cancer diagnosis. J Sens Sci Technol. 2023;32:1\u0026ndash;9. https://doi.org/10.46670/JSST.2023.32.1.1.\u003c/li\u003e\n\u003cli\u003eSharma A, Kumar R, Varadwaj P. Smelling the disease: diagnostic potential of breath analysis. Mol Diagn Ther. 2023;27:321\u0026ndash;47. https://doi.org/10.1007/s40291-023-00640-7.\u003c/li\u003e\n\u003cli\u003eDrabińska N, Flynn C, Ratcliffe N, Belluomo I, Myridakis A, Gould O, et al. A literature survey of all volatiles from healthy human breath and bodily fluids: the human volatilome. J Breath Res. 2021;15. https://doi.org/10.1088/1752-7163/abf1d0.\u003c/li\u003e\n\u003cli\u003eMahnoor M, Shah AA, Inam A. Acetone detection using various techniques for diagnosis of diabetes mellitus from human exhaled breath: a review. AIP Conf. Proc. American Institute of Physics; 2024. https://doi.org/10.1063/5.0214527.\u003c/li\u003e\n\u003cli\u003eHaripriya P, Rangarajan M, Pandya HJ. Breath VOC analysis and machine learning approaches for disease screening: a review. J Breath Res. 2023;17. https://doi.org/10.1088/1752-7163/acb283.\u003c/li\u003e\n\u003cli\u003eDixit K, Fardindoost S, Ravishankara A, Tasnim N, Hoorfar M. Exhaled breath analysis for diabetes diagnosis and monitoring: relevance, challenges and possibilities. Biosensors. 2021;11:476. https://doi.org/10.3390/bios11120476.\u003c/li\u003e\n\u003cli\u003eMiekisch W, Sukul P, Schubert JK. Diagnostic potential of breath analysis \u0026ndash; focus on the dynamics of volatile organic compounds. TrAC, Trends Anal Chem. 2024;180. 
https://doi.org/10.1016/j.trac.2024.117977.\u003c/li\u003e\n\u003cli\u003eMa P, Li J, Chen Y, Zhou Montano BA, Luo H, Zhang D, et al. Non-invasive exhaled breath diagnostic and monitoring technologies. Microwave Opt Technol Lett. 2023;65:1475\u0026ndash;88. https://doi.org/10.1002/mop.33133.\u003c/li\u003e\n\u003cli\u003eObeidat Y. The most common methods for breath acetone concentration detection: a review. IEEE Sensors J. 2021;21:14540\u0026ndash;58. https://doi.org/10.1109/JSEN.2021.3074610.\u003c/li\u003e\n\u003cli\u003eLiu H, Liu W, Sun C, Huang W, Cui X. A review of non-invasive blood glucose monitoring through breath acetone and body surface. Sens Actuators, A. 2024;374. https://doi.org/10.1016/j.sna.2024.115500.\u003c/li\u003e\n\u003cli\u003eWang W, Zhou W, Wang S, Huang J, Le Y, Nie S, et al. Accuracy of breath test for diabetes mellitus diagnosis: a systematic review and meta-analysis. BMJ Open Diabetes Res Care. 2021;9. https://doi.org/10.1136/bmjdrc-2021-002174.\u003c/li\u003e\n\u003cli\u003eMathew TL, Pownraj P, Abdulla S, Pullithadathil B. Technologies for clinical diagnosis using expired human breath analysis. Diagnostics. 2015;5:27\u0026ndash;60. https://doi.org/10.3390/diagnostics5010027.\u003c/li\u003e\n\u003cli\u003eFan X, Zhong R, Liang H, Zhong Q, Huang H, He J, et al. Exhaled VOC detection in lung cancer screening: a comprehensive meta-analysis. BMC Cancer. 2024;24:775. https://doi.org/10.1186/s12885-024-12537-7.\u003c/li\u003e\n\u003cli\u003eScheepers MHMC, Al-Difaie Z, Brandts L, Peeters A, van Grinsven B, Bouvy ND. Diagnostic performance of electronic noses in cancer diagnoses using exhaled breath a systematic review and meta-analysis. JAMA Netw Open. 2022;5:e2219372. https://doi.org/10.1001/jamanetworkopen.2022.19372.\u003c/li\u003e\n\u003cli\u003eAlamilla-Valenzuela A, Erazo-Lema JS, Hern\u0026aacute;ndez-Hern\u0026aacute;ndez BS, Vega-Escalante B de J, Sarabia-Aguayo VV, Aguirre-Cervantes EL, et al. 
Exhaled volatile organic compounds: effective in detecting breast cancer? Gac Mex Oncol. 2023;22:122\u0026ndash;9. https://doi.org/10.24875/j.gamo.22000099.\u003c/li\u003e\n\u003cli\u003eMarfatia K, Ni J, Preda V, Nasiri N. Is breath best? A systematic review on the accuracy and utility of nanotechnology based breath analysis of ketones in type 1 diabetes. Biosensors. 2025;15:62. https://doi.org/10.3390/bios15010062.\u003c/li\u003e\n\u003cli\u003eSchwabe D, Becker K, Seyferth M, Kla\u0026szlig; A, Schaeffter T. The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review. npj Digital Med. 2024;7:1\u0026ndash;30. https://doi.org/10.1038/s41746-024-01196-4.\u003c/li\u003e\n\u003cli\u003eFadahunsi KP, Akinlua JT, O\u0026rsquo;Connor S, Wark PA, Gallagher J, Carroll C, et al. Protocol for a systematic review and qualitative synthesis of information quality frameworks in eHealth. BMJ Open. 2019;9:e024722. https://doi.org/10.1136/bmjopen-2018-024722.\u003c/li\u003e\n\u003cli\u003ePatone M, Zhang L-C. On two existing approaches to statistical analysis of social media data. Int Stat Rev. 2021;89:54\u0026ndash;71. https://doi.org/10.1111/insr.12404.\u003c/li\u003e\n\u003cli\u003eFadahunsi KP, O\u0026rsquo;Connor S, Akinlua JT, Wark PA, Gallagher J, Carroll C, et al. Information quality frameworks for digital health technologies: systematic review. J Med Internet Res. 2021;23:e23479. https://doi.org/10.2196/23479.\u003c/li\u003e\n\u003cli\u003eDeclerck J, Kalra D, Vander Stichele R, Coorevits P. Frameworks, dimensions, definitions of aspects, and assessment methods for the appraisal of quality of health data for secondary use: comprehensive overview of reviews. JMIR Med Inf. 2024;12:e51560. https://doi.org/10.2196/51560.\u003c/li\u003e\n\u003cli\u003eSchmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. 
A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021;21:63. https://doi.org/10.1186/s12874-021-01252-7.\u003c/li\u003e\n\u003cli\u003eZhang L, Jeong D, Lee S. Data quality management in the internet of things. Sensors. 2021;21:5834. https://doi.org/10.3390/s21175834.\u003c/li\u003e\n\u003cli\u003eIjab MT, Surin ESM, Nayan NM. Conceptualizing big data quality framework from a systematic literature review perspective. Malays J Comput Sci. 2019;:25\u0026ndash;37. https://doi.org/10.22452/mjcs.sp2019no1.2.\u003c/li\u003e\n\u003cli\u003eDaikeler J, Froehling L, Sen I, Birkenmaier L, Gummer T, Schwalbach J, et al. Assessing data quality in the age of digital social research: a systematic review. Soc Sci Comput Rev. 2024. https://doi.org/10.1177/08944393241245395.\u003c/li\u003e\n\u003cli\u003eCichy C, Rass S. An overview of data quality frameworks. IEEE Access. 2019;7:24634\u0026ndash;48. https://doi.org/10.1109/ACCESS.2019.2899751.\u003c/li\u003e\n\u003cli\u003eBhana B, Flowerday S, Satt A. Using participatory crowdsourcing in South Africa to create a safer living environment. Int J Distrib Sens Netw. 2013;:907196. https://doi.org/10.1155/2013/907196.\u003c/li\u003e\n\u003cli\u003eZou KH, Berger ML. Real-world data and real-world evidence in healthcare in the United States and Europe Union. Bioeng-Basel. 2024;11:784. https://doi.org/10.3390/bioengineering11080784.\u003c/li\u003e\n\u003cli\u003eShabani J, Salim N, Bohne C, Day LT, Kumalija C, Makuwani AM, et al. Neonatal indicator data in Tanzania district health information system: evaluation of availability and quality of selected newborn indicators, 2015-2022. BMC Pediatr. 2025;23:658. https://doi.org/10.1186/s12887-025-05417-x.\u003c/li\u003e\n\u003cli\u003eOkwaraji YB, Bradley E, Ohuma EO, Yargawa J, Suarez-Idueta L, Requejo J, et al.
National routine data for low birthweight and preterm births: systematic data quality assessment for United Nations member states (2000-2020). BJOG-Int J Obstet Gynaecol. 2024;131:917\u0026ndash;28. https://doi.org/10.1111/1471-0528.17699.\u003c/li\u003e\n\u003cli\u003eGyrard A, Abedian S, Gribbon P, Manias G, van Nuland R, Zatloukal K, et al. Lessons learned from European health data projects with cancer use cases: implementation of health standards and internet of things semantic interoperability. J Med Internet Res. 2025;27:e66273. https://doi.org/10.2196/66273.\u003c/li\u003e\n\u003cli\u003eJin M-J, Li E-M, Xu L-Y. Diagnostic accuracy of breath tests based on volatile organic compounds for cancer: a systematic review and meta-analysis. Clin Biochem. 2025;136:110898. https://doi.org/10.1016/j.clinbiochem.2025.110898.\u003c/li\u003e\n\u003cli\u003eHealy A, Duggan C, Foley B, Flynn R, Huss T. Development of a data quality framework for health and social care - a strategic approach to assess and improve the quality of health data and information in Ireland. J Epidemiol Community Health. 2019;73:A102. https://doi.org/10.1136/jech-2019-SSMabstracts.218.\u003c/li\u003e\n\u003cli\u003eLaberge M, Shachak A. Developing a tool to assess the quality of socio-demographic data in community health centres. Appl Clin Inf. 2013;4:1\u0026ndash;11. https://doi.org/10.4338/ACI-2012-10-CR-0041.\u003c/li\u003e\n\u003cli\u003eComero S, Dalla Costa S, Cusinato A, Korytar P, Kephalopoulos S, Bopp S, et al. A conceptual data quality framework for IPCHEM - the European Commission information platform for chemical monitoring. TrAC, Trends Anal Chem. 2020;127:115879. https://doi.org/10.1016/j.trac.2020.115879.\u003c/li\u003e\n\u003cli\u003eNg MY, Youssef A, Miner AS, Sarellano D, Long J, Larson DB, et al. Perceptions of data set experts on important characteristics of health data sets ready for machine learning. JAMA Netw Open. 2023;6:e2345892.
https://doi.org/10.1001/jamanetworkopen.2023.45892.\u003c/li\u003e\n\u003cli\u003eTute E, Mast M, Wulff A. Targeted data quality analysis for a clinical decision support system for SIRS detection in critically ill pediatric patients. Methods Inf Med. 2023;62:e1\u0026ndash;9. https://doi.org/10.1055/s-0042-1760238.\u003c/li\u003e\n\u003cli\u003eKookal KK, Walji MF, Brandon R, Kivanc F, Mertz E, Kottek A, et al. Systematically assessing the quality of dental electronic health record data for an investigation into oral health care disparities. J Public Health Dent. 2024;84:242\u0026ndash;50. https://doi.org/10.1111/jphd.12618.\u003c/li\u003e\n\u003cli\u003eKuusisto A, Saranto K, Korhonen P, Haavisto E. Quality of information transferred to palliative care. J Clin Nurs. 2023;32:3421\u0026ndash;33. https://doi.org/10.1111/jocn.16453.\u003c/li\u003e\n\u003cli\u003eWidad E, Saida E, Gahi Y. Quality anomaly detection using predictive techniques: an extensive big data quality framework for reliable data analysis. IEEE Access. 2023;11:103306\u0026ndash;18. https://doi.org/10.1109/ACCESS.2023.3317354.\u003c/li\u003e\n\u003cli\u003eMcCord SE, Webb NP, Van Zee JW, Burnett SH, Christensen EM, Courtright EM, et al. Provoking a cultural shift in data quality. Bioscience. 2021;71:647\u0026ndash;57. https://doi.org/10.1093/biosci/biab020.\u003c/li\u003e\n\u003cli\u003eBlacketer C, Defalco FJ, Ryan PB, Rijnbeek PR. Increasing trust in real-world evidence through evaluation of observational data quality. J Am Med Inf Assoc. 2021;28:2251\u0026ndash;7. https://doi.org/10.1093/jamia/ocab132.\u003c/li\u003e\n\u003cli\u003eHillert J, Butzkueven H, Magyari M, Wergeland S, Moore N, Soilu-Hanninen M, et al. Harmonized data quality indicators maintain data quality in long-term safety studies using multiple sclerosis registries/data sources: experience from the CLARION study. Clin Epidemiol. 2024;16:717\u0026ndash;32. 
https://doi.org/10.2147/CLEP.S480525.\u003c/li\u003e\n\u003cli\u003eCamilo Corrales D, Ledezma A, Carlos Corrales J. From theory to practice: a data quality framework for classification tasks. Symmetry-Basel. 2018;10:248. https://doi.org/10.3390/sym10070248.\u003c/li\u003e\n\u003cli\u003eJiang G, Dhruva SS, Chen J, Schulz WL, Doshi AA, Noseworthy PA, et al. Feasibility of capturing real-world data from health information technology systems at multiple centers to assess cardiac ablation device outcomes: a fit-for-purpose informatics analysis report. J Am Med Inf Assoc. 2021;28:2241\u0026ndash;50. https://doi.org/10.1093/jamia/ocab117.\u003c/li\u003e\n\u003cli\u003eWirsching J, Gra\u0026szlig;mann S, Eichelmann F, Harms LM, Schenk M, Barth E, et al. Development and reliability assessment of a new quality appraisal tool for cross-sectional studies using biomarker data (BIOCROSS). BMC Med Res Methodol. 2018;18:122. https://doi.org/10.1186/s12874-018-0583-x.\u003c/li\u003e\n\u003cli\u003eNguyen T, Nguyen H-T, Nguyen-Hoang T-A. Data quality management in big data: strategies, tools, and educational implications. J Parallel Distrib Comput. 2025;200:105067. https://doi.org/10.1016/j.jpdc.2025.105067.\u003c/li\u003e\n\u003cli\u003eDeady M, Duncan R, Jones LD, Sang A, Goodness B, Pandey A, et al. Data quality and timeliness analysis for post-vaccination adverse event cases reported through healthcare data exchange to FDA BEST pilot platform. Front Public Health. 2024;12:1379973. https://doi.org/10.3389/fpubh.2024.1379973.\u003c/li\u003e\n\u003cli\u003eNaik S, Voong S, Bamford M, Smith K, Joyce A, Grinspun D. Assessment of the Nursing Quality Indicators for Reporting and Evaluation (NQuIRE) database using a data quality index. J Am Med Inf Assoc. 2020;27:776\u0026ndash;82. https://doi.org/10.1093/jamia/ocaa031.\u003c/li\u003e\n\u003cli\u003eSmith M, Lix LM, Azimaee M, Enns JE, Orr J, Hong S, et al.
Assessing the quality of administrative data for research: a framework from the Manitoba Centre for Health Policy. J Am Med Inf Assoc. 2018;25:224\u0026ndash;9. https://doi.org/10.1093/jamia/ocx078.\u003c/li\u003e\n\u003cli\u003eIverson R, Taljaard M, Geraghty MT, Pugliese M, Tingley K, Coyle D, et al. Assessing the quality and value of metabolic chart data for capturing core outcomes for pediatric medium-chain acyl-CoA dehydrogenase (MCAD) deficiency. BMC Pediatr. 2024;24:37. https://doi.org/10.1186/s12887-023-04393-4.\u003c/li\u003e\n\u003cli\u003eKrishna CM, Ruikar K, Jha KN. Determinants of data quality dimensions for assessing highway infrastructure data using semiotic framework. Buildings. 2023;13:944. https://doi.org/10.3390/buildings13040944.\u003c/li\u003e\n\u003cli\u003eLarburu N, Bults RGA, Van Sinderen MJ, Widya I, Hermens HJ. An ontology for telemedicine systems resiliency to technological context variations in pervasive healthcare. IEEE J Transl Eng Health Med. 2015;3:2900110. https://doi.org/10.1109/JTEHM.2015.2458870.\u003c/li\u003e\n\u003cli\u003eISO/IEC 25012:2008. ISO. https://www.iso.org/standard/35736.html. Accessed 20 Aug 2025.\u003c/li\u003e\n\u003cli\u003eISO/IEC 25024:2015. ISO. https://www.iso.org/standard/35749.html. Accessed 20 Aug 2025.\u003c/li\u003e\n\u003cli\u003eGoerl T, Kischkel S, Sawacki A, Fuchs P, Miekisch W, Schubert JK. Volatile breath biomarkers for patient monitoring during haemodialysis. J Breath Res. 2013;7:017116. https://doi.org/10.1088/1752-7155/7/1/017116.\u003c/li\u003e\n\u003cli\u003eGhimenti S, Tabucchi S, Lomonaco T, Di Francesco F, Fuoco R, Onor M, et al. Monitoring breath during oral glucose tolerance tests. J Breath Res. 2013;7:017115. https://doi.org/10.1088/1752-7155/7/1/017115.\u003c/li\u003e\n\u003cli\u003eChien P-J, Suzuki T, Tsujii M, Ye M, Minami I, Toda K, et al.
Biochemical gas sensors (biosniffers) using forward and reverse reactions of secondary alcohol dehydrogenase for breath isopropanol and acetone as potential volatile biomarkers of diabetes mellitus. Anal Chem. 2017;89:12261\u0026ndash;8. https://doi.org/10.1021/acs.analchem.7b03191.\u003c/li\u003e\n\u003cli\u003eLekha S, Suchetha MS. Real-time non-invasive detection and classification of diabetes using modified convolution neural network. IEEE J Biomed Health Inform. 2018;22:1630\u0026ndash;6. https://doi.org/10.1109/JBHI.2017.2757510.\u003c/li\u003e\n\u003cli\u003eSha MS, Maurya MR, Shafath S, Cabibihan J-J, Al-Ali A, Malik RA, et al. Breath analysis for the in vivo detection of diabetic ketoacidosis. ACS Omega. 2022;7:4257\u0026ndash;66. https://doi.org/10.1021/acsomega.1c05948.\u003c/li\u003e\n\u003cli\u003eLi W, Liu Y, Lu X, Huang Y, Liu Y, Cheng S, et al. A cross-sectional study of breath acetone based on diabetic metabolic disorders. J Breath Res. 2015;9. https://doi.org/10.1088/1752-7155/9/1/016005.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003cp\u003e\u003csup\u003e1\u003c/sup\u003eThe number of studies in this final sample (n=38) is coincidentally identical to the 38 representative publications analyzed for the framework's paradigmatic development (see Table 1). These two sets of literature are distinct and should not be confused.\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e2\u003c/sup\u003eNote on the deep screening prompt (Supplementary File 4): The prompt intentionally instructs the AI to 'Include' documents containing either primary (original) or secondary (cited) data. This was a strategic choice. Although studies containing only secondary data were excluded from the final analysis of the present study, identifying them is highly valuable for future work. For instance, these references can be used to trace and discover additional original studies (a process known as citation snowballing). 
The prompt was therefore designed in its current form to serve this broader research objective.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-endocrine-disorders","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bend","sideBox":"Learn more about [BMC Endocrine Disorders](http://bmcendocrdisord.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bend/default.aspx","title":"BMC Endocrine Disorders","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Breath Analysis, Diabetes Mellitus, Volatile Organic Compounds (VOCs), Acetone, Data Quality, Fitness-for-Purpose, Systematic Review, Assessment Framework, Biomarker","lastPublishedDoi":"10.21203/rs.3.rs-7455257/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7455257/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e\u003cp\u003eExhaled breath analysis is a promising field for non-invasive diabetes diagnostics, but its clinical translation is hindered by contradictory findings across studies. We argue that this inconsistency stems from significant methodological heterogeneity and the lack of appropriate criteria for screening published data for secondary analysis. Existing tools, such as QUADAS-2, assess the quality of clinical study design but are not equipped to evaluate the technical comparability and fitness-for-purpose of the quantitative data itself. This study aimed to develop and validate a novel, data-driven framework to systematically assess the fitness-for-purpose of published data, thereby addressing this critical gap.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eWe developed the National Institute of Metrology - Diabetes Breath Assessment (NIM-DBA) framework, a multi-domain quality and fitness-for-purpose assessment tool. 
Its theoretical basis is derived from the stringent specifications for Standard Reference Data in metrology and aligned with international ISO data quality standards. A systematic literature search was conducted in PubMed, Scopus, Embase, and Web of Science (up to April 2025) to identify studies reporting quantitative data on breath volatile organic compounds (VOCs) in diabetic patients. Using breath acetone as a case study, we applied the NIM-DBA framework to the resulting literature pool. A parallel assessment using the QUADAS-2 tool was also performed on the same pool to compare the data-driven (NIM-DBA) and hypothesis-driven (QUADAS-2) evaluation paradigms.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eThe systematic search identified an initial pool of 38 eligible studies. Application of the multi-stage NIM-DBA screening process filtered this heterogeneous pool down to a core subset of only six studies (15.8%) that met all criteria for high data quality and fitness-for-purpose. In contrast, the parallel QUADAS-2 assessment of the same 38 studies revealed widespread high or unclear risk of bias, particularly in the domains of Patient Selection (79% high risk) and Index Test reporting (82% unclear risk). The six studies that passed the NIM-DBA framework demonstrated a highly consistent biological pattern\u0026mdash;elevated breath acetone concentrations in diabetic patients\u0026mdash;and shared common methodological best practices, such as standardized alveolar gas collection and the use of high-sensitivity analytical instruments.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e\u003cp\u003eThe prevalent contradictory conclusions in breath analysis literature are likely attributable to differences in methodological rigor rather than biomarker instability. 
The proposed NIM-DBA framework is an effective tool for systematically managing data heterogeneity, filtering literature for secondary analysis, and identifying methodologically robust studies. This data-driven approach provides a necessary complement to classic clinical evaluation tools, offering a new perspective on research quality assessment and providing valuable guidance for future study design in the field.\u003c/p\u003e","manuscriptTitle":"Assessing the Fitness-for-Purpose of Published Breath Analysis Data: A Quality Assessment Framework for Diabetes Biomarker Research","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-08 09:59:42","doi":"10.21203/rs.3.rs-7455257/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewersInvited","content":"","date":"2025-09-25T10:53:38+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-09-23T09:59:33+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-08-28T08:20:19+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-27T12:13:00+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Endocrine Disorders","date":"2025-08-27T12:08:01+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-endocrine-disorders","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bend","sideBox":"Learn more about [BMC Endocrine Disorders](http://bmcendocrdisord.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bend/default.aspx","title":"BMC Endocrine Disorders","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"10bda5ef-fb4a-4400-bd7e-36c49bbc314d","owner":[],"postedDate":"October 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2025-10-08T09:59:43+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-08 09:59:42","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7455257","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7455257","identity":"rs-7455257","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
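The abstract reports that 6 of the 38 eligible studies (15.8%) passed every NIM-DBA stage, but gives the QUADAS-2 results only as percentages of the same 38 studies. The short Python sketch below is an illustrative consistency check, not part of the NIM-DBA framework: it recomputes the retention rate and back-calculates which whole-study counts are compatible with the reported 79% (Patient Selection, high risk) and 82% (Index Test, unclear risk). It assumes those percentages were rounded half-up to the nearest integer; the constant and function names are hypothetical.

```python
# Illustrative consistency check on the abstract's reported proportions.
# Assumption: the QUADAS-2 percentages were rounded half-up to whole numbers.

TOTAL_STUDIES = 38       # eligible studies in the pool, as reported
RETAINED_BY_NIM_DBA = 6  # studies that passed every NIM-DBA stage, as reported


def percent_of_pool(count: int, total: int = TOTAL_STUDIES) -> float:
    """Share of the pool as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def implied_counts(percent: int, total: int = TOTAL_STUDIES) -> list[int]:
    """Whole-study counts k in 0..total whose share rounds half-up to `percent`.

    Uses integer arithmetic to avoid floating-point edge cases:
    floor(100*k/total + 0.5) == (200*k + total) // (2*total).
    """
    return [k for k in range(total + 1)
            if (200 * k + total) // (2 * total) == percent]


if __name__ == "__main__":
    print(percent_of_pool(RETAINED_BY_NIM_DBA))  # NIM-DBA retention rate
    print(implied_counts(79))  # Patient Selection, high risk
    print(implied_counts(82))  # Index Test, unclear risk
```

Under that rounding assumption, each percentage maps to exactly one count (30 studies for 79% and 31 studies for 82%), so the reported figures are internally consistent with a pool of 38.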