{"paper_id":"14748134-0c44-4e71-9f6b-4c980af2a544","body_text":"Minimum genomic data sets for rare diseases: A systematic review | Research Square Systematic Review Filipe Andrade Bernardi, Natana Chaves Rabelod, Claudia Fernandes Lorea, and 9 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8204628/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted Abstract Background Minimum data sets (MDS) are used to harmonize the capture and exchange of rare-disease information across studies and care settings, but the genomic component of these frameworks is often inconsistently specified. In our sample of included studies (n = 23), only 2 explicitly reported using Whole-Exome Sequencing (WES) or Whole-Genome Sequencing (WGS), highlighting a persistent gap in genomic method reporting alongside heterogeneity in scope, standards adoption, and reported impacts. 
Methods We performed a systematic review (searches through 2024) to identify publications proposing, developing, or applying MDS that included genomic elements for rare diseases. Screening was conducted in two steps: (i) independent title/abstract screening by two reviewer pairs with conflict resolution by a third reviewer, followed by (ii) independent full-text assessment by two reviewers. We extracted study characteristics, MDS domains, intended use context, referenced standards/ontologies, level of genomic reporting, and reported outcomes. Results were summarized with descriptive statistics, Jaccard-based co-occurrence patterns, and exploratory association analyses. Results Twenty-three studies met the inclusion criteria and were mostly produced in Europe and North America. Clinical/phenotypic information was nearly universal (95.7%), whereas genomic data were included in 69.6% of cases and were usually described without specifying the sequencing modality. Most studies targeted biomedical/genomic research (91.3%) and clinical diagnosis/care (69.6%). Standards use was modest (median = 1 per study), with the most frequent being HPO (26.1%), Orphanet/Orphacode (21.7%), FAIR (17.4%), and ICD (8.7%). Reported benefits were more common at the system level (e.g., interoperability or policy-related outputs) than as consistently quantified clinical effects. Exploratory analyses suggested that practices such as planned reanalysis, phenotype–genotype linkage, and explicit handling of structural variants may be associated with greater clinical/knowledge gains than the sequencing modality alone, although evidence remained insufficient to draw firm causal conclusions. Conclusions Rare-disease MDS commonly captures clinical information but often underspecifies core genomic details and inconsistently applies standards, limiting comparability and interoperability. 
Progress would benefit from a minimal genomic reporting core (sequencing approach, reference genome, variant classes, and analysis/annotation pipeline descriptors) aligned with widely used ontologies and interoperability principles, together with routine inclusion of patient-centered outcomes and biospecimen linkages. Keywords: Rare Disease; Genomic; Minimum dataset 1. Introduction a. Contextualization of Rare Diseases Rare diseases (RD) have gained prominence in public health due to their collective contribution to chronic illness, despite their individual rarity. Approximately 6,000 to 8,000 distinct rare conditions collectively affect a substantial portion of the population, necessitating governmental interventions, including the development and implementation of targeted public policies (Austin et al., 2018 ). Given that over 72% of RDs are of genetic origin, a comprehensive understanding requires detailed examination of genetic variation within affected populations ( Fu et al., 2023 ). Effective information sharing among stakeholders is critical for advancing genomic knowledge across diverse populations, underscoring the importance of collaborative, ethical, and methodologically rigorous research partnerships ( Fu et al., 2023 ). Understanding the biological mechanisms underlying RDs and developing new diagnostic or therapeutic approaches for RDs remain challenging for researchers and the pharmaceutical sector. Key obstacles include poor institutional coordination and limited sharing of diagnostic resources and information (International Rare Diseases Research Consortium, 2017; Taruscio et al., 2020 ). Globally, health data management is hindered by a lack of standardized terminology and data structures (Wilkinson et al., 2016 ). This limits effective data collection, recording, and analysis, which are essential for research and strong public health policies. 
These problems are especially acute in RDs, where data are often fragmented and dispersed ( Wilkinson et al., 2016 ). Recent advances in sequencing technologies have significantly reduced the time required to translate genetic insights into patient outcomes, enabling therapeutic decisions to be made within days rather than years. This acceleration has improved the quality of life for families affected by rare diseases. The integration of genomics into research, diagnosis, and treatment is now a cornerstone of modern medicine. Genomic data analysis yields essential insights into the genetic mechanisms underlying RDs, thereby advancing both research and clinical care. Beyond variant detection, computational pipelines and phenotype-driven prioritization tools enable the identification of subtle mutations and structural variants that may be missed by conventional diagnostics. These innovations facilitate the development of precise diagnostic assays and support the identification of novel therapeutic targets tailored to specific molecular alterations. Additionally, integrating genomic data with clinical, phenotypic, and epidemiological information enables patient stratification, supports more effective treatment strategies, and advances precision public health initiatives ( Kent et al., 2023 ). Recent literature highlights the importance of establishing minimum data sets (MDS) to ensure ethical, efficient, and standardized data collection, improving the planning, implementation, and evaluation of public health interventions (Bernardi et al., 2023 ). Genomic MDS provide structured frameworks that integrate genetic variants, clinical data, and metadata to promote consistency and interoperability. This standardization helps create robust, comparable datasets that support accurate diagnosis, personalized treatment, and evidence-based policy development, while enabling secure and efficient data sharing across health systems (Stark et al., 2019 ). b. 
Study scenario and relevance Research and understanding of RD are increasingly supported by comprehensive databases and resources that consolidate diverse data types, thereby facilitating academic research and clinical applications. An example of this integration is the work of the National Institutes of Health (NIH) through the Genetic and Rare Diseases Information Center (GARD), which has developed a disease harmonization database. This platform is significant because it combines GARD data with other databases to enable the investigation of RD, particularly those with a genetic etiology. This integration is crucial for researchers and healthcare providers, as it allows for a deeper understanding of RD through accessible, high-quality information ( Sequeira et al., 2021 ). The analysis presented by Pintos-Morell et al. highlights the evolving landscape of genomic implementation in newborn screening for hereditary metabolic disorders. This research emphasizes the integration of genomic tools into public health strategies to improve early detection and treatment of RD, thereby significantly enhancing health outcomes. These efforts reflect a broader trend toward the use of genomic data for disease diagnosis, proactive health management, and preventive care ( Pintos-Morell et al., 2024 ) . Complementing these resources, FindZebra is a specialized tool for diagnosing RD and indexing articles from GARD and other notable databases such as Online Mendelian Inheritance in Man (OMIM) and Orphanet. Recognizing the challenges of diagnosing RD, given its rarity and complex phenotypes, FindZebra is designed to optimize the search for RD based on symptoms, clinical characteristics, and phenotypic information ( Liévin et al., 2023 ) . At the continental level, the European Genomic Data Infrastructure (EGDI) focuses on creating and maintaining a genomic data infrastructure across Europe (Schmitt et al., 2024 ). 
This initiative facilitates the sharing and analysis of large volumes of genomic data, essential for advancing personalized medicine and improving the understanding and treatment of diseases, including RD (Visibelli et al., 2022 ). The EGDI builds on the outcomes of the Beyond 1 Million Genomes (B1MG) project and fulfills the ambition of the 1+ Million Genomes (1+MG) initiative by establishing a federated, sustainable, and secure infrastructure for accessing genomic, phenotypic, and related clinical data ( Schmitt et al., 2024 ). Similarly, DisGeNET and ClinVar are among the largest publicly accessible collections of genes and variants associated with human diseases. They integrate data from curated repositories, genomic association study catalogs, animal models, and extensive scientific literature, uniformly annotated with controlled vocabularies and community-driven ontologies ( Piñero et al., 2019 ). The Global Alliance for Genomics and Health (GA4GH) is an international coalition that develops frameworks and standards to facilitate the responsible, voluntary, and secure sharing of genomic and clinical data. It aims to accelerate genomic research and medicine by promoting interoperability and data sharing across institutions worldwide, enabling large-scale collaborative studies and advancing understanding of human health and disease globally ( World Health Organization, 2024 ). Building on this foundation, recent global initiatives, such as those outlined by the World Health Organization (WHO), aim to establish ethical, legal, and equitable frameworks for the access, use, and sharing of human genome data. These frameworks ensure that such activities promote human health and well-being, uphold social justice, and foster public trust and transparency. WHO principles emphasize the importance of including diverse populations in genomic datasets to avoid perpetuating health inequities and ensure that genomic data align with local health needs and contexts. 
Various initiatives and databases support efforts to advance genomic medicine in Latin America by integrating genomic data into clinical practice and research. The Latin American Network for Genomic Medicine (LatinGen) fosters the integration of genomic data into clinical practice across Latin America and promotes collaboration among researchers, clinicians, and institutions. The Leiden Open Variation Database (LOVD) platforms serve as critical repositories for genetic variants, with a focus on local populations in Argentina and Mexico. Similarly, ChileGenómico is a national initiative to integrate genomic data into clinical practice and research. These efforts enhance precision medicine initiatives, improving the understanding of the genetic basis of diseases in these countries (Bernardi et al., 2025 ). The Brazilian Initiative on Precision Medicine (BIPMed) collects and shares genomic data specific to the Brazilian population ( Rocha et al., 2020 ). Similarly, in focusing on RD diagnosis, the Brazilian Rare Genomes Project aims to integrate whole-genome sequencing (WGS) into the Brazilian public healthcare system ( Coelho et al., 2022 ) . These initiatives aim to address the genetic diversity and ancestry proportions of Brazilian populations, thereby enhancing precision medicine and improving diagnostic capabilities for genetic and rare disorders. They provide valuable insights into disease predisposition and help to fill gaps in global genomic databases. Despite the growth of genomics programs and rare-disease registries, many initiatives still collect genomic elements in ways that are difficult to align across projects. Fragmented data structures and uneven documentation of sequencing and interpretation workflows hinder reproducible diagnosis and downstream reuse, thereby weakening health systems' ability to scale evidence-based rare-disease policies (Bernardi et al., 2023 ). 
These barriers are amplified in low-resource or historically underrepresented settings, where limitations in infrastructure, funding, and specialized workforce can constrain adoption. Strengthening international collaboration and investing in locally appropriate implementation strategies are therefore central to equitable uptake. A systematic review of genomic minimum data sets can clarify how existing initiatives define “minimum” in practice, where genomic reporting is most frequently incomplete, and which design choices are most consistently associated with intended outcomes. By consolidating how MDS are specified and used, the literature can be mapped to reveal common patterns, gaps, and opportunities for harmonization that support both research and care pathways. c. Purpose of the Review This review evaluates how genomic minimum data sets are defined, operationalized, and reported in rare-disease research and clinical contexts, and summarizes the impacts attributed to their use (e.g., diagnostic, knowledge-generation, and system-level outcomes). 2. Methods a) Study Design We conducted a systematic review (SR) to synthesize evidence on how genomic minimum data sets are defined and applied in rare-disease research and clinical practice. The review methods followed established systematic review guidance to ensure transparent selection, extraction, and synthesis (Higgins et al., 2019 ). The review protocol was prospectively registered in PROSPERO (CRD42024510192) to document the planned methods in advance and support transparency in reporting ( https://www.crd.york.ac.uk/prospero ) b) Defining the Research Question We structured the review question using the PIcO framework (Population, Phenomenon of Interest, Comparison, Outcome) to define the target population, the genomic MDS concept under evaluation, and the outcomes of interest, thereby keeping eligibility criteria aligned with the review objectives (Higgins et al., 2019 ). 
The PIcO strategy for this study was defined as follows: Population (P): RD patients (of genetic or non-genetic origin), including population groups involved in studies related to research, diagnosis, and treatment using genomic data. Phenomenon of Interest (I): Definition, use, and analysis of genomic MDS in studies, diagnostics, or clinical practices. Comparison (c) / Outcome (O): Clinical and research impacts derived from implementing minimal genomic data sets, such as improved diagnostic accuracy, personalized treatments, and enhanced data interoperability. The mnemonic of this strategy led to the following central question: How are minimal genomic datasets defined and utilized in the research, diagnosis, and treatment of rare monogenic diseases? Given the diversity of clinical presentations and their relevance for treatment and research, additional subquestions guide this investigation. These subquestions are intended to deepen the analysis of practical and differential aspects in the application of genomic data in rare monogenic diseases, specifically: (i) Which genetic RD and patient populations are included in studies of minimal genomic data sets? (ii) What specific types of genomic data are selected for these minimal sets, and how often are they updated or revised? (iii) What are the clinical impacts and research advancements observed with the implementation of these minimal genomic data sets? To preserve a focused biological and interpretive scope for genomic elements, we limited the population to monogenic rare diseases and excluded non-monogenic rare conditions. This restriction supported clearer comparisons in how “minimum” genomic content is specified, curated, and reused across initiatives. c) Key concepts definition To build a sensitive and specific search strategy, we combined controlled vocabulary terms (e.g., MeSH) with free-text keywords. 
Controlled terms helped standardize concept retrieval across databases, while keywords captured newer or commonly used expressions that may not yet be consistently indexed. This dual approach aimed to improve recall without sacrificing relevance, and the final PIcO-derived descriptor set is summarized in Table 1 .
Table 1 PIcO Strategy and Descriptors Used in the Search Strategy
PIcO Strategy | Controlled Descriptors (MeSH) | Uncontrolled Descriptors (Keywords)
P - Population | Rare Diseases; Genetic Diseases | Orphan diseases; Genetic rare diseases; Rare Genetic Disorders
I - (Phenomenon of) Interest | Health Information Systems; Data Collection; Health Services Research | minimum data set; core data; Data gathering
cO - Outcome | Diagnostic Accuracy; Treatment Outcome; Data Interoperability; Health Informatics; Genomic Data Sharing; Health Policy; Precision Medicine; Public Health; Ethics, Medical; Biomedical Research; Personalized Medicine | Diagnostic accuracy; Diagnostic precision; Treatment outcomes; Therapeutic outcomes; Data interoperability; Health informatics; Medical informatics; Genomic data sharing; Public Health Policy; Genomic policies; Targeted therapy; Healthcare regulations
d) Search strategy Search strings were tailored to each database’s advanced interface, using combinations of controlled terms and keywords with Boolean logic to balance sensitivity and precision. We executed electronic searches in November 2024 in PubMed (NLM), Scopus, LILACS, Web of Science, and CINAHL. Boolean operators (AND/OR/NOT) and database-specific filters were applied as needed to refine retrieval while preserving the conceptual scope defined in PIcO. e) Screening, Selection, and Extraction of studies Eligibility criteria were applied consistently across screening and full-text review to ensure that included publications addressed genomic components of MDS for human monogenic rare diseases. We included studies in English, Portuguese, Spanish, or Italian, with no publication-year restriction. 
We excluded publications unrelated to human genomic MDS (e.g., non-human studies or MDS outside genomics), records not accessible electronically due to paywalls, and non-scholarly sources such as websites or social media advertisements. All records were de-duplicated and managed in Rayyan ( https://rayyan.qcri.org/ ) to support blinded screening and collaborative review workflows (Ouzzani et al., 2016 ). Titles/abstracts/keywords were screened first, followed by full-text assessment of retained articles. Two reviewers independently assessed full texts; disagreements were resolved through adjudication. Inter-reviewer agreement (include/exclude) was summarized using raw agreement and Cohen’s κ based on the latest independent decisions, excluding adjudicator votes. For included studies, we extracted: (i) design and publication characteristics; (ii) descriptive features of the proposed/used MDS (e.g., geographical scope, targeted disease group, referenced standards); (iii) the context of MDS use (research, laboratory, diagnostic, and/or treatment settings); and (iv) reported outcomes, including diagnostic effects, treatment personalization, and system-level impacts (e.g., interoperability or policy relevance). We prioritized documents aligned with WHO principles for access, use, and sharing of human genome data where applicable (World Health Organization, 2024 ). f) Analysis and synthesis of evidence We synthesized evidence using a mixed approach that combined structured narrative synthesis with thematic coding. First, studies were grouped by scope and intended use (e.g., registry design, clinical implementation, research infrastructure), enabling comparisons of the genomic elements selected, their documentation, and the outcomes reported. 
We then applied thematic coding to capture recurring design features (e.g., sequencing modality specification, ontology use, re-analysis planning) and to interpret how these features were discussed in relation to diagnostic, knowledge-generation, and system-level impacts. For quantitative summaries, domains, contexts, and outcomes were coded as binary indicators (present/absent). Explicit “Yes” statements were coded as 1; “Qualitative/Indirect,” “Not reported,” and “No/Absent” were coded as 0 (while their frequencies were still described). Using these indicators, we defined three composite indices: a Clinical Impact Index (diagnostic accuracy, time-to-diagnosis, personalized treatment, prognosis change; range 0–4), a Knowledge Index (new gene/biomarker discovery, genomic annotation advances, molecular classification improvements; range 0–3), and a System Index (policy influence, impact on public programs, interoperability/standardization; range 0–3). Their sum formed a Total Impact Score (range 0–10). Co-occurrence between elements was assessed using the Jaccard index (intersection/union), and temporal comparisons were made between two periods (≤ 2018 vs. 2019–2024) without sample-size weighting. Finally, the qualitative interpretation emphasized the relevance of implementation, how MDS components were operationalized in practice, and the implications for generalizability and clinical utility across settings. g) Risk of bias assessment We assessed methodological quality using the Mixed Methods Appraisal Tool (MMAT, 2018), which supports appraisal across heterogeneous designs (qualitative, quantitative descriptive, non-randomized, randomized, and mixed-methods studies). The tool evaluates design-appropriate criteria for question fit, data adequacy, sampling/representativeness, risk of nonresponse/confounding, and coherence between the data and the conclusions (Hong et al., 2018 ). 
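The binary coding and composite indices described in the analysis subsection above can be illustrated with a short sketch. It is an illustration only, not the review's analysis code: the field names, the helper functions, and the example study record are hypothetical, and only the coding rule (an explicit Yes is 1, everything else is 0) and the index ranges (0–4, 0–3, 0–3, total 0–10) come from the text.

```python
# Hypothetical field names for the outcome indicators listed in the text.
CLINICAL = ['diagnostic_accuracy', 'time_to_diagnosis',
            'personalized_treatment', 'prognosis_change']
KNOWLEDGE = ['gene_biomarker_discovery', 'annotation_advances',
             'molecular_classification']
SYSTEM = ['policy_influence', 'public_program_impact', 'interoperability']


def code_binary(value):
    # Only an explicit 'Yes' counts as 1; 'Qualitative/Indirect',
    # 'Not reported' and 'No/Absent' are all coded as 0.
    return 1 if value == 'Yes' else 0


def index(study, fields):
    # Sum of binary indicators; a missing field is treated as not reported.
    return sum(code_binary(study.get(f, 'Not reported')) for f in fields)


def impact_scores(study):
    clinical = index(study, CLINICAL)    # range 0-4
    knowledge = index(study, KNOWLEDGE)  # range 0-3
    system = index(study, SYSTEM)        # range 0-3
    return {'clinical': clinical, 'knowledge': knowledge,
            'system': system, 'total': clinical + knowledge + system}


# Invented record, used only to show the arithmetic.
example = {
    'diagnostic_accuracy': 'Yes',
    'time_to_diagnosis': 'Qualitative/Indirect',
    'annotation_advances': 'Yes',
    'molecular_classification': 'Yes',
    'policy_influence': 'No/Absent',
    'interoperability': 'Yes',
}
scores = impact_scores(example)
print(scores)
# {'clinical': 1, 'knowledge': 2, 'system': 1, 'total': 4}
```

Because qualitative or indirect statements are coded as 0, the indices are conservative: a study that discusses a benefit without explicitly reporting it does not contribute to the score.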
Two reviewers (FAB and TTK) independently appraised each study and resolved disagreements by consensus. In line with MMAT guidance, we did not compute an overall numeric score; instead, we reported criterion-level judgments to make strengths and limitations transparent. h) Presentation of results We organized the findings to support comparisons across studies and to provide a clear visualization of the evidence base. Extracted data were summarized in a structured table (e.g., author/year, study focus, design, target population, and MDS use context), allowing direct cross-study comparison of genomic and non-genomic domains and reported outcomes. We reported the selection process using a PRISMA-aligned flow diagram that documents the identification, screening, eligibility, and inclusion stages, including the reasons for exclusion at each stage. In addition to tabular summaries, we provided a descriptive narrative of major patterns and gaps, integrating thematic findings with quantitative summaries to highlight clinical and research implications of genomic MDS choices. 3. Results 3.1 General characteristics of the studies A total of 23 studies published between 2004 and 2024 were included, with a marked increase observed from 2019 onwards (13/23; 56.5%). Collaboration was prevalent, with 8 of 23 studies (34.8%) involving multinational or international partnerships. These cross-border collaborations resulted in a 25% increase in sample size compared to single-country studies, thereby enhancing the robustness and comprehensiveness of analyses. Geographically, Europe contributed 12 studies (52.2%), followed by North America (6; 26.1%), Asia (3; 13.0%), and Oceania (3; 13.0%), with Africa and South America each contributing one study (4.3%). Two studies (8.7%) were described as 'international' without specifying the country. As studies may span multiple regions, percentages reflect participation and do not sum to 100%. 
By country, the United States (n = 5) and Italy (n = 4) were the most frequent contributors, followed by Australia (n = 3) and France (n = 3). Canada and the United Kingdom each appeared twice, while Belgium, Brazil, Egypt, and Japan each appeared once. More than 65% of studies characterized the MDS as modular and updatable. Thematic mapping indicated contributions in neuromuscular or neuro disorders (7/23; 30.4%), general monogenic conditions (6/23; 26.1%), autoinflammatory diseases (2/23; 8.7%), metabolic disorders (1/23; 4.3%), and a pan-rare, general, or unspecified group (7/23; 30.4%). The study selection process is summarized in the PRISMA 2020 flow diagram (Fig. 1). Across the five databases, we retrieved 549 records and removed 162 duplicates, leaving 387 unique records for title/abstract screening. We excluded 284 records at this stage due to a lack of a rare-disease focus, the absence of MDS content, or the absence of a genomic component. We then assessed 103 full texts, excluding 80 that did not meet the inclusion criteria (e.g., methodological papers, abstracts, reviews, or registry descriptions outside the scope). In total, 23 studies were included; full study-level details are provided in Supplementary file 1. Figure 1. PRISMA 2020 flow diagram of the study selection process. 3.2 Composition of the MDS Clinical/phenotypic data were nearly universal (22/23; 95.7%), followed by genetic/genomic/variant data (16/23; 69.6%). Additional modalities included imaging (7/23; 30.4%), demographics (11/23; 47.8%), metadata/administrative (5/23; 21.7%), laboratory results (5/23; 21.7%), and biobank references (2/23; 8.7%). Each study combined a median of 3 modalities (IQR, 2–4; distribution: 1/23 with one modality, 6/23 with 2, 9/23 with 3, and 7/23 with 4), indicating a tendency toward multimodality. Co-occurrence patterns indicated that the clinical layer served as the integrative axis. 
Among studies with genomic data, 15 of 16 also included clinical data (Jaccard index ≈ 0.65). Clinical and demographic variables frequently co-occurred (10/11 studies with demographic data; Jaccard ≈ 0.43). All imaging-positive studies (7/7) also reported clinical data (Jaccard ≈ 0.32 for clinical and imaging), and genomic and imaging co-occurred in 5/7 (Jaccard ≈ 0.28). The most common exact combinations were clinical + genomic + demographic (4/23; 17.4%), clinical + genomic + imaging (3/23; 13.0%), and clinical + genomic + demographic + laboratory (2/23; 8.7%). Taken together, these top three combinations accounted for 9/23 (≈ 39%) of studies, suggesting a minimal recurrent nucleus (clinical and genomic) with contextual (demographic) and instrumental (imaging/laboratory) layers added. Modality usage varied across application contexts, with clinical data present in 100% of them. Genomic inclusion was higher in diagnostic and clinical care (≈ 81%) than in research (≈ 67%). Imaging was more prevalent in therapeutic and decision-support applications (≈ 50%). In platform/integration contexts, demographics (≈ 55%) and laboratory (≈ 36%) were relatively frequent, consistent with operational interoperability. 3.3 Application contexts MDS were predominantly applied in biomedical/genomic research (21/23; 91.3%) and diagnosis/clinical care (16/23; 69.6%), with notable frequencies in platform/system integration (11/23; 47.8%) and health planning/public policy (10/23; 43.5%). Less frequent contexts included therapeutic/decision support (6/23; 26.1%), epidemiological surveillance (3/23; 13.0%), patient registries (3/23; 13.0%), biobanks (3/23; 13.0%), and newborn screening (1/23; 4.3%). On average, each study spanned multiple contexts, typically two to three per MDS, underscoring the cross-cutting nature of use cases. Context overlaps were consistent. 
The most frequent and tightest combination was research and clinical (16 shared studies; Jaccard ≈ 0.76), followed by research and platforms (11; Jaccard ≈ 0.52) and clinical and platforms (8; Jaccard ≈ 0.42). Co-occurrences also appeared between research + public policy (9; Jaccard ≈ 0.41) and platforms + policy (6; Jaccard ≈ 0.40). These patterns suggest that clinical initiatives connect to both knowledge production and integration requirements, whereas policy-oriented programs often couple with data infrastructures. Across contexts, clinical care and therapeutic support showed higher medians for the clinical impact index (both 1.0 vs. 0.0 otherwise). For discovery/knowledge, biobanks (median 3.0 vs. 0.0) and clinical (1.0 vs. 0.0) stood out, while research showed a modest gain (1.0 vs. 0.5). For system interoperability, medians were higher in surveillance and newborn screening (both 3.0 vs. 2.0), followed by public policy (2.5 vs. 2.0). For total impact, biobanks (5.0 vs. 3.0) and clinical (4.0 vs. 3.0) showed the largest differences. These signals warrant caution given the small number of studies in some contexts (e.g., biobanks, surveillance). By maturity and life-cycle, the share of MDS with planned updates was highest in therapeutic support (83.3%), platform integration (72.7%), and clinical care (68.8%), followed by research (66.7%) and public health (60.0%). Temporally, the most recent median years were seen in newborn screening (2023; 100% in 2019–2024), therapeutic support (median 2022; 83.3% in 2019–2024), platforms (2021; 63.6%), and clinical care (2020.5; 62.5%), suggesting a recent intensification of applications directly linked to decision-making, integration, and care workflows. Within each context, Europe predominated, with regional variation in contributions. 
For clinical care, Europe accounted for 62.5%, followed by North America and Asia (18.8% each) and Oceania (12.5%); Africa and South America accounted for 6.2% each. By platform, Europe accounted for 45.5%, North America 27.3%, Oceania 18.2%, with sporadic participation from Asia, Africa, and South America (9.1% each). For public policy, contributions were distributed as follows: Europe, 40%; Oceania, 30%; North America, 20%; and Asia/Africa/South America, each, 10%. In research, Europe again predominated (52.4%), followed by North America (28.6%), Asia (14.3%), and Oceania (9.5%), with isolated contributions from Africa and South America (4.8% each). Overall, these patterns reflect regional asymmetries consistent with institutional capacity. 3.4 Adherence to standards and ontologies Declared adherence to standards was heterogeneous. We recorded references to the Human Phenotype Ontology (HPO) in 6 studies, Orphanet/Orphacode in 5, the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in 4, and the International Classification of Diseases (ICD) in 2. Each of the Human Genome Variation Society, Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR), GA4GH, General Data Protection Regulation (GDPR), and Clinical Data Interchange Standards Consortium (CDISC) appeared in one study. The number of standards cited per study (“richness”) had a median of 1. In practice, 10 studies cited no standard, 6 cited 1, 5 cited 2, and 2 cited 3. Overall, 56.5% of the studies cited at least one standard or ontology, but most referred to only one or two. Co-occurrence analysis suggested two profiles. First, a phenotype–disease nucleus: HPO + Orphanet/Orphacode co-occurred in 4 studies (Jaccard ≈ 0.57), aligning phenotypic encoding with RD nosology. Second, infrastructural pairs were rare but highly overlapping (e.g., HL7/FHIR and GA4GH), co-occurring in the same study (Jaccard = 1.00), reflecting technically focused initiatives. 
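As a worked check of the co-occurrence figures reported in this section, the Jaccard index can be recomputed from the counts alone: for two elements present in a and b studies and jointly present in c studies, J = c / (a + b - c). The sketch below is illustrative only; the function name jaccard_from_counts is an assumption of this example rather than part of the review's analysis code, and the inputs are the counts stated in the text.

```python
def jaccard_from_counts(n_a, n_b, n_both):
    '''Jaccard index |A and B| / |A or B| from marginal and joint counts.'''
    if n_both > min(n_a, n_b):
        raise ValueError('joint count cannot exceed either marginal count')
    union = n_a + n_b - n_both
    if union <= 0:
        raise ValueError('at least one study must contain A or B')
    return n_both / union


# HPO cited in 6 studies, Orphanet/Orphacode in 5, both in 4.
hpo_orphanet = jaccard_from_counts(6, 5, 4)

# Genomic data in 16 studies, clinical data in 22, both in 15.
genomic_clinical = jaccard_from_counts(16, 22, 15)

print(round(hpo_orphanet, 2), round(genomic_clinical, 2))
# 0.57 0.65
```

Both values agree with the reported Jaccard ≈ 0.57 (HPO with Orphanet/Orphacode) and ≈ 0.65 (genomic with clinical data). With only 23 studies, a change of one study in the overlap shifts these values noticeably, which is one reason to read them descriptively.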
By application context, platform integration featured HPO and Orphanet/Orphacode (approximately 36.4% each) and FAIR (approximately 27.3%); diagnosis/clinical care showed 31.2% (HPO), 18.8% (Orphanet), 18.8% (FAIR); biomedical/genomic research recorded 19.0% (FAIR), 23.8% (HPO), 19.0% (Orphanet), 9.5% (ICD); and public policy/planning showed relatively higher Orphanet (40.0%), with HPO 30.0%, FAIR 20.0%, and ICD 10.0%. Temporally, FAIR was a recent addition (median 2022; 100% between 2019 and 2024), compared with HPO (median 2019.5) and Orphanet (median 2023). MDS that cited FAIR were more likely to plan updates (75.0% vs. 63.2%), whereas HPO (50.0%) and Orphanet (40.0%) showed lower proportions, consistent with FAIR as an infrastructural maturity vector, although the sample size was small. A tangible example of FAIR's impact is the direct enabling of cross-registry queries, as demonstrated by the Vascular Anomalies Working Group integration, where FAIRification enabled federated queries across multiple databases, significantly enhancing data interoperability and informing policy decisions for healthcare systems. 3.5 Primary outcomes Across three domains, clinical (diagnostic accuracy, time-to-diagnosis, personalized treatment, clinical prognosis), knowledge production (new gene/biomarker discovery, genomic annotation, molecular classification/stratification), and system-level (policy influence, impact on public programs, standardization/interoperability), we observed at least one “Yes” outcome in 47.8% of studies (clinical), 52.2% (knowledge), and 82.6% (system-level). Co-occurrence patterns were coherent. In the biomolecular domain, genomic annotation and molecular classification/stratification co-occurred in eight studies (φ = 0.76), and new gene/biomarker discovery co-occurred with annotation in seven (φ = 0.81). 
In the clinical domain, diagnostic accuracy and personalized treatment co-occurred in four studies (φ = 0.50), and time-to-diagnosis reduction co-occurred with a change in prognosis in two (φ = 1.00; driven by small n). In the system domain, policy influence co-occurred with interoperability in nine studies, although the association was weak (φ = 0.21); this is at most suggestive that initiatives with institutional traction report standardization alongside policy activity. These results align with expected value chains: advances in annotation facilitate molecular reclassification; accuracy gains often accompany therapeutic decisions; and policy reforms track with integration and interoperability efforts. Over time (≤ 2018 vs. 2019–2024), we observed absolute increases in the share of studies reporting time-to-diagnosis and clinical prognosis outcomes (both +15.4 percentage points [pp], from 0.0% to 15.4%), as well as moderate gains in personalized treatment and genomic annotation (+8.5 pp each). Interoperability increased slightly (+3.8 pp), whereas policy influence (–26.2 pp) and impact on public programs (–10.8 pp) declined, indicating that recent studies have emphasized technical and clinical outcomes over macro-institutional ones. Aggregated by domain, the share of studies with at least one “Yes” increased in the clinical domain (from 30.0% to 46.2%) but decreased in knowledge (from 60.0% to 46.2%) and system (from 90.0% to 76.9%), suggesting a recent rebalancing toward care-proximal and instrumentation outcomes. 3.6 Genomic profile of MDS: design, maturity, and impact Reporting of core genomic information was uneven. Only one study explicitly used whole-genome sequencing (WGS), and one used whole-exome sequencing (WES); the remainder used generic labels (e.g., 'next-generation sequencing' or 'genomic data') without specifying WES, WGS, or targeted panels. Trio design (sequencing of the proband together with both parents) was cited in 2/23 (8.7%). Reference genomes were rarely reported (one study using GRCh38/hg38; one using GRCh37/hg19). 
The biological sample type was explicitly specified in 5/23 (21.7%) cases (blood alone or combined with saliva/tissue). To promote standard uptake, a concise genomic reporting checklist for future studies is proposed. This checklist includes: 1) specification of sequencing modality (e.g., WGS, WES, or targeted panels); 2) reference genome details (e.g., GRCh38/hg38 or GRCh37/hg19); 3) trio design usage when applicable; and 4) explicit description of biological sample types used. A detailed reporting checklist template has also been developed to guide researchers in implementing these recommendations. This template is available via the institutional repository or upon request from the authors. Such resources are intended to enhance consistency and quality in genomic data reporting across studies. Regarding knowledge bases, OMIM was most frequently referenced (18/23; 78.3%), followed by Orphanet (12/23; 52.2%), ClinVar (5/23; 21.7%), and gnomAD (3/23; 13.0%). Annotation sources like these play a critical role in ensuring consistency and reliability in variant interpretation. However, discrepancies can arise when different databases provide conflicting information. To address these challenges, selecting knowledge bases requires careful comparison of data from multiple sources. When OMIM, Orphanet, and ClinVar present contradictions, it is advisable to examine the methodological basis of each entry, including factors such as the curation process, update frequency, and the strength of evidence supporting variant classification. Implementing reconciliation strategies, such as cross-referencing with additional resources or involving domain experts in dispute resolution, can further enhance consistency and confidence in variant interpretation. Explicit support for Copy Number Variants (CNV)/ Structural Variants (SV) detection appeared in 3/23 (13.0%), whereas planned re-analysis was frequent (20/23; 87.0%). 
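To make the four-item genomic reporting checklist proposed earlier in this section concrete, the following Python sketch encodes it as a small record that lists which items a study left unreported. It is an illustrative assumption, not the authors' template: the class name, field names, and the rule that an unset field counts as unreported are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class GenomicReportingChecklist:
    # Hypothetical encoding of the four checklist items named in the text.
    sequencing_modality: Optional[str] = None   # e.g. 'WGS', 'WES', 'targeted panel'
    reference_genome: Optional[str] = None      # e.g. 'GRCh38/hg38', 'GRCh37/hg19'
    trio_design: Optional[bool] = None          # None = not stated; False = stated, not used
    sample_types: Optional[List[str]] = None    # e.g. ['blood', 'saliva']

    def missing_items(self) -> List[str]:
        # An item counts as unreported when it is None or an empty value.
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == [] or value == '':
                missing.append(f.name)
        return missing


# A study that reports only the biological sample type.
example = GenomicReportingChecklist(sample_types=['blood'])
print(example.missing_items())
# prints: ['sequencing_modality', 'reference_genome', 'trio_design']
```

Keeping `trio_design` as a three-state value separates a study that states it did not use trios from one that is silent on the point, which is the distinction the checklist's “when applicable” wording appears to call for.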
HPO was used in 16/23 (69.6%); phenotype-driven prioritization tools (e.g., Exomiser, Automated Mendelian Literature Evaluation [AMELIE], Phenomizer) appeared in 3/23 (13.0%). Only one study quantified the median number of HPO terms per case (6). Reporting of genes/variants was heterogeneous. Mode of inheritance (autosomal dominant/recessive, de novo) was explicit in 6/23 (26.1%), functional validation was reported in 3/23 (13.0%), and newly implicated genes in 4/23 (17.4%; median of two genes when reported). Variant classification counts were seldom reported; only one study provided totals for pathogenic/likely pathogenic (P/LP) variants (n = 17). Counts for variants of uncertain significance (VUS) and likely benign/benign (LB/B) variants were generally absent. Therapeutic impact was indicated in 6/23 (26.1%); trial eligibility and genetic counseling were mentioned across studies as potential outcomes. 3.7 Study bias assessment All 23 included studies were appraised for methodological quality using the MMAT. No study was excluded post-appraisal due to concerns about bias; all met the basic MMAT screening criteria (clear research questions and appropriate data sources). However, the rigor of studies varied considerably. Based on the consensus MMAT judgments, five studies (21.7%) were rated as high quality (meeting ≥ 80% of criteria), 12 (52.2%) as moderate quality (40–60% of criteria met), and six (26.1%) as low quality (≤ 20% of criteria met). In practical terms, only four studies met all five MMAT criteria (5/5 “Yes” ratings), and one study met four criteria. Over half of the studies satisfied three or fewer domains, indicating a moderate to high risk of bias across much of the literature. Patterns emerged across study designs. Mixed-methods studies (2/23) had the highest quality, with both achieving 5/5 “Yes” ratings across MMAT domains, reflecting robust conduct and reporting. 
Quantitative descriptive studies (7/23) tended to have intermediate quality; none attained a perfect score, and most met only three of five criteria (issues with sample representativeness and handling of missing data were common). The single quantitative non-randomized study met 3 of 4 applicable criteria (75%), falling short only in the domain related to confounding control. Qualitative studies (13/23) exhibited the greatest variability in quality. Only two qualitative studies fully satisfied all MMAT criteria, whereas most others had substantial methodological gaps. Approximately one-quarter of the qualitative studies were of moderate quality (typically 3 of 5 criteria met), and about half were low quality, with only 1 or 2 criteria met. This suggests that many qualitative reports lacked sufficient methodological transparency or rigor in key areas. To address these limitations and improve future studies, three practical reporting recommendations are proposed. First, explicitly define the sampling frame and inclusion criteria to enhance transparency and clarify representativeness (Q2). Second, thoroughly describe strategies for handling non-response and managing potential confounders to strengthen the validity of findings (Q4). Third, provide a clear explanation of how themes are derived from raw data and maintain reflexivity to enhance the coherence and integrity of qualitative analyses (Q3–Q4). Implementing these steps can advance the field by translating methodological critique into actionable guidance. Across the board, certain MMAT domains showed recurring limitations. In quantitative studies, the most frequent weaknesses were in Q2 (sampling strategy/representativeness) and Q4 (handling of non-response or confounding). 
Several registry papers did not clearly define their sampling frame or inclusion criteria, resulting in “No” or “Cannot tell” judgments regarding representativeness, and many failed to report how missing data or potential confounders were handled. For instance, over half of the quantitative studies lacked clear strategies to address non-response bias or were unable to describe any adjustment for confounders in observational designs. In qualitative studies, the common shortcomings were in Q3–Q4, which assess the linkage between data and interpretations and the coherence of the analysis. Many qualitative reports provided limited evidence of how themes were derived from raw data or lacked reflexivity, resulting in frequent “Cannot tell” ratings in these domains. In contrast, most studies – regardless of design – scored well on Q5, often indicating that conclusions were reasonably justified by the data provided, even if earlier criteria were unmet. Notably, S1–S2 (screening items) were satisfied by virtually all studies (S1 was “Yes” for 100% of cases). However, a couple of studies had unclear data adequacy (S2 rated “Cannot tell”) that had to be resolved by consensus. Notably, no studies were excluded solely based on poor quality, and all 23 studies were retained in the review. That said, the presence of multiple “No” or “Cannot tell” judgments in several reports indicates a non-trivial risk of bias in those findings. Studies classified as low quality were treated with caution during synthesis, and their conclusions were weighed accordingly. Overall, the MMAT-based assessment indicates that although a minority of genomic MDS studies are methodologically robust, most exhibit a moderate to high risk of bias, primarily due to incomplete reporting in the sampling and analysis domains. This underscores the need for improved reporting standards in this field to strengthen confidence in study findings. 
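The quality bands used in this assessment (≥ 80%, 40–60%, and ≤ 20% of criteria met) can be illustrated with a short Python sketch. It assumes a five-criterion MMAT scale, so the bands map to 4–5, 2–3, and 0–1 criteria met. Only the tier counts (5, 12, 6) come from the text; the per-study scores are hypothetical placeholders chosen to reproduce those counts, and the function name is an assumption.

```python
from collections import Counter


def mmat_tier(criteria_met):
    # Bands as stated in the text, on a five-criterion scale:
    # >= 80% (4-5 met) high, 40-60% (2-3 met) moderate, <= 20% (0-1 met) low.
    if not 0 <= criteria_met <= 5:
        raise ValueError('criteria_met must be between 0 and 5')
    if criteria_met >= 4:
        return 'high'
    if criteria_met >= 2:
        return 'moderate'
    return 'low'


# Hypothetical per-study scores (NOT the extracted data): four studies at 5/5
# and one at 4/5 as reported; the remaining scores are placeholders that
# reproduce the reported tier sizes of 12 moderate and 6 low.
scores = [5] * 4 + [4] + [3] * 8 + [2] * 4 + [1] * 6

tiers = Counter(mmat_tier(s) for s in scores)
shares = {tier: round(100 * n / len(scores), 1) for tier, n in tiers.items()}
print(dict(tiers), shares)
```

Under these assumptions the shares come out as 21.7%, 52.2%, and 26.1%, matching the reported tier percentages. A study scored on fewer applicable criteria, such as the non-randomized study at 3 of 4 (75%), falls between the stated bands and would need an explicit rule.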
The final study-by-study consensus matrix with item-level justifications is provided in Supplementary File 2. Given the quality profile, the findings from this review warrant cautious implementation. Most genomic MDS proposals for rare diseases function as provisional frameworks to guide registry and pipeline design, rather than as definitive standards. Implementation should prioritize data elements supported by substantial evidence, while treating less substantiated items as hypotheses subject to pilot testing, ongoing evaluation, and iterative refinement. Supplementary Material 3 contains the complete dataset processing workflow, analyses, and figures for this review. 4. Discussion 4.1 The hybrid and multimodal nature of genomic MDS Overall, the included studies demonstrate a strong predominance of initiatives conducted in Europe and North America, with meaningful contributions from Oceania and more limited involvement from Asia, Africa, and South America. The concentration of initiatives in high-income regions such as the United States and Western Europe suggests that institutional capacity and infrastructure maturity have shaped the geographical distribution of genomic MDS projects. The clinical layer anchors most use cases, the infrastructural layer enables interoperability and integration, and the public health layer connects standardization to population planning and surveillance. Differences observed across contexts in impact indices, update strategies, and adherence to standards carry direct implications for implementation and scalability. Across the included studies, MDS function as hybrid artifacts that bind clinical/phenotypic descriptors to infrastructure for exchange and reuse, and, where present, to genomic analyses that support diagnosis, discovery, and policy. Concrete examples illustrate why this integration is crucial. 
DM-Scope was designed to bridge research and care and to standardize data capture for myotonic dystrophy across centers (De Antonio et al., 2019). Beyond single-country initiatives, long-term international collaborations in neuromuscular disease reinforce the importance of harmonized governance and sustained data quality. For example, the global myotonic dystrophy registry network demonstrated how multi-country coordination can standardize data structures, support longitudinal follow-up, and enable cross-border research alignment (Wood et al., 2018). The Autoinflammatory Diseases Alliance (AIDA) registries emphasize modularity, governance for updates, and cross-registry communication, practical features that enable an MDS to evolve with the science (Della Casa et al., 2022; Gaggiano et al., 2022). The ApreciseKUre platform takes this a step further by embedding analytics within a digital ecosystem, illustrating how multimodal records (genetic, biochemical, histopathological, clinical, and quality of life) can power precision medicine use cases in ultra-RD (Visibelli et al., 2022). These initiatives collectively reinforce that an MDS is most effective when it is both clinically legible and technically interoperable, with clear pathways for reanalysis and reuse. They also underscore that multimodality is not optional but essential to describe the natural history and progression of RD (Visibelli et al., 2022; Ruseckaite et al., 2023b). 4.2 Standards and interoperability: from principles to operations Standardization remains a central challenge for genomic MDS interoperability. Adoption was generally limited, with most studies reporting only one or two standards. Ontologies such as HPO and Orphanet were the most frequently used, forming a “phenotype–disease nucleus,” but their application was inconsistent across registries. The results show uneven, sometimes sparse, adoption of standards, an observation mirrored by prior methodological work. 
Our findings also complement and extend another recent systematic review, which mapped MDS for RD across health care networks and organized their elements into 10 categories aligned with World Health Organization digital health guidelines, ultimately proposing a generic RD MDS for clinical and managerial use. While that work provides a broad, system-level view of data requirements for RD, it treats the genomic component as one among many domains. In contrast, our review narrows the focus to the genomic layer of MDS, examining how sequencing modalities, reference genomes, variant classes, ontologies, reanalysis practices, and biospecimen linkages are specified, and how these design choices relate to diagnostic yield, treatment decisions, and knowledge generation. This more granular perspective highlights implementation-sensitive details that are not fully visible in wider MDS taxonomies. A French methodology for building an MDS for RD explicitly ties item selection to common data elements (CDEs) and to reference terminologies (Medical Dictionary for Regulatory Activities [MedDRA], HPO, Anatomical Therapeutic Chemical [ATC] classification, ICD-10, Orphanet) while targeting HL7-compatible exchange. It also formalizes expert-led governance for versioning (Choquet et al., 2015). The development of this French MDS (F-MDS-RD) also emphasized alignment of local terminologies with international references through expert consultation and systematic review (Choquet et al., 2015; Toubiana et al., 2015). The recent adoption of the FAIR principles illustrates a shift toward operational standardization. The “de novo FAIRification” of the Vascular Anomalies Working Group (VASCA) registry demonstrates a concrete pipeline to make data machine-actionable at entry, mapping fields to ontologies (HPO and the Orphanet Rare Disease Ontology [ORDO]) and CDEs, and exposing them for federated queries (Groenen et al., 2021). 
The VASCA registry incorporated FAIRification directly into its data collection, using common data elements and ontologies to ensure machine-readable interoperability (Groenen et al., 2021). Nevertheless, the interpretation of CDEs remains multifaceted, requiring clear definitions to avoid false assumptions of uniformity. Broader frameworks also illustrate this need. CDISC standards, required by the U.S. FDA for regulatory submissions, exemplify how structured data standards can promote harmonization (Mullin et al., 2021). CDISC user guides for Duchenne muscular dystrophy and Huntington’s disease demonstrate how to represent outcomes and longitudinal assessments in a manner acceptable to regulators, thereby closing a common gap between research registries and trial-ready data (Mullin et al., 2021). At the health system scale, Australian Genomics documents the operational work of integrating genomics into care, including evidence sharing across laboratories and alignment with international standards bodies, and provides evidence that national programs can translate standards into routine practice (Stark et al., 2023). International collaboration is indispensable. For example, collaboration between U.S. networks and the European Network of Rare Bleeding Disorders (EN-RBD) led to the development of a harmonized data tool for rare coagulation disorders (Shapiro et al., 2011). These initiatives highlight the persistent challenge of siloed data and the pressing need for harmonization at scale (Raycheva et al., 2023). 4.3 From Genomic Data to Clinical Translation: Reporting, Re-analysis, and Biobank Linkages A central gap in the literature is the incomplete reporting of core genomic details, including sequencing modality, reference build, trio design, structural-variant coverage, pipelines, and annotation sources. 
Case-level studies illustrate why specificity matters: for early infantile epileptic encephalopathy due to biallelic PIGQ variants, authors reported exome sequencing, Sanger validation, and the exact reference transcript, enabling replication of variant interpretation (Johnstone et al., 2020). Methodologically, best practice also extends “upstream” to curation and databases. Experience in clinical genetics cautions that heterogeneous databases have utility but also pose limitations unless entries adhere to standard nomenclature and quality controls (Birch & Friedman, 2004). Guidance from community efforts emphasizes adopting consistent variant nomenclature, curated locus-specific resources, and ethics frameworks to ensure the clinical reliability of shared variant data (Kohonen-Corish et al., 2010). The high rates of planned re-analysis and widespread use of HPO are also notable. That pattern tracks with emerging infrastructure thinking: FAIRified registries explicitly plan periodic reinterpretation, and HPO-anchored phenotyping is proposed as a bridge between registries, biobanks, and variant curation workflows (Groenen et al., 2021; Rubinstein et al., 2017). When implemented with clinical decision support, such integration can shorten time-to-diagnosis and expand trial eligibility, two benefits that national programs report as part of routine care transformation (Stark et al., 2023). Several sources emphasize that the scientific yield of registries increases when entries are physically linked to high-quality biospecimens. The EuroBioBank/SpainRDR-BioNER case demonstrates how a networked catalog connected to a national RD registry facilitates discovery, while also necessitating governance for privacy, accreditation, and standard operating procedures (Rubinstein et al., 2017). 
Quantifying the discovery acceleration associated with biobank linkages, for example as the number of novel genes identified per year, would provide evidence for the strategic value of integrated bioresources. Such a metric could motivate increased investment in the infrastructure that connects registries to biospecimens, ultimately enhancing the scope and efficiency of rare disease research. The field is converging on practical tools, including GUID-based linkage, CDEs that embed the GUID elements, and HPO-anchored phenotyping, to make specimen-linked data computable (Rubinstein et al., 2017; Glassberg et al., 2020). Biobanks dedicated to neuromuscular disease illustrate the downstream payoff: ready-to-sequence DNA, standardized consent, and logistics for WES/WGS, as well as recontact, enable faster gene discovery and validation (Reza et al., 2017). Recent national efforts in Asia (e.g., K-MoSCA for rare neurological diseases) integrate registries with bioresources from the outset, underscoring the global relevance of integrated designs (Kim et al., 2024). 4.5 From Patient Voice to System Scale: Implementation Pathways The impact of MDS is distributed along a gradient. System-level outcomes, such as influence on policy, public programs, and data standardization, were the most frequently reported (82.6% of studies reported at least one such outcome). More recent studies (2019–2024) show a shift toward clinical outcomes, including time-to-diagnosis and prognosis, suggesting a maturation of the field: from early infrastructure and policy building toward direct improvements in patient care and technical instrumentation (Ruseckaite et al., 2023b). In practice, registries contribute to better treatment outcomes, process improvements, and quality of care, while also supporting research into clinical course and natural history. However, a significant gap exists in the incorporation of patient-reported outcome measures (PROMs). 
Our analysis found that only a limited number of MDS included PROMs alongside diagnostic metrics. This deficiency underscores the necessity for future studies to elevate the patient voice by framing PROMs as a core success indicator. Incorporating PROMs can shift the focus toward a more holistic notion of value by capturing the patient's perspective, thus enhancing the relevance and impact of MDS on patient-centered care. Practical strategies to facilitate the integration of PROMs into MDS include the use of standardized instruments, such as the EQ-5D or PROMIS, which are designed to capture patient-reported outcomes efficiently. Collaborating with patient advocacy groups can further ensure that the chosen PROMs reflect patients' needs and concerns, fostering patient-centered approaches in future studies. More broadly, PROMs remain underused, with only 40% of RD registries reported to collect them, despite their importance for embedding the patient voice into best practice. Related strategy work stresses that a national registry framework should explicitly require PROMs and equity safeguards (Ruseckaite et al., 2023b). At the European level, reviews of databases highlight governance, legal, and FAIR-compliance issues, as well as the risk that siloed efforts underrepresent patient groups and impede ML-ready integration (Raycheva et al., 2023). Three recurring features characterize sustainable MDS implementations across geographies: (1) transparent data structures (publishing case report forms [CRFs], CDEs, and mappings) (Glassberg et al., 2020), (2) alignment with recognized standards (HL7 FHIR and CDISC for transport; HPO, ORDO, ICD, and ATC for semantics), and (3) networked governance that supports federated discovery, re-analysis, and specimen linkage. 
Evidence is drawn from national programs (Stark et al., 2023), FAIRification case studies (Groenen et al., 2021), and classification work demonstrating that registries cluster by purpose and need, with tailored interoperability contracts (Santoro et al., 2015). Nonetheless, several obstacles remain for the implementation and scalability of genomic MDS. Public-health-oriented registries have long argued that harmonized tools are prerequisites to close therapeutic and knowledge gaps in ultra-rare conditions (Shapiro et al., 2011). Clearer reporting guidelines for genomic methods are needed to ensure comparability and maximize interoperability across registries and electronic health records (Choquet et al., 2015). 4.6 Limitations and next steps Two constraints temper inferences. First, the literature frequently under-reports genomic specifics and outcome denominators; as classic database audits caution, this impedes replication and external validity (Birch & Friedman, 2004). Second, registries vary widely in purpose and maturity; clustering work implies that synthesis should be stratified by registry archetype rather than pooled indiscriminately (Santoro et al., 2015). Challenges also include ensuring patient representativeness, preventing registries from disproportionately reflecting more affluent or educated populations (Rubinstein et al., 2010). Ethical and legal issues such as privacy, informed consent, and cross-border data transfer remain central (Reza et al., 2017; Raycheva et al., 2023). To address these challenges, we recommend concrete steps, including the use of model consent forms tailored to enhance understanding and engagement among diverse populations. Implementing data governance frameworks that prioritize privacy and equity can help ensure that registries reflect a diverse range of populations. 
Regular feedback of aggregated results to participants can maintain trust and engagement, ensuring transparency and participatory governance. Best practices include routine updates to consent materials and regular review of data policies to align with evolving ethical guidelines (Rubinstein et al., 2010). A practical roadmap for the field would combine (a) mandatory minimal genomic reporting, (b) publication of CDE dictionaries and CRFs, (c) FAIR-at-entry pipelines and ontology bindings, (d) routine re-analysis policies, (e) GUID-based biobank linkage, and (f) built-in PROMs and equity metrics, governed by consortia with regulator-grade data structures (Mullin et al., 2021; Groenen et al., 2021; Stark et al., 2023). 5. Conclusion This systematic review demonstrates that MDS for rare diseases are heterogeneous in scope, composition, and reporting practices. Although clinical and phenotypic domains are consistently included, the genomic layer is often incompletely reported, with limited specification of sequencing modalities, reference genomes, and variant classes. Adherence to standards and ontologies is also inconsistent, with most studies referencing only one or two frameworks, thereby constraining cross-study interoperability. Despite these gaps, MDS have shown tangible contributions to diagnostic accuracy, personalized treatment, and knowledge generation, as well as system-level impacts on policy and public health planning. Exploratory analyses further suggest that practices such as planned reanalysis, phenotype–genotype integration through ontologies, and explicit capture of structural variants may be more influential for clinical benefit than the sequencing modality alone. Researchers, clinicians, and policymakers are encouraged to collaborate to establish and refine standardized genomic MDS. Participation in consensus exercises, such as Delphi rounds, will facilitate the integration of genomic data into healthcare. 
Such collective efforts aim to translate these findings into actionable frameworks to enhance the care and management of rare diseases. Declarations Competing interests. The authors declare no competing interests. Ethics approval. Not applicable (review paper; no human/animal subjects). Use of AI. Generative AI tools (e.g., ChatGPT) were used for language refinement; all ideas, analyses, and conclusions are the authors’ own. Clinical trial number. Not applicable (systematic review; no clinical trial was conducted). Funding. Supported by the Brazilian National Council for Scientific and Technological Development (CNPq), grant 403244/2024-2 (2025–2027). Author Contribution FAB conceived the study, designed the review methodology, supervised the screening and data extraction process, conducted the quantitative analyses, and wrote the main manuscript text. NCR contributed to study selection, data extraction, literature organization, and drafting of methods and results. CFL contributed to refining the search strategy, interpreting the results, and editing the manuscript. MEG performed screening, data extraction, and contributed to the synthesis of results. AH assisted with data extraction, coding validation, and critical manuscript revision. BA contributed to screening, extraction, and preparation of descriptive summaries. TTK contributed to study selection, MMAT quality assessment, and manuscript review. BMO contributed to conceptualization, methodological validation, and critical revision of the manuscript. DA contributed to the analytical strategy, data interpretation, and revision of computational aspects. IS contributed to clinical and genomic interpretation, discussion framing, and manuscript review. SG supported data organization, thematic synthesis, and manuscript editing. TMF supervised the project, contributed to study conceptualization, reviewed and edited the full manuscript, and ensured methodological and scientific integrity. All authors reviewed and approved the 
final manuscript. Data Availability This study is based solely on data extracted from previously published articles included in the systematic review. No new primary datasets were generated. All coded datasets and analytical workflows used in this review are provided in Supplementary Material 3, which contains the full Jupyter notebook with the data processing steps, statistical analyses, and generation of tables and figures. References Austin CP, et al. Future of Rare Diseases Research 2017–2027: An IRDiRC Perspective. Clin Transl Sci. 2018;11(1):21–7. https://doi.org/10.1111/cts.12500 . Bernardi FA, Mello de Oliveira B, Bettiol Yamada D, Artifon M, Schmidt AM, Scheibe M, Felix V, T. M. The minimum data set for rare diseases: Systematic review. J Med Internet Res. 2023;25:e44641. https://doi.org/10.2196/44641 . Bernardi F, de Oliveira B, de Moraes JC, Baiochi J, Lima V, Ferraz V, Schwartz I. Developing a Genomic Minimum Data Set for Rare Diseases in Brazil: A Delphi Protocol Approach. Procedia Comput Sci. 2025;256:1294–301. https://doi.org/10.1016/j.procs.2025.02.241 . Birch P, Friedman JM. Utility and limitations of genetic disease databases in clinical genetics research: A neurofibromatosis type 1 database experience. Am J Med Genet Part A. 2004;128A(1):58–64. https://doi.org/10.1002/ajmg.c.30007 . Choquet R, Maaroufi M, Vandenbussche P, Landais P. A methodology for a minimum data set for rare diseases to support national centers of excellence for healthcare and research. J Am Med Inform Assoc. 2015;22(1):76–85. https://doi.org/10.1136/amiajnl-2014-002794 . Coelho AVC, et al. The Brazilian Rare Genomes Project: Validation of whole genome sequencing for rare diseases diagnosis. Front Mol Biosci. 2022;9:821582. https://doi.org/10.3389/fmolb.2022.821582 . De Antonio M, Dogan C, Hamroun D, et al. The DM-Scope registry: A rare disease innovative framework bridging the gap between research and medical care. Orphanet J Rare Dis. 2019;14:339. 
https://doi.org/10.1186/s13023-019-1088-3 . Della Casa F, Vitale A, Pereira RM, Guerriero S, Ragab G, Lopalco G, Cantarini L. Development and implementation of the AIDA international registry for patients with undifferentiated systemic autoinflammatory diseases. Front Med. 2022;9:908501. .https://doi.org/10.3389/fmed.2022.908501 . Fu MP, Merrill SM, Sharma M, Gibson WT, Turvey SE, Kobor MS. Rare diseases of epigenetic origin: Challenges and opportunities. Front Genet. 2023;14:1113086. https://doi.org/10.3389/fgene.2023.1113086 . Gaggiano C, Vitale A, Tufan A, Ragab G, Aragona E, Wiesik-Szewczyk E, Cantarini L. The Autoinflammatory Diseases Alliance Registry of monogenic autoinflammatory diseases. Front Med. 2022;9:980679. https://doi.org/10.3389/fmed.2022.980679 . Glassberg JA, Linton EA, Burson K, Hendershot T, Telfair J, Kanter J, Sickle Cell Disease Implementation Consortium. Publication of data collection forms from NHLBI funded sickle cell disease implementation consortium (SCDIC) registry. Orphanet J Rare Dis. 2020;15(1):178. .https://doi.org/10.1186/s13023-020-01457-x . Groenen KHJ, Jacobsen A, Kersloot MG, dos Santos Vieira B, van Enckevort E, Kaliyaperumal R, Arts DL, ’t Hoen PAC, Cornet R, Roos M, Kool S, L. The de novo FAIRification process of a registry for vascular anomalies. Orphanet J Rare Dis. 2021;16(1). 376.https://doi.org/10.1186/s13023-021-02004-y . Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, editors. Cochrane handbook for systematic reviews of interventions. 1st ed. Wiley.; 2019. https://doi.org/10.1002/9781119536604 . Hong QN, Pluye P, Fàbregues S, Bartlett G, Boardman F, Cargo M, Vedel I. (2018). Mixed Methods Appraisal Tool (MMAT), version 2018: User guide . Canadian Intellectual Property Office, Industry Canada. Retrieved from: http://mixedmethodsappraisaltoolpublic.pbworks.com/ The International Rare Diseases Research Consortium. Policies and guidelines to maximize impact. Eur J Hum Genet. 2017;25(12):1293–302. 
https://doi.org/10.1038/s41431-017-0008-z . Johnstone DL, Al-Sayed MD, Chakrabarti K, et al. Early infantile epileptic encephalopathy due to biallelic pathogenic variants in PIGQ: Report of seven new cases. Epilepsia. 2020;61(6):e77–83. https://doi.org/10.1002/jimd.12278 . Kent A, Parker AP, Patel A, Wynn SL, Steward CA. Genomics in rare diseases: An overview for the patient, family, and non-specialist healthcare professional. Future Rare Dis. 2023;3(4):FRD56. https://doi.org/10.2217/frd-2023-0019 . Kim D, Kim S, Seok JM, Shin KJ, Oh E, Jeon MY, Park J, Chang HJ, Youn J, Oh J, Sohn E, Park J, Cho JW, Kim BJ. Establishment of a registry of clinical data and bioresources for rare nervous system diseases. Osong Public Health Res Perspect. 2024;15(2):174–81. https://doi.org/10.24171/j.phrp.2023.0353 . Kohonen-Corish MRJ, Al-Aama JY, Auerbach AD, Axton M, Barash CI, Bernstein I, Béroud C, Burn J, Cunningham F, Cutting GR, den Dunnen JT, Greenblatt MS, Kaput J, Katz M, Lindblom A, Macrae F, Maglott D, Möslein G, Povey S, Cotton RGH. (2010). How to catch all those mutations—the report of the Third Human Variome Project Meeting, UNESCO, Paris, May 2010. Human Mutation, 31 (12), 1374–1381. https://doi.org/10.1002/humu.21379 Liévin V, et al. FindZebra online search delving into rare disease case reports using natural language processing. PLOS Digit Health. 2023;2(6):e0000269. https://doi.org/10.1371/journal.pdig.0000269 . Mullin AP, Corey D, Turner EC, Liwski R, Olson D, Burton J, Larkindale J. Standardized data structures in rare diseases: CDISC user guides for duchenne muscular dystrophy and Huntington’s disease. Clin Transl Sci. 2021;14(1):214–21. https://doi.org/10.1111/cts.12845 . Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan: A web and mobile app for systematic reviews. Syst Reviews. 2016;5(1):210. https://doi.org/10.1186/s13643-016-0384-4 . Piñero J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2019. 
https://doi.org/10.1093/nar/gkz1021 . Pintos-Morell G, et al. Analysis of genomics implementation in newborn screening for inherited metabolic disorders: An IRDiRC initiative. Rare Disease Orphan Drugs J. 2024;3(2):19. https://doi.org/10.20517/rdodj.2023.52 . Raycheva R, Al-Naemi F, Denecke K, et al. Challenges in mapping European rare disease databases, relevant for ML-based screening technologies. Front Public Health. 2023;11:1154426. https://doi.org/10.3389/fpubh.2023.1214766 . Reza M, Hildyard JCW, Kirschner J, et al. Supporting and facilitating rare and neuromuscular disease research worldwide. Open J Bioresources. 2017;4(1). 3.https://doi.org/10.1016/j.nmd.2017.07.001 . MRC Centre Neuromuscular Biobank (Newcastle and London). Rocha CS, Secolin R, Rodrigues MR, Carvalho BS, Lopes-Cendes I. The Brazilian Initiative on Precision Medicine (BIPMed): Fostering genomic data-sharing of underrepresented populations. NPJ Genomic Med. 2020;5(1):42. https://doi.org/10.1038/s41525-020-00149-6 . Rubinstein YR, Groft SC, Bartek R, Brown K, Peay H, Ramsey K. Creating a global rare disease patient registry linked to a rare diseases biorepository database: Rare Disease-HUB (RD-HUB). Contemp Clin Trials. 2010;31(5):394–404. https://doi.org/10.1016/j.cct.2010.06.007 . Rubinstein YR, de la Posada M, Mora M. (2017). Rare disease biospecimens and patient registries: Interoperability for research promotion, a European example: EuroBioBank and SpainRDR-BioNER. In M. Posada de la Paz, S. Taruscio, & S. C. Groft, editors, Rare diseases epidemiology: Update and overview (pp. 141–147). Springer.https://doi.org/10.1007/978-3-319-67144-4_7 Ruseckaite R, McAllister S, Muir J, Enticott J, Donaldson A, King S. Current state of rare disease registries and databases in Australia: A scoping review. Orphanet J Rare Dis. 2023a;18:220. https://doi.org/10.1186/s13023-023-02823-1 . Ruseckaite R, Enticott J, Muir J, McAllister S, Donaldson A, King S. 
Informing a national rare disease registry strategy in Australia: A mixed methods study. Orphanet J Rare Dis. 2023b;18:162. https://doi.org/10.1186/s12913-023-10049-x . Santoro M, Coi A, Di Lipucci M, Bianucci AM, Gainotti S, Mollo E, Vittozzi L, Taruscio D, Bianchi F. Rare disease registries classification and characterization: A data mining approach. Public Health Genomics. 2015;18(2):113–22. https://doi.org/10.1159/000369993 . Schmitt T, Poirel HA, Cauët E, Delnord M, Van Den Bulcke M. Unlocking the genomic landscape: Results of the Beyond 1 Million Genomes (B1MG) pilot in Belgium towards genomic data infrastructure (GDI). Health Policy. 2024;143:105060. https://doi.org/10.1016/j.healthpol.2024.105060 . Sequeira M, Almeida JR, Oliveira JL. (2021). A comparative analysis of data platforms for rare diseases. In 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS) (pp. 366–371). IEEE. https://doi.org/10.1109/CBMS52027.2021.00041 Shapiro AD, Soucie JM, Peyvandi F, Aschman DJ, DiMichele DM, European Network Rare Bleeding Disorders Database. Clotting Disorders Working Group, &. (2011). Knowledge and therapeutic gaps: a public health problem in the rare coagulation disorders population. American Journal of Preventive Medicine, 41(6), S324-S331 .https://doi.org/10.1016/j.amepre.2011.09.021 Stark Z, et al. Integrating genomics into healthcare: A global responsibility. Am J Hum Genet. 2019;104(1):13–20. https://doi.org/10.1016/j.ajhg.2018.11.014 . Stark Z, Boughtwood T, McClaren BJ, et al. Australian Genomics: Outcomes of a 5-year national program to accelerate the integration of genomics into healthcare. Eur J Hum Genet. 2023;31:489–500. https://doi.org/10.1016/j.ajhg.2023.01.018 . Taruscio D, et al. The Undiagnosed Diseases Network International: Five years and more! Mol Genet Metab. 2020;129(4):243–54. https://doi.org/10.1016/j.ymgme.2020.01.004 . Toubiana L, Ugon A, Giavarini A, Riquier J, Charlet J, Jeunemaitre X, Plouin P-F, Jaulent M-C. 
A pivot model to set up large scale rare diseases information systems: Application to the Fibromuscular Dysplasia Registry. In: Cornet R, et al. editors. Digital healthcare empowering Europeans. IOS.; 2015. pp. 887–91. https://doi.org/10.3233/978-1-61499-512-8-887 . Visibelli A, Scatena C, Tonarelli A, et al. Computational approaches integrated in a digital ecosystem platform for a rare disease. J Personalized Med. 2022;12(6). 1013.https://doi.org/10.3389/fmmed.2022.827340 . Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. https://doi.org/10.1038/sdata.2016.18 . Wood L, Bassez G, Bleyenheuft C, Campbell C, Cossette L, Jimenez-Moreno AC, Dai Y, Dawkins H, Díaz-Manera J, Dogan C, el Sherif R, Fossati B, Graham C, Hilbert J, Kastreva K, Kimura E, Korngut L, Kostera-Pruszczyk A, Lindberg C, Lindvall B, Luebbe E, Lusakowska A, Mazanec R, Meola G, Orlando L, Takahashi MP, Peric S, Puymirat J, Rakocevic-Stojanovic V, Rodrigues M, Roxburgh R, Schoser B, Segovia S, Shatillo A, Thiele S, Tournev I, van Engelen B, Vohanka S, Lochmüller H. (2018). Eight years after an international workshop on myotonic dystrophy patient registries: Case study of a global collaboration for a rare disease. Orphanet Journal of Rare Diseases, 13 (1). 155.https://doi.org/10.1186/s13023-018-0889-0 World Health Organization. (2024). WHO principles for human genome data: Access, use, and sharing . World Health Organization. https://cdn.who.int/media/docs/default-source/research-for-health/who-principles-human-genome-data-access--use--and-sharing_public-consultation_8-april.pdf?sfvrsn=f2c7afc7_3 Additional Declarations No competing interests reported. 
Supplementary Files SupplementaryFile1.xlsx SupplementaryFile2.xlsx SupplementaryFile3NotebookAnalysis.ipynb Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {\"props\":{\"pageProps\":{\"initialData\":{\"identity\":\"rs-8204628\",\"acceptedTermsAndConditions\":true,\"allowDirectSubmit\":true,\"archivedVersions\":[],\"articleType\":\"Systematic Review\",\"associatedPublications\":[],\"authors\":[{\"id\":590845048,\"identity\":\"aa50fc3e-4abd-4a7b-b58a-3c914551528c\",\"order_by\":0,\"name\":\"Filipe Andrade 
Bernardi\",\"email\":\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABJklEQVRIie3RsWrCQBjA8TsOdIl1PVuwr5AiiFCKr9LgWku6ZRCbErhuZk3fQujSocMXPtAlIaulLiJkDgiO0rsYXZK0a4f7E8Jxye8uRwjR6f5hTY+5cBxRV955t01YPsgnWQUxkBaESQJk0Ou49A8C+QvFUyCONT/P1JEmfUHbWV+byLxdNuH0fRmKy6+PATGTGNjTZ5nI78EgSm/mSAWHBWf9yBJX44gTc/V4z4K0RIaKtARSRQg0eKMPighFDJMZUL1L64BDSbwMDtzo+ZuCJNEvxEVLEpeHgnOTn3aBh3oSLNLRmzpLPJNitfFuJTE68iwYVJA24s6erO9my9dt5uynz74/Cr/HYtq9SOJwa5eJiqkfUl5MXdWgjhyVTqfT6Qj5AXj7dGyiP81/AAAAAElFTkSuQmCC\",\"orcid\":\"\",\"institution\":\"University of São Paulo\",\"correspondingAuthor\":true,\"prefix\":\"\",\"firstName\":\"Filipe\",\"middleName\":\"Andrade\",\"lastName\":\"Bernardi\",\"suffix\":\"\"},{\"id\":603365664,\"identity\":\"b00f161c-feef-42f2-ab99-e9f2ef3126c2\",\"order_by\":1,\"name\":\"Natana Chaves Rabelod\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"InRaras - National Institute for Rare Diseases (InRaras), Porto Alegre Clinical Hospital, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Natana\",\"middleName\":\"Chaves\",\"lastName\":\"Rabelod\",\"suffix\":\"\"},{\"id\":603366374,\"identity\":\"5c4d5ee5-97f2-4f2d-8d58-19f210f09d10\",\"order_by\":2,\"name\":\"Claudia Fernandes Lorea\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Medical Genetics Service, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Claudia\",\"middleName\":\"Fernandes\",\"lastName\":\"Lorea\",\"suffix\":\"\"},{\"id\":603370149,\"identity\":\"9e88c3ed-6769-4c7d-8735-94d0f179cd4c\",\"order_by\":3,\"name\":\"Maria Eduarda Gomes\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Instituto Fernandes 
Figueira/FIOCRUZ\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Maria\",\"middleName\":\"Eduarda\",\"lastName\":\"Gomes\",\"suffix\":\"\"},{\"id\":603370150,\"identity\":\"0ef469ee-e10a-4f12-a8bd-01fc6201b4a4\",\"order_by\":4,\"name\":\"Annanda Holtz\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Medical Genetics Service, Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Annanda\",\"middleName\":\"\",\"lastName\":\"Holtz\",\"suffix\":\"\"},{\"id\":603370481,\"identity\":\"0f6e1026-f684-40f4-8697-29996777f5c0\",\"order_by\":5,\"name\":\"Bianca Abdala\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Instituto Fernandes Figueira/FIOCRUZ\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Bianca\",\"middleName\":\"\",\"lastName\":\"Abdala\",\"suffix\":\"\"},{\"id\":603370663,\"identity\":\"0cb853a9-9f6b-4a0a-aad1-e34c262b7e42\",\"order_by\":6,\"name\":\"Tatiana Takahasi Komoto\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Laboratory of Health Intelligence (LIS), Ribeirão Preto Medical School, University of São Paulo, São Paulo, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Tatiana\",\"middleName\":\"Takahasi\",\"lastName\":\"Komoto\",\"suffix\":\"\"},{\"id\":603370865,\"identity\":\"980b092f-2cd0-49cc-9313-9f5d65d90063\",\"order_by\":7,\"name\":\"Bibiana Mello de Oliveirae\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"InRaras - National Institute for Rare Diseases (InRaras), Porto Alegre Clinical Hospital, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Bibiana\",\"middleName\":\"Mello\",\"lastName\":\"de Oliveirae\",\"suffix\":\"\"},{\"id\":603371137,\"identity\":\"f510a968-50f7-45bc-9e3c-4bb0cef2ac75\",\"order_by\":8,\"name\":\"Sayonara Gonzalezd\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"InRaras - National Institute for Rare Diseases (InRaras), Porto Alegre Clinical Hospital, Porto 
Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Sayonara\",\"middleName\":\"\",\"lastName\":\"Gonzalezd\",\"suffix\":\"\"},{\"id\":603372195,\"identity\":\"37f1ad3d-6dba-41f1-9d3a-c11cc56af71d\",\"order_by\":9,\"name\":\"Ida Schwartze\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"InRaras - National Institute for Rare Diseases (InRaras), Porto Alegre Clinical Hospital, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Ida\",\"middleName\":\"\",\"lastName\":\"Schwartze\",\"suffix\":\"\"},{\"id\":603373601,\"identity\":\"b01e2fb9-d3df-42eb-b520-4486087a89c0\",\"order_by\":10,\"name\":\"Domingos Alvesa\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"RISE-Health, MEDCIDS, Faculty of Medicine, University of Porto, Porto, Portugal\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Domingos\",\"middleName\":\"\",\"lastName\":\"Alvesa\",\"suffix\":\"\"},{\"id\":603373896,\"identity\":\"f1f15977-ca97-4f69-889f-09f5a49ff22f\",\"order_by\":11,\"name\":\"Têmis Maria Félixe\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"InRaras - National Institute for Rare Diseases (InRaras), Porto Alegre Clinical Hospital, Porto Alegre, Brazil\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Têmis\",\"middleName\":\"Maria\",\"lastName\":\"Félixe\",\"suffix\":\"\"}],\"badges\":[],\"createdAt\":\"2025-11-25 14:53:37\",\"currentVersionCode\":1,\"declarations\":\"\",\"doi\":\"10.21203/rs.3.rs-8204628/v1\",\"doiUrl\":\"https://doi.org/10.21203/rs.3.rs-8204628/v1\",\"draftVersion\":[],\"editorialEvents\":[],\"editorialNote\":\"\",\"failedWorkflow\":false,\"files\":[{\"id\":102829024,\"identity\":\"37c66309-2d1d-4427-aa98-d415d010978a\",\"added_by\":\"auto\",\"created_at\":\"2026-02-17 09:27:05\",\"extension\":\"jpg\",\"order_by\":1,\"title\":\"Figure 
1\",\"display\":\"\",\"copyAsset\":false,\"role\":\"figure\",\"size\":378426,\"visible\":true,\"origin\":\"\",\"legend\":\"\\u003cp\\u003e\\u003cstrong\\u003ePRISMA 2020 flow diagram of the study selection process.\\u003c/strong\\u003e\\u003c/p\\u003e\",\"description\":\"\",\"filename\":\"Figura1PRISMA2020FlowDiagrampage0001.jpg\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-8204628/v1/2b53c8b7c726939a094bfcd0.jpg\"},{\"id\":108799735,\"identity\":\"235373ed-09bb-4db9-918f-ac458c409445\",\"added_by\":\"auto\",\"created_at\":\"2026-05-08 14:00:12\",\"extension\":\"pdf\",\"order_by\":0,\"title\":\"\",\"display\":\"\",\"copyAsset\":false,\"role\":\"manuscript-pdf\",\"size\":741044,\"visible\":true,\"origin\":\"\",\"legend\":\"\",\"description\":\"\",\"filename\":\"manuscript.pdf\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-8204628/v1/9e02cf4d-2447-40a3-8685-ec42603f4e25.pdf\"},{\"id\":102829041,\"identity\":\"0cbd28f0-5cfc-4bb6-a8e6-6e766330e2be\",\"added_by\":\"auto\",\"created_at\":\"2026-02-17 09:27:11\",\"extension\":\"xlsx\",\"order_by\":1,\"title\":\"\",\"display\":\"\",\"copyAsset\":false,\"role\":\"supplement\",\"size\":52218,\"visible\":true,\"origin\":\"\",\"legend\":\"\",\"description\":\"\",\"filename\":\"SupplementaryFile1.xlsx\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-8204628/v1/b65410b42f1554fd39f2506b.xlsx\"},{\"id\":102828960,\"identity\":\"2d5e18b8-c89d-4184-a303-3f9a431327dc\",\"added_by\":\"auto\",\"created_at\":\"2026-02-17 09:26:53\",\"extension\":\"xlsx\",\"order_by\":2,\"title\":\"\",\"display\":\"\",\"copyAsset\":false,\"role\":\"supplement\",\"size\":50654,\"visible\":true,\"origin\":\"\",\"legend\":\"\",\"description\":\"\",\"filename\":\"SupplementaryFile2.xlsx\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-8204628/v1/69acea4855ffb92a41e13678.xlsx\"},{\"id\":102828961,\"identity\":\"d6693a8a-04d5-49ce-83ef-4fde9dfa0d94\",\"added_by\":\"auto\",\"created_at\":\"2026-02-17 
09:26:54\",\"extension\":\"ipynb\",\"order_by\":3,\"title\":\"\",\"display\":\"\",\"copyAsset\":false,\"role\":\"supplement\",\"size\":37390,\"visible\":true,\"origin\":\"\",\"legend\":\"\",\"description\":\"\",\"filename\":\"SupplementaryFile3NotebookAnalysis.ipynb\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-8204628/v1/41f722c5e3a55298b88147af.ipynb\"}],\"financialInterests\":\"No competing interests reported.\",\"formattedTitle\":\"Minimum genomic data sets for rare diseases: A systematic review\",\"fulltext\":[{\"header\":\"1. Introduction\",\"content\":\" \\u003cp\\u003e \\u003cb\\u003ea. Contextualization of Rare Diseases\\u003c/b\\u003e \\u003c/p\\u003e\\u003cp\\u003eRare diseases (RD) have gained prominence in public health due to their collective contribution to chronic illness, despite their individual rarity. Approximately 6,000 to 8,000 distinct rare conditions collectively affect a substantial portion of the population, necessitating governmental interventions, including the development and implementation of targeted public policies (Austin et al., \\u003cspan citationid=\\\"CR1\\\" class=\\\"CitationRef\\\"\\u003e2018\\u003c/span\\u003e). Given that over 72% of RDs are of genetic origin, a comprehensive understanding requires detailed examination of genetic variation within affected populations \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eFu et al., \\u003cspan citationid=\\\"CR9\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e). 
Effective information sharing among stakeholders is critical for advancing genomic knowledge across diverse populations, underscoring the importance of collaborative, ethical, and methodologically rigorous research partnerships \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eFu et al., \\u003cspan citationid=\\\"CR9\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eUnderstanding the biological mechanisms underlying RDs and developing new diagnostic or therapeutic approaches for RDs remain challenging for researchers and the pharmaceutical sector. Key obstacles include poor institutional coordination and limited sharing of diagnostic resources and information (I\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003enternational Rare Diseases Research Consortium, 2017;\\u003c/span\\u003e Taruscio et al., \\u003cspan citationid=\\\"CR38\\\" class=\\\"CitationRef\\\"\\u003e2020\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e. Globally, health data management is hindered by a lack of standardized terminology and data structures (Wilkinson et al., \\u003cspan citationid=\\\"CR41\\\" class=\\\"CitationRef\\\"\\u003e2016\\u003c/span\\u003e). This limits effective data collection, recording, and analysis, which are essential for research and strong public health policies. 
These problems are especially acute in RDs, where data are often fragmented and dispersed \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eWilkinson et al., \\u003cspan citationid=\\\"CR41\\\" class=\\\"CitationRef\\\"\\u003e2016\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eRecent advances in sequencing technologies have significantly reduced the time required to translate genetic insights into patient outcomes, enabling therapeutic decisions to be made within days rather than years. This acceleration has improved the quality of life for families affected by rare diseases. The integration of genomics into research, diagnosis, and treatment is now a cornerstone of modern medicine. Genomic data analysis yields essential insights into the genetic mechanisms underlying RDs, thereby advancing both research and clinical care. Beyond variant detection, computational pipelines and phenotype-driven prioritization tools enable the identification of subtle mutations and structural variants that may be missed by conventional diagnostics. These innovations facilitate the development of precise diagnostic assays and support the identification of novel therapeutic targets tailored to specific molecular alterations. 
Additionally, integrating genomic data with clinical, phenotypic, and epidemiological information enables patient stratification, supports more effective treatment strategies, and advances precision public health initiatives \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eKent et al., \\u003cspan citationid=\\\"CR17\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eRecent literature highlights the importance of establishing minimum data sets (MDS) to ensure ethical, efficient, and standardized data collection, improving the planning, implementation, and evaluation of public health interventions (Bernardi et al., \\u003cspan citationid=\\\"CR2\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e). Genomic MDS provide structured frameworks that integrate genetic variants, clinical data, and metadata to promote consistency and interoperability. This standardization helps create robust, comparable datasets that support accurate diagnosis, personalized treatment, and evidence-based policy development, while enabling secure and efficient data sharing across health systems (Stark et al., \\u003cspan citationid=\\\"CR36\\\" class=\\\"CitationRef\\\"\\u003e2019\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eb. Study scenario and relevance\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eResearch and understanding of RD are increasingly supported by comprehensive databases and resources that consolidate diverse data types, thereby facilitating academic research and clinical applications. An example of this integration is the work of the National Institutes of Health (NIH) through the Genetic and Rare Diseases Information Center (GARD), which has developed a disease harmonization database. This platform is significant because it combines GARD data with other databases to enable the investigation of RD, particularly those with a genetic etiology. 
This integration is crucial for researchers and healthcare providers, as it allows for a deeper understanding of RD through accessible, high-quality information \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eSequeira et al., \\u003cspan citationid=\\\"CR34\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe analysis presented by Pintos-Morell et al. highlights the evolving landscape of genomic implementation in newborn screening for hereditary metabolic disorders. This research emphasizes the integration of genomic tools into public health strategies to improve early detection and treatment of RD, thereby significantly enhancing health outcomes. These efforts reflect a broader trend toward the use of genomic data for disease diagnosis, proactive health management, and preventive care \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003ePintos-Morell et al., \\u003cspan citationid=\\\"CR24\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e. Complementing these resources, FindZebra is a specialized tool for diagnosing RD and indexing articles from GARD and other notable databases such as Online Mendelian Inheritance in Man (OMIM) and Orphanet. 
Recognizing the challenges of diagnosing RD, given its rarity and complex phenotypes, FindZebra is designed to optimize the search for RD based on symptoms, clinical characteristics, and phenotypic information \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eLi\\u0026eacute;vin et al., \\u003cspan citationid=\\\"CR20\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e.\\u003c/p\\u003e \\u003cp\\u003eAt the continental level, the European Genomic Data Infrastructure (EGDI) focuses on creating and maintaining a genomic data infrastructure across Europe (Schmitt et al., \\u003cspan citationid=\\\"CR33\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e). This initiative facilitates the sharing and analysis of large volumes of genomic data, essential for advancing personalized medicine and improving the understanding and treatment of diseases, including RD (Visibelli et al., \\u003cspan citationid=\\\"CR40\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e). The EGDI builds on the outcomes of the Beyond 1\\u0026nbsp;Million Genomes (B1MG) project and fulfills the ambition of the 1\\u0026thinsp;+\\u0026thinsp;Million Genomes (1\\u0026thinsp;+\\u0026thinsp;MG) initiative by establishing a federated, sustainable, and secure infrastructure for accessing genomic, phenotypic, and related clinical data\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eSchmitt et al., \\u003cspan citationid=\\\"CR33\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eSimilarly, DisGeNET and ClinVar are among the largest publicly accessible collections of genes and variants associated with human diseases. 
They integrate data from curated repositories, genomic association study catalogs, animal models, and extensive scientific literature, uniformly annotated with controlled vocabularies and community-driven ontologies\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003ePi\\u0026ntilde;ero et al., \\u003cspan citationid=\\\"CR23\\\" class=\\\"CitationRef\\\"\\u003e2019\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e. The Global Alliance for Genomics and Health (GA4GH) is an international coalition that develops frameworks and standards to facilitate the responsible, voluntary, and secure sharing of genomic and clinical data. It aims to accelerate genomic research and medicine by promoting interoperability and data sharing across institutions worldwide, enabling large-scale collaborative studies and advancing understanding of human health and disease globally \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eWorld Health Organization, \\u003cspan citationid=\\\"CR43\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e.\\u003c/p\\u003e \\u003cp\\u003eBuilding on this foundation, recent global initiatives, such as those outlined by the World Health Organization (WHO), aim to establish ethical, legal, and equitable frameworks for the access, use, and sharing of human genome data. These frameworks ensure that such activities promote human health and well-being, uphold social justice, and foster public trust and transparency. 
WHO principles emphasize the importance of including diverse populations in genomic datasets to avoid perpetuating health inequities and ensure that genomic data align with local health needs and contexts.\\u003c/p\\u003e \\u003cp\\u003eVarious initiatives and databases support efforts to advance genomic medicine in Latin America by integrating genomic data into clinical practice and research. The Latin American Network for Genomic Medicine (LatinGen) fosters the integration of genomic data into clinical practice across Latin America and promotes collaboration among researchers, clinicians, and institutions. The Leiden Open Variation Database (LOVD) platforms serve as critical repositories for genetic variants, with a focus on local populations in Argentina and Mexico. Similarly, ChileGen\\u0026oacute;mico is a national initiative to integrate genomic data into clinical practice and research. These efforts enhance precision medicine initiatives, improving the understanding of the genetic basis of diseases in these countries (Bernardi et al., \\u003cspan citationid=\\\"CR3\\\" class=\\\"CitationRef\\\"\\u003e2025\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe Brazilian Initiative on Precision Medicine (BIPMed) collects and shares genomic data specific to the Brazilian population\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eRocha et al., \\u003cspan citationid=\\\"CR27\\\" class=\\\"CitationRef\\\"\\u003e2020\\u003c/span\\u003e). 
Similarly, in focusing on RD diagnosis, the Brazilian Rare Genomes Project aims to integrate whole-genome sequencing (WGS) into the Brazilian public healthcare system \\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e(\\u003c/span\\u003eCoelho et al., \\u003cspan citationid=\\\"CR6\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e. These initiatives aim to address the genetic diversity and ancestry proportions of Brazilian populations, thereby enhancing precision medicine and improving diagnostic capabilities for genetic and rare disorders. They provide valuable insights into disease predisposition and help to fill gaps in global genomic databases.\\u003c/p\\u003e \\u003cp\\u003eDespite the growth of genomics programs and rare-disease registries, many initiatives still collect genomic elements in ways that are difficult to align across projects. Fragmented data structures and uneven documentation of sequencing and interpretation workflows hinder reproducible diagnosis and downstream reuse, thereby weakening health systems' ability to scale evidence-based rare-disease policies (Bernardi et al., \\u003cspan citationid=\\\"CR2\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e). These barriers are amplified in low-resource or historically underrepresented settings, where limitations in infrastructure, funding, and specialized workforce can constrain adoption. Strengthening international collaboration and investing in locally appropriate implementation strategies are therefore central to equitable uptake.\\u003c/p\\u003e \\u003cp\\u003eA systematic review of genomic minimum data sets can clarify how existing initiatives define \\u0026ldquo;minimum\\u0026rdquo; in practice, where genomic reporting is most frequently incomplete, and which design choices are most consistently associated with intended outcomes. 
By consolidating how MDS are specified and used, the literature can be mapped to reveal common patterns, gaps, and opportunities for harmonization that support both research and care pathways.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003ec. Purpose of the Review\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eThis review evaluates how genomic minimum data sets are defined, operationalized, and reported in rare-disease research and clinical contexts, and summarizes the impacts attributed to their use (e.g., diagnostic, knowledge-generation, and system-level outcomes).\\u003c/p\\u003e\"},{\"header\":\"2. Methods\",\"content\":\"\\u003cp\\u003e \\u003cb\\u003ea) Study Design\\u003c/b\\u003e \\u003c/p\\u003e\\u003cp\\u003eWe conducted a systematic review (SR) to synthesize evidence on how genomic minimum data sets are defined and applied in rare-disease research and clinical practice. The review methods followed established systematic review guidance to ensure transparent selection, extraction, and synthesis (Higgins et al., \\u003cspan citationid=\\\"CR13\\\" class=\\\"CitationRef\\\"\\u003e2019\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe review protocol was prospectively registered in PROSPERO (CRD42024510192) to document the planned methods in advance and support transparency in reporting (\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://www.crd.york.ac.uk/prospero\\u003c/span\\u003e\\u003cspan address=\\\"https://www.crd.york.ac.uk/prospero\\\" targettype=\\\"URL\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eb) Defining the Research Question\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eWe structured the review question using the PIcO framework (Population, Phenomenon of Interest, Comparison, Outcome) to define the target 
population, the genomic MDS concept under evaluation, and the outcomes of interest, thereby keeping eligibility criteria aligned with the review objectives (Higgins et al., \\u003cspan citationid=\\\"CR13\\\" class=\\\"CitationRef\\\"\\u003e2019\\u003c/span\\u003e). The PIcO strategy for this study was defined as follows:\\u003c/p\\u003e \\u003cp\\u003e \\u003cstrong\\u003ePopulation (P)\\u003c/strong\\u003e \\u003cp\\u003eRD patients (of genetic or non-genetic origin), including population groups involved in studies related to research, diagnosis, and treatment using genomic data.\\u003c/p\\u003e \\u003c/p\\u003e \\u003cp\\u003e \\u003cstrong\\u003ePhenomenon of Interest (I)\\u003c/strong\\u003e \\u003cp\\u003eDefinition, use, and analysis of genomic MDS in studies, diagnostics, or clinical practices.\\u003c/p\\u003e \\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eComparison (c)\\u003c/b\\u003e/\\u003cb\\u003eOutcome (O)\\u003c/b\\u003e: Clinical and research impacts derived from implementing minimal genomic data sets, such as improved diagnostic accuracy, personalized treatments, and enhanced data interoperability.\\u003c/p\\u003e \\u003cp\\u003eThe mnemonic of this strategy led to the following central question: \\u003cem\\u003eHow are minimal genomic datasets defined and utilized in the research, diagnosis, and treatment of rare monogenic diseases?\\u003c/em\\u003e\\u003c/p\\u003e \\u003cp\\u003eGiven the diversity of clinical presentations and their relevance for treatment and research, three additional subquestions guide this investigation. 
These subquestions are intended to deepen the analysis of practical and differential aspects in the application of genomic data in rare monogenic diseases, specifically:\\u003c/p\\u003e \\u003cp\\u003e \\u003col\\u003e \\u003cspan\\u003e \\u003cli\\u003e \\u003cp\\u003e \\u003cem\\u003eWhich genetic RD and patient populations are included in studies of minimal genomic data sets?\\u003c/em\\u003e \\u003c/p\\u003e \\u003c/li\\u003e \\u003c/span\\u003e \\u003cspan\\u003e \\u003cli\\u003e \\u003cp\\u003e \\u003cem\\u003eWhat specific types of genomic data are selected for these minimal sets, and how often are they updated or revised?\\u003c/em\\u003e \\u003c/p\\u003e \\u003c/li\\u003e \\u003c/span\\u003e \\u003cspan\\u003e \\u003cli\\u003e \\u003cp\\u003e \\u003cem\\u003eWhat are the clinical impacts and research advancements observed with the implementation of these minimal genomic data sets?\\u003c/em\\u003e \\u003c/p\\u003e \\u003c/li\\u003e \\u003c/span\\u003e \\u003c/ol\\u003e \\u003c/p\\u003e \\u003cp\\u003eTo preserve a focused biological and interpretive scope for genomic elements, we limited the population to monogenic rare diseases and excluded non-monogenic rare conditions. This restriction supported clearer comparisons in how \\u0026ldquo;minimum\\u0026rdquo; genomic content is specified, curated, and reused across initiatives.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003ec) Key concepts definition\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eTo build a sensitive and specific search strategy, we combined controlled vocabulary terms (e.g., MeSH) with free-text keywords. Controlled terms helped standardize concept retrieval across databases, while keywords captured newer or commonly used expressions that may not yet be consistently indexed. 
This dual approach aimed to improve recall without sacrificing relevance, and the final PIcO-derived descriptor set is summarized in Table\\u0026nbsp;\\u003cspan refid=\\\"Tab1\\\" class=\\\"InternalRef\\\"\\u003e1\\u003c/span\\u003e.\\u003c/p\\u003e \\u003cp\\u003e \\u003cdiv class=\\\"gridtable\\\"\\u003e\\u003ctable float=\\\"Yes\\\" id=\\\"Tab1\\\" border=\\\"1\\\"\\u003e \\u003ccaption language=\\\"En\\\"\\u003e \\u003cdiv class=\\\"CaptionNumber\\\"\\u003eTable 1\\u003c/div\\u003e \\u003cdiv class=\\\"CaptionContent\\\"\\u003e \\u003cp\\u003ePIcO Strategy and Descriptors Used in the Search Strategy\\u003c/p\\u003e \\u003c/div\\u003e \\u003c/caption\\u003e \\u003ccolgroup cols=\\\"3\\\"\\u003e \\u003cdiv align=\\\"left\\\" class=\\\"colspec\\\" colname=\\\"c1\\\" colnum=\\\"1\\\"\\u003e\\u003c/div\\u003e \\u003cdiv align=\\\"left\\\" class=\\\"colspec\\\" colname=\\\"c2\\\" colnum=\\\"2\\\"\\u003e\\u003c/div\\u003e \\u003cdiv align=\\\"left\\\" class=\\\"colspec\\\" colname=\\\"c3\\\" colnum=\\\"3\\\"\\u003e\\u003c/div\\u003e \\u003cthead\\u003e \\u003ctr\\u003e \\u003cth align=\\\"left\\\" colname=\\\"c1\\\"\\u003e \\u003cp\\u003ePIcO Strategy\\u003c/p\\u003e \\u003c/th\\u003e \\u003cth align=\\\"left\\\" colname=\\\"c2\\\"\\u003e \\u003cp\\u003eControlled Descriptors (MeSH)\\u003c/p\\u003e \\u003c/th\\u003e \\u003cth align=\\\"left\\\" colname=\\\"c3\\\"\\u003e \\u003cp\\u003eUncontrolled Descriptors (Keywords)\\u003c/p\\u003e \\u003c/th\\u003e \\u003c/tr\\u003e \\u003c/thead\\u003e \\u003ctbody\\u003e \\u003ctr\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c1\\\"\\u003e \\u003cp\\u003e\\u003cb\\u003eP -\\u003c/b\\u003e \\u003cspan type=\\\"BoldUnderline\\\" class=\\\"BoldUnderline\\\" name=\\\"Emphasis\\\"\\u003eP\\u003c/span\\u003e\\u003cb\\u003eopulation\\u003c/b\\u003e\\u003c/p\\u003e \\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c2\\\"\\u003e \\u003cp\\u003eRare Diseases\\u003c/p\\u003e \\u003cp\\u003eGenetic Diseases\\u003c/p\\u003e 
\\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c3\\\"\\u003e \\u003cp\\u003eOrphan diseases\\u003c/p\\u003e \\u003cp\\u003eGenetic rare diseases; \\u003c/p\\u003e \\u003cp\\u003eRare Genetic Disorders;\\u003c/p\\u003e \\u003c/td\\u003e \\u003c/tr\\u003e \\u003ctr\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c1\\\"\\u003e \\u003cp\\u003e\\u003cb\\u003eI - (Phenomenon of)\\u003c/b\\u003e \\u003cspan type=\\\"BoldUnderline\\\" class=\\\"BoldUnderline\\\" name=\\\"Emphasis\\\"\\u003eI\\u003c/span\\u003e\\u003cb\\u003enterest\\u003c/b\\u003e\\u003c/p\\u003e \\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c2\\\"\\u003e \\u003cp\\u003eHealth Information Systems\\u003c/p\\u003e \\u003cp\\u003eData Collection\\u003c/p\\u003e \\u003cp\\u003eHealth Services Research\\u003c/p\\u003e \\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c3\\\"\\u003e \\u003cp\\u003eminimum data set\\u003c/p\\u003e \\u003cp\\u003ecore data\\u003c/p\\u003e \\u003cp\\u003eData gathering\\u003c/p\\u003e \\u003c/td\\u003e \\u003c/tr\\u003e \\u003ctr\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c1\\\"\\u003e \\u003cp\\u003e\\u003cb\\u003ecO - Outcome\\u003c/b\\u003e\\u003c/p\\u003e \\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c2\\\"\\u003e \\u003cp\\u003eDiagnostic Accuracy\\u003c/p\\u003e \\u003cp\\u003eTreatment Outcome\\u003c/p\\u003e \\u003cp\\u003eData Interoperability\\u003c/p\\u003e \\u003cp\\u003eHealth Informatics\\u003c/p\\u003e \\u003cp\\u003eGenomic Data Sharing\\u003c/p\\u003e \\u003cp\\u003eHealth Policy\\u003c/p\\u003e \\u003cp\\u003ePrecision Medicine\\u003c/p\\u003e \\u003cp\\u003ePublic Health\\u003c/p\\u003e \\u003cp\\u003eEthics, Medical\\u003c/p\\u003e \\u003cp\\u003eBiomedical Research\\u003c/p\\u003e \\u003cp\\u003ePersonalized Medicine\\u003c/p\\u003e \\u003c/td\\u003e \\u003ctd align=\\\"left\\\" colname=\\\"c3\\\"\\u003e \\u003cp\\u003eDiagnostic accuracy;\\u003c/p\\u003e \\u003cp\\u003eDiagnostic precision;\\u003c/p\\u003e 
\\u003cp\\u003eTreatment outcomes;\\u003c/p\\u003e \\u003cp\\u003eTherapeutic outcomes;\\u003c/p\\u003e \\u003cp\\u003eData interoperability;\\u003c/p\\u003e \\u003cp\\u003eHealth informatics;\\u003c/p\\u003e \\u003cp\\u003eMedical informatics;\\u003c/p\\u003e \\u003cp\\u003eGenomic data sharing;\\u003c/p\\u003e \\u003cp\\u003ePublic Health Policy;\\u003c/p\\u003e \\u003cp\\u003eGenomic policies\\u003c/p\\u003e \\u003cp\\u003eTargeted therapy;\\u003c/p\\u003e \\u003cp\\u003eHealthcare regulations;\\u003c/p\\u003e \\u003c/td\\u003e \\u003c/tr\\u003e \\u003c/tbody\\u003e \\u003c/colgroup\\u003e\\u003c/table\\u003e\\u003c/div\\u003e \\u003c/p\\u003e \\u003cp\\u003e\\u003cb\\u003ed) Search strategy\\u003c/b\\u003e\\u003c/p\\u003e \\u003cp\\u003eSearch strings were tailored to each database\\u0026rsquo;s advanced interface, using combinations of controlled terms and keywords with Boolean logic to balance sensitivity and precision. We executed electronic searches in November 2024 in PubMed (NLM), Scopus, LILACS, Web of Science, and CINAHL. Boolean operators (AND/OR/NOT) and database-specific filters were applied as needed to refine retrieval while preserving the conceptual scope defined in PIcO.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003ee) Screening, Selection, and Extraction of studies\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eEligibility criteria were applied consistently across screening and full-text review to ensure that included publications addressed genomic components of MDS for human monogenic rare diseases. We included studies in English, Portuguese, Spanish, or Italian, with no publication-year restriction. 
We excluded publications unrelated to human genomic MDS (e.g., non-human studies or MDS outside genomics), records not accessible electronically due to paywalls, and non-scholarly sources such as websites or social media advertisements.\\u003c/p\\u003e \\u003cp\\u003eAll records were de-duplicated and managed in Rayyan (\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://rayyan.qcri.org/\\u003c/span\\u003e\\u003cspan address=\\\"https://rayyan.qcri.org/\\\" targettype=\\\"URL\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003cspan type=\\\"Underline\\\" class=\\\"Underline\\\" name=\\\"Emphasis\\\"\\u003e)\\u003c/span\\u003e to support blinded screening and collaborative review workflows (Ouzzani et al., \\u003cspan citationid=\\\"CR22\\\" class=\\\"CitationRef\\\"\\u003e2016\\u003c/span\\u003e). Titles/abstracts/keywords were screened first, followed by full-text assessment of retained articles. Two reviewers independently assessed full texts; disagreements were resolved through adjudication. Inter-reviewer agreement (include/exclude) was summarized using raw agreement and Cohen\\u0026rsquo;s κ based on the latest independent decisions, excluding adjudicator votes.\\u003c/p\\u003e \\u003cp\\u003eFor included studies, we extracted: (i) design and publication characteristics; (ii) descriptive features of the proposed/used MDS (e.g., geographical scope, targeted disease group, referenced standards); (iii) the context of MDS use (research, laboratory, diagnostic, and/or treatment settings); and (iv) reported outcomes, including diagnostic effects, treatment personalization, and system-level impacts (e.g., interoperability or policy relevance). 
We prioritized documents aligned with WHO principles for access, use, and sharing of human genome data where applicable (World Health Organization, \\u003cspan citationid=\\\"CR43\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003ef) Analysis and synthesis of evidence\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eWe synthesized evidence using a mixed approach that combined structured narrative synthesis with thematic coding. First, studies were grouped by scope and intended use (e.g., registry design, clinical implementation, research infrastructure), enabling comparisons of the genomic elements selected, their documentation, and the outcomes reported. We then applied thematic coding to capture recurring design features (e.g., sequencing modality specification, ontology use, re-analysis planning) and to interpret how these features were discussed in relation to diagnostic, knowledge-generation, and system-level impacts.\\u003c/p\\u003e \\u003cp\\u003eFor quantitative summaries, domains, contexts, and outcomes were coded as binary indicators (present/absent). Explicit \\u0026ldquo;Yes\\u0026rdquo; statements were coded as 1; \\u0026ldquo;Qualitative/Indirect,\\u0026rdquo; \\u0026ldquo;Not reported,\\u0026rdquo; and \\u0026ldquo;No/Absent\\u0026rdquo; were coded as 0 (while their frequencies were still described). Using these indicators, we defined three composite indices: a Clinical Impact Index (diagnostic accuracy, time-to-diagnosis, personalized treatment, prognosis change; range 0\\u0026ndash;4), a Knowledge Index (new gene/biomarker discovery, genomic annotation advances, molecular classification improvements; range 0\\u0026ndash;3), and a System Index (policy influence, impact on public programs, interoperability/standardization; range 0\\u0026ndash;3). Their sum formed a Total Impact Score (range 0\\u0026ndash;10). 
Co-occurrence between elements was assessed using the Jaccard index (intersection/union), and temporal comparisons were made between two periods (\\u0026le;\\u0026thinsp;2018 vs. 2019\\u0026ndash;2024) without sample-size weighting.\\u003c/p\\u003e \\u003cp\\u003eFinally, the qualitative interpretation emphasized the relevance of implementation, how MDS components were operationalized in practice, and the implications for generalizability and clinical utility across settings.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eg) Risk of bias assessment\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eWe assessed methodological quality using the Mixed Methods Appraisal Tool (MMAT, 2018), which supports appraisal across heterogeneous designs (qualitative, quantitative descriptive, non-randomized, randomized, and mixed-methods studies). The tool evaluates design-appropriate criteria for question fit, data adequacy, sampling/representativeness, risk of nonresponse/confounding, and coherence between the data and the conclusions (Hong et al., \\u003cspan citationid=\\\"CR14\\\" class=\\\"CitationRef\\\"\\u003e2018\\u003c/span\\u003e). Two reviewers (FAB and TTK) independently appraised each study and resolved disagreements by consensus. In line with MMAT guidance, we did not compute an overall numeric score; instead, we reported criterion-level judgments to make strengths and limitations transparent.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eh) Presentation of results\\u003c/b\\u003e \\u003c/p\\u003e \\u003cp\\u003eWe organized the findings to support comparisons across studies and to provide a clear visualization of the evidence base. 
Extracted data were summarized in a structured table (e.g., author/year, study focus, design, target population, and MDS use context), allowing direct cross-study comparison of genomic and non-genomic domains and reported outcomes.\\u003c/p\\u003e \\u003cp\\u003eWe reported the selection process using a PRISMA-aligned flow diagram that documents the identification, screening, eligibility, and inclusion stages, including the reasons for exclusion at each stage. In addition to tabular summaries, we provided a descriptive narrative of major patterns and gaps, integrating thematic findings with quantitative summaries to highlight clinical and research implications of genomic MDS choices.\\u003c/p\\u003e\"},{\"header\":\"3. Results\",\"content\":\"\\u003cdiv id=\\\"Sec3\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.1 General characteristics of the studies\\u003c/h2\\u003e \\u003cp\\u003eA total of 23 studies published between 2004 and 2024 were included, with a marked increase observed from 2019 onwards (13/23; 56.5%). Collaboration was prevalent, with 8 of 23 studies (34.8%) involving multinational or international partnerships. These cross-border collaborations resulted in a 25% increase in sample size compared to single-country studies, thereby enhancing the robustness and comprehensiveness of analyses. Geographically, Europe contributed 12 studies (52.2%), followed by North America (6; 26.1%), Asia (3; 13.0%), and Oceania (3; 13.0%), with Africa and South America each contributing one study (4.3%). Two studies (8.7%) were described as 'international' without specifying the country. As studies may span multiple regions, percentages reflect participation and do not sum to 100%. By country, the United States (n\\u0026thinsp;=\\u0026thinsp;5) and Italy (n\\u0026thinsp;=\\u0026thinsp;4) were the most frequent contributors, followed by Australia (n\\u0026thinsp;=\\u0026thinsp;3) and France (n\\u0026thinsp;=\\u0026thinsp;3). 
Canada and the United Kingdom each appeared twice, while Belgium, Brazil, Egypt, and Japan each appeared once. More than 65% of studies characterized the MDS as modular and updatable. Thematic mapping indicated contributions in neuromuscular or neuro disorders (7/23; 30.4%), general monogenic conditions (6/23; 26.1%), autoinflammatory diseases (2/23; 8.7%), metabolic disorders (1/23; 4.3%), and a pan-rare, general, or unspecified group (7/23; 30.4%).\\u003c/p\\u003e \\u003cp\\u003eThe study selection process is summarized in the PRISMA 2020 flow diagram (Fig.\\u0026nbsp;1). Across the five databases, we retrieved 549 records and removed 162 duplicates, leaving 387 unique records for title/abstract screening. We excluded 284 records at this stage due to a lack of a rare-disease focus, the absence of MDS content, or the absence of a genomic component. We then assessed 103 full texts, excluding 80 that did not meet the inclusion criteria (e.g., methodological papers, abstracts, reviews, or registry descriptions outside the scope). In total, 23 studies were included; full study-level details are provided in Supplementary file 1.\\u003c/p\\u003e \\u003cp\\u003e \\u003cb\\u003eFigure 1. PRISMA 2020 flow diagram of the study selection process.\\u003c/b\\u003e \\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec4\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.2 Composition of the MDS\\u003c/h2\\u003e \\u003cp\\u003eClinical/phenotypic data were nearly universal (22/23; 95.7%), followed by genetic/genomic/variant data (16/23; 69.6%). Additional modalities included imaging (7/23; 30.4%), demographics (11/23; 47.8%), metadata/administrative (5/23; 21.7%), laboratory results (5/23; 21.7%), and biobank references (2/23; 8.7%). 
Each study combined a median of 3 modalities (IQR, 2\\u0026ndash;4; distribution: 1/23 with one modality, 6/23 with 2, 9/23 with 3, and 7/23 with 4), indicating a tendency toward multimodality.\\u003c/p\\u003e \\u003cp\\u003eCo-occurrence patterns indicated that the clinical layer served as the integrative axis. Among studies with genomic data, 15 of 16 also included clinical data (Jaccard index\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.65). Clinical and demographic variables were frequently reported together (10/11; Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.43). All imaging-positive studies (7/7) also reported clinical data (Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.32 for clinical and imaging), and genomic and imaging co-occurred in 5/7 (Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.28). The most common exact combinations were clinical\\u0026thinsp;+\\u0026thinsp;genomic\\u0026thinsp;+\\u0026thinsp;demographic (4/23; 17.4%), clinical\\u0026thinsp;+\\u0026thinsp;genomic\\u0026thinsp;+\\u0026thinsp;imaging (3/23; 13.0%), and clinical\\u0026thinsp;+\\u0026thinsp;genomic\\u0026thinsp;+\\u0026thinsp;demographic\\u0026thinsp;+\\u0026thinsp;laboratory (2/23; 8.7%). Taken together, these top three combinations accounted for \\u0026asymp;\\u0026thinsp;39% of all included studies (9/23), suggesting a minimal recurrent nucleus (clinical and genomic) with contextual (demographic) and instrumental (imaging/laboratory) layers added.\\u003c/p\\u003e \\u003cp\\u003eModality usage varied across application contexts, with clinical data present in 100% of them. Genomic inclusion was higher in diagnostic and clinical care (\\u0026asymp;\\u0026thinsp;81%) than in research (\\u0026asymp;\\u0026thinsp;67%). Imaging was more prevalent in therapeutic and decision-support applications (\\u0026asymp;\\u0026thinsp;50%). 
In platform/integration contexts, demographics (\\u0026asymp;\\u0026thinsp;55%) and laboratory (\\u0026asymp;\\u0026thinsp;36%) were relatively frequent, consistent with operational interoperability.\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec5\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.3 Application contexts\\u003c/h2\\u003e \\u003cp\\u003eMDS were predominantly applied in biomedical/genomic research (21/23; 91.3%) and diagnosis/clinical care (16/23; 69.6%), with notable frequencies in platform/system integration (11/23; 47.8%) and health planning/public policy (10/23; 43.5%). Less frequent contexts included therapeutic/decision support (6/23; 26.1%), epidemiological surveillance (3/23; 13.0%), patient registries (3/23; 13.0%), biobanks (3/23; 13.0%), and newborn screening (1/23; 4.3%). On average, each study spanned multiple contexts, typically two to three per MDS, underscoring the cross-cutting nature of use cases.\\u003c/p\\u003e \\u003cp\\u003eContext overlaps were consistent. The most frequent and tight combination was research and clinical (16 shared studies; Jaccard coefficient\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.76), followed by research and platforms (11; Jaccard coefficient\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.52) and clinical and platforms (8; Jaccard coefficient\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.42). Co-occurrences also appeared between research\\u0026thinsp;+\\u0026thinsp;public policy (9; Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.41) and platforms\\u0026thinsp;+\\u0026thinsp;policy (6; Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.40). 
These patterns suggest that clinical initiatives connect to both knowledge production and integration requirements, whereas policy-oriented programs often couple with data infrastructures.\\u003c/p\\u003e \\u003cp\\u003eAcross contexts, clinical care and therapeutic support showed higher medians for the clinical impact index (both 1.0 vs. 0.0 otherwise). For discovery/knowledge, biobanks (median 3.0 vs. 0.0) and clinical (1.0 vs. 0.0) stood out, while research showed a modest gain (1.0 vs. 0.5). For system interoperability, medians were higher in surveillance and newborn screening (both 3.0 vs. 2.0), followed by public policy (2.5 vs. 2.0). For total impact, biobanks (5.0 vs. 3.0) and clinical (4.0 vs. 3.0) showed the largest differences. These signals warrant caution given the small number of studies in some contexts (e.g., biobanks, surveillance).\\u003c/p\\u003e \\u003cp\\u003eBy maturity and life-cycle, the share of MDS with planned updates was highest in therapeutic support (83.3%), platform integration (72.7%), and clinical care (68.8%), followed by research (66.7%) and public health (60.0%). Temporally, the most recent median years were seen in newborn screening (2023; 100% in 2019\\u0026ndash;2024), therapeutic support (median 2022; 83.3% in 2019\\u0026ndash;2024), platforms (2021; 63.6%), and clinical care (2020.5; 62.5%), suggesting a recent intensification of applications directly linked to decision-making, integration, and care workflows.\\u003c/p\\u003e \\u003cp\\u003eWithin each context, Europe predominated, with regional variation in contributions. For clinical care, Europe accounted for 62.5%, followed by North America and Asia (18.8% each) and Oceania (12.5%); Africa and South America accounted for 6.2% each. For platform integration, Europe accounted for 45.5%, North America 27.3%, Oceania 18.2%, with sporadic participation from Asia, Africa, and South America (9.1% each). 
For public policy, contributions were distributed as follows: Europe, 40%; Oceania, 30%; North America, 20%; and Asia/Africa/South America, each, 10%. In research, Europe again predominated (52.4%), followed by North America (28.6%), Asia (14.3%), and Oceania (9.5%), with isolated contributions from Africa and South America (4.8% each). Overall, these patterns reflect regional asymmetries consistent with institutional capacity.\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec6\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.4 Adherence to standards and ontologies\\u003c/h2\\u003e \\u003cp\\u003eDeclared adherence to standards was heterogeneous. We recorded references to the Human Phenotype Ontology (HPO) in 6 studies, Orphanet/Orphacode in 5, the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in 4, and the International Classification of Diseases (ICD) in 2. Each of the Human Genome Variation Society, Health Level Seven Fast Healthcare Interoperability Resources (HL7 FHIR), GA4GH, General Data Protection Regulation (GDPR), and Clinical Data Interchange Standards Consortium (CDISC) appeared in one study. The number of standards cited per study (\\u0026ldquo;richness\\u0026rdquo;) had a median of 1. In practice, 10 studies cited no standard, 6 cited 1, 5 cited 2, and 2 cited 3. Overall, 56.5% of the studies cited at least one standard or ontology, but most referred to only one or two.\\u003c/p\\u003e \\u003cp\\u003eCo-occurrence analysis suggested two profiles. First, a phenotype\\u0026ndash;disease nucleus: HPO\\u0026thinsp;+\\u0026thinsp;Orphanet/Orphacode co-occurred in 4 studies (Jaccard\\u0026thinsp;\\u0026asymp;\\u0026thinsp;0.57), aligning phenotypic encoding with RD nosology. 
Second, infrastructural pairs were rare but highly overlapping (e.g., HL7/FHIR and GA4GH), co-occurring in the same study (Jaccard\\u0026thinsp;=\\u0026thinsp;1.00), reflecting technically focused initiatives.\\u003c/p\\u003e \\u003cp\\u003eBy application context, platform integration featured HPO and Orphanet/Orphacode (approximately 36.4% each) and FAIR (approximately 27.3%); diagnosis/clinical care showed 31.2% (HPO), 18.8% (Orphanet), 18.8% (FAIR); biomedical/genomic research recorded 19.0% (FAIR), 23.8% (HPO), 19.0% (Orphanet), 9.5% (ICD); and public policy/planning showed relatively higher Orphanet (40.0%), with HPO 30.0%, FAIR 20.0%, and ICD 10.0%. Temporally, FAIR was a recent addition (median 2022; 100% between 2019 and 2024), compared with HPO (median 2019.5) and Orphanet (median 2023). MDS that cited FAIR were more likely to plan updates (75.0% vs. 63.2%), whereas HPO (50.0%) and Orphanet (40.0%) showed lower proportions, consistent with FAIR as an infrastructural maturity vector, although the sample size was small. 
A tangible example of FAIR's impact is the direct enabling of cross-registry queries, as demonstrated by the Vascular Anomalies Working Group integration, where FAIRification enabled federated queries across multiple databases, enhancing data interoperability and informing policy decisions for healthcare systems.\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec7\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.5 Primary outcomes\\u003c/h2\\u003e \\u003cp\\u003eAcross three domains, clinical (diagnostic accuracy, time-to-diagnosis, personalized treatment, clinical prognosis), knowledge production (new gene/biomarker discovery, genomic annotation, molecular classification/stratification), and system-level (policy influence, impact on public programs, standardization/interoperability), we observed at least one \\u0026ldquo;Yes\\u0026rdquo; outcome in 47.8% of studies (clinical), 52.2% (knowledge), and 82.6% (system-level).\\u003c/p\\u003e \\u003cp\\u003eCo-occurrence patterns were coherent. In the knowledge domain, genomic annotation and molecular classification/stratification co-occurred in eight studies (φ\\u0026thinsp;=\\u0026thinsp;0.76), and new gene/biomarker discovery co-occurred with annotation in seven (φ\\u0026thinsp;=\\u0026thinsp;0.81). In the clinical domain, diagnostic accuracy and personalized treatment co-occurred in four (φ\\u0026thinsp;=\\u0026thinsp;0.50), and time-to-diagnosis reduction co-occurred with a change in prognosis in two (φ\\u0026thinsp;=\\u0026thinsp;1.00; driven by small n). In the system domain, policy influence co-occurred with interoperability in nine (φ\\u0026thinsp;=\\u0026thinsp;0.21), a weak association that only tentatively suggests that initiatives with institutional traction report standardization alongside policy activity. 
These results align with expected value chains: advances in annotation facilitate molecular reclassification; accuracy gains often accompany therapeutic decisions; and policy reforms track with integration and interoperability efforts.\\u003c/p\\u003e \\u003cp\\u003eOver time (\\u0026le;\\u0026thinsp;2018 vs. 2019\\u0026ndash;2024), we observed absolute increases in the reporting of time-to-diagnosis and clinical prognosis outcomes (both +\\u0026thinsp;15.4 percentage points, from 0.0% to 15.4%), as well as moderate gains in personalized treatment and genomic annotation (+\\u0026thinsp;8.5 percentage points each). Interoperability increased slightly (+\\u0026thinsp;3.8 percentage points), whereas policy influence (\\u0026ndash;26.2 percentage points) and impact on public programs (\\u0026ndash;10.8 percentage points) declined, indicating that recent studies have emphasized technical and clinical outcomes over macro-institutional ones. Aggregated by domain, the share of studies with at least one \\u0026ldquo;Yes\\u0026rdquo; increased in the clinical domain (from 30.0% to 46.2%) but decreased in knowledge (from 60.0% to 46.2%) and system (from 90.0% to 76.9%), suggesting a recent rebalance toward care-proximal and instrumentation outcomes.\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec8\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.6 Genomic profile of MDS: design, maturity, and impact\\u003c/h2\\u003e \\u003cp\\u003eReporting of core genomic information was uneven. Only one study explicitly used whole-genome sequencing (WGS), and one used whole-exome sequencing (WES); the remainder used generic labels (e.g., 'next-generation sequencing' or 'genomic data') without specifying WES, WGS, or targeted panels. Trio design (simultaneous sequencing of the proband and both parents) was cited in 2/23 (8.7%). Reference genomes were rarely reported (one using GRCh38/hg38; one using GRCh37/hg19). 
The biological sample type was explicitly specified in 5/23 (21.7%) cases (blood alone or combined with saliva/tissue).\\u003c/p\\u003e \\u003cp\\u003eTo promote standard uptake, a concise genomic reporting checklist for future studies is proposed. This checklist includes: 1) specification of sequencing modality (e.g., WGS, WES, or targeted panels); 2) reference genome details (e.g., GRCh38/hg38 or GRCh37/hg19); 3) trio design usage when applicable; and 4) explicit description of biological sample types used. A detailed reporting checklist template has also been developed to guide researchers in implementing these recommendations. This template is available via the institutional repository or upon request from the authors. Such resources are intended to enhance consistency and quality in genomic data reporting across studies.\\u003c/p\\u003e \\u003cp\\u003eRegarding knowledge bases, OMIM was most frequently referenced (18/23; 78.3%), followed by Orphanet (12/23; 52.2%), ClinVar (5/23; 21.7%), and gnomAD (3/23; 13.0%). Annotation sources like these play a critical role in ensuring consistency and reliability in variant interpretation. However, discrepancies can arise when different databases provide conflicting information. To address these challenges, selecting knowledge bases requires careful comparison of data from multiple sources. When OMIM, Orphanet, and ClinVar present contradictions, it is advisable to examine the methodological basis of each entry, including factors such as the curation process, update frequency, and the strength of evidence supporting variant classification. 
Implementing reconciliation strategies, such as cross-referencing with additional resources or involving domain experts in dispute resolution, can further enhance consistency and confidence in variant interpretation.\\u003c/p\\u003e \\u003cp\\u003eExplicit support for detection of copy number variants (CNVs) and structural variants (SVs) appeared in 3/23 studies (13.0%), whereas planned re-analysis was frequent (20/23; 87.0%). HPO was used in 16/23 (69.6%); phenotype-driven prioritization tools (e.g., Exomiser, Automated Mendelian Literature Evaluation [AMELIE], Phenomizer) appeared in 3/23 (13.0%). Only one study quantified the median number of HPO terms per case (6).\\u003c/p\\u003e \\u003cp\\u003eReporting of genes/variants was heterogeneous. Mode of inheritance (autosomal dominant/recessive, de novo) was explicit in 6/23 (26.1%), functional validation was reported in 3/23 (13.0%), and newly implicated genes in 4/23 (17.4%; median of two genes when reported). Variant classification counts were seldom reported; only one study provided totals for pathogenic/likely pathogenic (P/LP) variants (n\\u0026thinsp;=\\u0026thinsp;17). Counts for variants of uncertain significance (VUS) and likely benign/benign (LB/B) variants were generally absent. Therapeutic impact was indicated in 6/23 (26.1%); trial eligibility and genetic counseling were mentioned across studies as potential outcomes.\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec9\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e3.7 Study bias assessment\\u003c/h2\\u003e \\u003cp\\u003eAll 23 included studies were appraised for methodological quality using the MMAT. No study was excluded post-appraisal due to concerns about bias; all met the basic MMAT screening criteria (clear research questions and appropriate data sources). However, the rigor of studies varied considerably. 
Based on the consensus MMAT judgments, five studies (21.7%) were rated as high quality (meeting\\u0026thinsp;\\u0026ge;\\u0026thinsp;80% of criteria), 12 (52.2%) as moderate quality (40\\u0026ndash;60% of criteria met), and 6 (26.1%) as low quality (\\u0026le;\\u0026thinsp;20% of criteria met). In practical terms, only four studies met all five MMAT criteria (5/5 \\u0026ldquo;Yes\\u0026rdquo; ratings), and one study met four criteria. Over half of the studies satisfied three or fewer domains, indicating a moderate to high risk of bias across much of the literature.\\u003c/p\\u003e \\u003cp\\u003ePatterns emerged across study designs. Mixed-methods studies (2/23) had the highest quality, with both achieving 5/5 \\\"Yes\\\" ratings across MMAT domains, reflecting robust conduct and reporting. Quantitative descriptive studies (7/23) tended to have intermediate quality; none attained a perfect score, and most met only three of five criteria (e.g., issues with sample representativeness or handling of missing data were common). The single quantitative, non-randomized study met 3 of 4 applicable criteria (75%); while it met most domains, it fell short in one domain related to confounding control. Qualitative studies (13/23) exhibited the most significant variability in quality. Only two qualitative studies fully satisfied all MMAT criteria, whereas most others had significant methodological gaps. Approximately one-quarter of the qualitative studies were of moderate quality (typically 3 of 5 criteria met), and about half were low quality, with only 1 or 2 criteria met. This suggests that many qualitative reports lacked sufficient methodological transparency or rigor in key areas.\\u003c/p\\u003e \\u003cp\\u003eTo address these limitations and improve future studies, three practical reporting recommendations are proposed. First, explicitly define the sampling frame and inclusion criteria to enhance transparency and clarify representativeness (Q2). 
Second, thoroughly describe strategies for handling non-response and managing potential confounders to strengthen the validity of findings (Q4). Third, provide a clear explanation of how themes are derived from raw data and maintain reflexivity to enhance the coherence and integrity of qualitative analyses (Q3\\u0026ndash;Q4). Implementing these steps can advance the field by translating methodological critique into actionable guidance.\\u003c/p\\u003e \\u003cp\\u003eAcross the board, certain MMAT domains showed recurring limitations. In quantitative studies, the most frequent weaknesses were in Q2 (sampling strategy/representativeness) and Q4 (handling of non-response or confounding). Several registry papers did not clearly define their sampling frame or inclusion criteria, resulting in \\u0026ldquo;No\\u0026rdquo; or \\u0026ldquo;Cannot tell\\u0026rdquo; judgments regarding representativeness, and many failed to report how missing data or potential confounders were handled. For instance, over half of the quantitative studies lacked clear strategies to address non-response bias or were unable to describe any adjustment for confounders in observational designs. In qualitative studies, the common shortcomings were in Q3\\u0026ndash;Q4, which assess the linkage between data and interpretations and the coherence of the analysis. Many qualitative reports provided limited evidence of how themes were derived from raw data or lacked reflexivity, resulting in frequent \\u0026ldquo;Cannot tell\\u0026rdquo; ratings in these domains. In contrast, most studies \\u0026ndash; regardless of design \\u0026ndash; scored well on Q5, often indicating that conclusions were reasonably justified by the data provided, even if earlier criteria were unmet. Notably, S1\\u0026ndash;S2 (screening items) were satisfied by virtually all studies (S1 was \\u0026ldquo;Yes\\u0026rdquo; for 100% of cases). 
However, a couple of studies had unclear data adequacy (S2 rated \\u0026ldquo;Cannot tell\\u0026rdquo;) that had to be resolved by consensus.\\u003c/p\\u003e \\u003cp\\u003eNotably, no studies were excluded solely based on poor quality, and all 23 studies were retained in the review. That said, the presence of multiple \\u0026ldquo;No\\u0026rdquo; or \\u0026ldquo;Cannot tell\\u0026rdquo; judgments in several reports indicates a non-trivial risk of bias in those findings. Studies classified as low quality were treated with caution during synthesis, and their conclusions were weighed accordingly. Overall, the MMAT-based assessment indicates that although a minority of genomic MDS studies are methodologically robust, most exhibit a moderate to high risk of bias, primarily due to incomplete reporting in the sampling and analysis domains. This underscores the need for improved reporting standards in this field to strengthen confidence in study findings. The final study-by-study consensus matrix with item-level justifications is provided in Supplementary File 2.\\u003c/p\\u003e \\u003cp\\u003eGiven the quality profile, the findings from this review warrant cautious implementation. Most genomic MDS proposals for rare diseases function as provisional frameworks to guide registry and pipeline design, rather than as definitive standards. Implementation should prioritize data elements supported by substantial evidence, while treating less substantiated items as hypotheses subject to pilot testing, ongoing evaluation, and iterative refinement. Supplementary Material 3 contains the complete dataset processing workflow, analyses, and figures for this review.\\u003c/p\\u003e \\u003c/div\\u003e\"},{\"header\":\"4. 
Discussion\",\"content\":\"\\u003cdiv id=\\\"Sec11\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e4.1 The hybrid and multimodal nature of genomic MDS\\u003c/h2\\u003e \\u003cp\\u003eOverall, the included studies demonstrate a strong predominance of initiatives conducted in Europe and North America, with meaningful contributions from Oceania and more limited involvement from Asia, Africa, and South America. The concentration of initiatives in high-income regions such as the United States and Western Europe suggests that institutional capacity and infrastructure maturity have shaped the geographical distribution of genomic MDS projects. The clinical layer anchors most use cases, the infrastructural layer enables interoperability and integration, and the public health layer connects standardization to population planning and surveillance. Differences observed across contexts in impact indices, update strategies, and adherence to standards carry direct implications for implementation and scalability.\\u003c/p\\u003e \\u003cp\\u003eAcross the included studies, MDS function as hybrid artifacts that bind clinical/phenotypic descriptors to infrastructure for exchange and reuse, and, where present, to genomic analyses that support diagnosis, discovery, and policy. Concrete examples illustrate why this integration is crucial. DM-Scope was designed to bridge research and care and to standardize data capture for myotonic dystrophy across centers (De Antonio et al., \\u003cspan citationid=\\\"CR7\\\" class=\\\"CitationRef\\\"\\u003e2019\\u003c/span\\u003e). Beyond single-country initiatives, long-term international collaborations in neuromuscular disease reinforce the importance of harmonized governance and sustained data quality. 
For example, the global myotonic dystrophy registry network demonstrated how multi-country coordination can standardize data structures, support longitudinal follow-up, and enable cross-border research alignment (Wood et al., \\u003cspan citationid=\\\"CR42\\\" class=\\\"CitationRef\\\"\\u003e2018\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe Autoinflammatory Diseases Alliance (AIDA) registries emphasize modularity, governance for updates, and cross-registry communication, practical features that enable an MDS to evolve with the science (Della Casa et al., \\u003cspan citationid=\\\"CR8\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e; Gaggiano et al., \\u003cspan citationid=\\\"CR10\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e). The ApreciseKUre platform takes this a step further by embedding analytics within a digital ecosystem, illustrating how multimodal records (genetic, biochemical, histopathological, clinical, and quality-of-life data) can power precision medicine use cases in ultra-rare diseases (Visibelli et al., \\u003cspan citationid=\\\"CR40\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e). These initiatives collectively reinforce that an MDS is most effective when it is both clinically legible and technically interoperable, with clear pathways for reanalysis and reuse. They also underscore that multimodality is essential, not optional, for describing the natural history and progression of RD (Visibelli et al., \\u003cspan citationid=\\\"CR40\\\" class=\\\"CitationRef\\\"\\u003e2022\\u003c/span\\u003e; Ruseckaite et al., \\u003cspan citationid=\\\"CR31\\\" class=\\\"CitationRef\\\"\\u003e2023b\\u003c/span\\u003e).\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec12\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e4.2 Standards and interoperability: from principles to operations\\u003c/h2\\u003e \\u003cp\\u003eStandardization remains a central challenge for genomic MDS interoperability. 
Adoption was generally limited, with most studies reporting only one or two standards. Ontologies such as HPO and Orphanet were the most frequently used, forming a \\u0026ldquo;phenotype\\u0026ndash;disease nucleus,\\u0026rdquo; but their application was inconsistent across registries. This uneven, sometimes sparse, adoption of standards mirrors observations from prior methodological work.\\u003c/p\\u003e \\u003cp\\u003eOur findings also complement and extend another recent systematic review, which mapped MDS for RD across health care networks and organized their elements into 10 categories aligned with World Health Organization digital health guidelines, ultimately proposing a generic RD MDS for clinical and managerial use. While that work provides a broad, system-level view of data requirements for RD, it treats the genomic component as one among many domains. In contrast, our review narrows the focus to the genomic layer of MDS, examining how sequencing modalities, reference genomes, variant classes, ontologies, reanalysis practices, and biospecimen linkages are specified, and how these design choices relate to diagnostic yield, treatment decisions, and knowledge generation. This more granular perspective highlights implementation-sensitive details that are not fully visible in wider MDS taxonomies.\\u003c/p\\u003e \\u003cp\\u003eA French methodology for building an MDS for RD explicitly ties item selection to common data elements (CDEs) and to reference terminologies (Medical Dictionary for Regulatory Activities [MedDRA], HPO, Anatomical Therapeutic Chemical classification [ATC], ICD-10, Orphanet) while targeting HL7-compatible exchange. It also formalizes expert-led governance for versioning (Choquet et al., \\u003cspan citationid=\\\"CR5\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e). 
The development of the French MDS for RD (F-MDS-RD) also emphasized alignment of local terminologies with international references through expert consultation and systematic review (Choquet et al., \\u003cspan citationid=\\\"CR5\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e; Toubiana et al., \\u003cspan citationid=\\\"CR39\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe recent adoption of the FAIR principles illustrates a shift toward operational standardization. The \\u0026ldquo;de novo FAIRification\\u0026rdquo; of the Vascular Anomalies Working Group (VASCA) registry demonstrates a concrete pipeline that makes data machine-actionable at entry: FAIRification is built directly into data collection, fields are mapped to ontologies (HPO and the Orphanet Rare Disease Ontology, ORDO) and CDEs, and the resulting data are exposed for federated queries (Groenen et al., \\u003cspan citationid=\\\"CR12\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e). Nevertheless, the interpretation of CDEs remains multifaceted, requiring clear definitions to avoid false assumptions of uniformity.\\u003c/p\\u003e \\u003cp\\u003eBroader frameworks also illustrate this need. CDISC standards, required by the U.S. FDA for regulatory submissions, exemplify how structured data standards can promote harmonization (Mullin et al., \\u003cspan citationid=\\\"CR21\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e). 
CDISC user guides for Duchenne muscular dystrophy and Huntington\\u0026rsquo;s disease demonstrate how to represent outcomes and longitudinal assessments in a manner acceptable to regulators, thereby closing a common gap between research registries and trial-ready data (Mullin et al., \\u003cspan citationid=\\\"CR21\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e). At the health system scale, Australian Genomics documents the operational work of integrating genomics into care, including evidence sharing across laboratories and alignment with international standards bodies, and thereby shows that national programs can translate standards into routine practice (Stark et al., \\u003cspan citationid=\\\"CR37\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eInternational collaboration is indispensable. For example, collaboration between U.S. networks and the European Network of Rare Bleeding Disorders (EN-RBD) led to the development of a harmonized data tool for rare coagulation disorders (Shapiro et al., \\u003cspan citationid=\\\"CR35\\\" class=\\\"CitationRef\\\"\\u003e2011\\u003c/span\\u003e). These initiatives highlight the persistent challenge of siloed data and the pressing need for harmonization at scale (Raycheva et al., \\u003cspan citationid=\\\"CR25\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec13\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e4.3 From Genomic Data to Clinical Translation: Reporting, Re-analysis, and Biobank Linkages\\u003c/h2\\u003e \\u003cp\\u003eA central gap in the literature is the incomplete reporting of core genomic details, including sequencing modality, reference build, trio design, structural-variant coverage, pipelines, and annotation sources. 
Case-level studies illustrate why specificity matters: for early infantile epileptic encephalopathy due to biallelic PIGQ variants, authors reported exome sequencing, Sanger validation, and the exact reference transcript, enabling replication of variant interpretation (Johnstone et al., \\u003cspan citationid=\\\"CR16\\\" class=\\\"CitationRef\\\"\\u003e2020\\u003c/span\\u003e). Methodologically, best practice also extends \\u0026ldquo;upstream\\u0026rdquo; to curation and databases. Experience in clinical genetics cautions that heterogeneous databases have utility but also pose limitations unless entries adhere to standard nomenclature and quality controls (Birch \\u0026amp; Friedman, \\u003cspan citationid=\\\"CR4\\\" class=\\\"CitationRef\\\"\\u003e2004\\u003c/span\\u003e). Guidance from community efforts emphasizes adopting consistent variant nomenclature, curated locus-specific resources, and ethics frameworks to ensure the clinical reliability of shared variant data (Kohonen-Corish et al., \\u003cspan citationid=\\\"CR19\\\" class=\\\"CitationRef\\\"\\u003e2010\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThe high rates of planned re-analysis and widespread use of HPO are also notable. That pattern tracks with emerging infrastructure thinking: FAIRified registries explicitly plan periodic reinterpretation, and HPO-anchored phenotyping is proposed as a bridge between registries, biobanks, and variant curation workflows (Groenen et al., \\u003cspan citationid=\\\"CR12\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e; Rubinstein, Posada de la Paz, \\u0026amp; Mora, 2017). 
When implemented with clinical decision support, such integration can shorten time-to-diagnosis and expand trial eligibility, two benefits that national programs report as part of routine care transformation (Stark et al., \\u003cspan citationid=\\\"CR37\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eSeveral sources emphasize that the scientific yield of registries increases when entries are linked to high-quality biospecimens. The EuroBioBank/SpainRDR-BioNER case demonstrates how a networked catalog connected to a national RD registry facilitates discovery, while also necessitating governance for privacy, accreditation, and standard operating procedures (Rubinstein et al., \\u003cspan citationid=\\\"CR29\\\" class=\\\"CitationRef\\\"\\u003e2017\\u003c/span\\u003e). Future studies that quantify the discovery acceleration associated with biobank linkage, for example as the number of novel genes identified per year, could provide evidence for the strategic value of integrated bioresources. Such metrics could in turn support increased investment in the infrastructure that connects registries to biospecimens, ultimately enhancing the scope and efficiency of rare disease research.\\u003c/p\\u003e \\u003cp\\u003eThe field is converging on practical tools, including GUID-based linkage, CDEs that embed the GUID elements, and HPO-anchored phenotyping, to make specimen-linked data computable (Rubinstein et al., \\u003cspan citationid=\\\"CR29\\\" class=\\\"CitationRef\\\"\\u003e2017\\u003c/span\\u003e; Glassberg et al., \\u003cspan citationid=\\\"CR11\\\" class=\\\"CitationRef\\\"\\u003e2020\\u003c/span\\u003e). 
Biobanks dedicated to neuromuscular disease illustrate the downstream payoff: ready-to-sequence DNA, standardized consent, and logistics for WES/WGS, as well as recontact, enable faster gene discovery and validation (Reza et al., \\u003cspan citationid=\\\"CR26\\\" class=\\\"CitationRef\\\"\\u003e2017\\u003c/span\\u003e). Recent national efforts in Asia (e.g., K-MoSCA for rare neurological diseases) integrate registries with bioresources from the outset, underscoring the global relevance of integrated designs (Kim et al., \\u003cspan citationid=\\\"CR18\\\" class=\\\"CitationRef\\\"\\u003e2024\\u003c/span\\u003e).\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec14\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e4.5 From Patient Voice to System Scale: Implementation Pathways\\u003c/h2\\u003e \\u003cp\\u003eThe impact of MDS is distributed along a gradient. System-level outcomes, such as their influence on policy, public programs, and data standardization, were more frequently reported (82.6% of studies reported at least one such outcome). More recent studies (2019\\u0026ndash;2024) show a shift toward clinical outcomes, including diagnostic time and prognosis, suggesting a maturation of the field: from early infrastructure and policy building toward direct improvements in patient care and technical instrumentation (Ruseckaite et al., \\u003cspan citationid=\\\"CR31\\\" class=\\\"CitationRef\\\"\\u003e2023b\\u003c/span\\u003e). In practice, registries contribute to better treatment outcomes, process improvements, and quality of care, while also supporting research into clinical course and natural history. However, a significant gap exists in the incorporation of patient-reported outcome measures (PROMs). Our analysis found that only a limited number of MDS included PROMs alongside diagnostic metrics. This deficiency underscores the necessity for future studies to elevate the patient voice by framing PROMs as a core success indicator. 
Incorporating PROMs can shift the focus toward a more holistic notion of value by capturing the patient's perspective, thus enhancing the relevance and impact of MDS on patient-centered care. Practical strategies to facilitate the integration of PROMs into MDS include the use of standardized instruments, such as the EQ-5D or PROMIS, which are designed to capture patient-reported outcomes efficiently. Collaborating with patient advocacy groups can further ensure that the chosen PROMs reflect patients' needs and concerns, fostering patient-centered approaches in future studies.\\u003c/p\\u003e \\u003cp\\u003eMore broadly, PROMs remain underused, with only 40% of RD registries collecting them, despite their importance for embedding the patient voice into best practice. Strategy work from the same group stresses that a national registry framework should explicitly require PROMs and equity safeguards (Ruseckaite et al., \\u003cspan citationid=\\\"CR31\\\" class=\\\"CitationRef\\\"\\u003e2023b\\u003c/span\\u003e). At the European level, reviews of databases highlight governance, legal, and FAIR compliance, as well as the risk that siloed efforts underrepresent patient groups and impede ML-ready integration (Raycheva et al., \\u003cspan citationid=\\\"CR25\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eThree recurring features characterize sustainable MDS implementations across geographies: (1) transparent data structures (publishing case report forms [CRFs], CDEs, and mappings) (Glassberg et al., \\u003cspan citationid=\\\"CR11\\\" class=\\\"CitationRef\\\"\\u003e2020\\u003c/span\\u003e), (2) alignment with recognized standards (HL7/FHIR/CDISC for transport; HPO/ORDO/ICD/ATC for semantics), and (3) networked governance that supports federated discovery, re-analysis, and specimen linkage. 
Evidence is drawn from national programs (Stark et al., \\u003cspan citationid=\\\"CR37\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e), FAIRification case studies (Groenen et al., \\u003cspan citationid=\\\"CR12\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e), and classification work demonstrating that registries cluster by purpose and need, with tailored interoperability contracts (Santoro et al., \\u003cspan citationid=\\\"CR32\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eNonetheless, several obstacles remain for the implementation and scalability of genomic MDS. Public-health-oriented registries have long argued that harmonized tools are prerequisites to close therapeutic and knowledge gaps in ultra-rare conditions (Shapiro et al., \\u003cspan citationid=\\\"CR35\\\" class=\\\"CitationRef\\\"\\u003e2011\\u003c/span\\u003e). More straightforward reporting guidelines for genomic methods are needed to ensure comparability and maximize interoperability across registries and electronic health records (Choquet et al., \\u003cspan citationid=\\\"CR5\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e).\\u003c/p\\u003e \\u003c/div\\u003e \\u003cdiv id=\\\"Sec15\\\" class=\\\"Section2\\\"\\u003e \\u003ch2\\u003e4.6 Limitations and next steps\\u003c/h2\\u003e \\u003cp\\u003eTwo constraints temper inferences. First, the literature frequently under-reports genomic specifics and outcome denominators; as classic database audits caution, this impedes replication and external validity (Birch \\u0026amp; Friedman, \\u003cspan citationid=\\\"CR4\\\" class=\\\"CitationRef\\\"\\u003e2004\\u003c/span\\u003e). 
Second, registries vary widely in purpose and maturity; clustering work implies that synthesis should be stratified by registry archetype rather than pooled indiscriminately (Santoro et al., \\u003cspan citationid=\\\"CR32\\\" class=\\\"CitationRef\\\"\\u003e2015\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eChallenges also include ensuring patient representativeness, preventing registries from disproportionately reflecting more affluent or educated populations (Rubinstein et al., \\u003cspan citationid=\\\"CR28\\\" class=\\\"CitationRef\\\"\\u003e2010\\u003c/span\\u003e). Ethical and legal issues such as privacy, informed consent, and cross-border data transfer remain central (Reza et al., \\u003cspan citationid=\\\"CR26\\\" class=\\\"CitationRef\\\"\\u003e2017\\u003c/span\\u003e; Raycheva et al., \\u003cspan citationid=\\\"CR25\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e). To address these challenges, we recommend concrete steps, including the use of model consent forms tailored to enhance understanding and engagement among diverse populations. Implementing data governance frameworks that prioritize privacy and equity can help ensure that the global registry reflects a diverse range of populations. Regular feedback of aggregated results to participants can maintain trust and engagement, ensuring transparency and participatory governance. 
Best practices include routine updates to consent materials and regular review of data policies to align with evolving ethical guidelines (Rubinstein et al., \\u003cspan citationid=\\\"CR28\\\" class=\\\"CitationRef\\\"\\u003e2010\\u003c/span\\u003e).\\u003c/p\\u003e \\u003cp\\u003eA practical roadmap for the field would combine (a) mandatory minimal genomic reporting, (b) publication of CDE dictionaries and CRFs, (c) FAIR-at-entry pipelines and ontology bindings, (d) routine re-analysis policies, (e) GUID-based biobank linkage, and (f) built-in PROMs and equity metrics, governed by consortia with regulator-grade data structures (Mullin et al., \\u003cspan citationid=\\\"CR21\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e; Groenen et al., \\u003cspan citationid=\\\"CR12\\\" class=\\\"CitationRef\\\"\\u003e2021\\u003c/span\\u003e; Stark et al., \\u003cspan citationid=\\\"CR37\\\" class=\\\"CitationRef\\\"\\u003e2023\\u003c/span\\u003e).\\u003c/p\\u003e \\u003c/div\\u003e\"},{\"header\":\"5. Conclusion\",\"content\":\"\\u003cp\\u003eThis systematic review demonstrates that MDS for rare diseases are heterogeneous in scope, composition, and reporting practices. Although clinical and phenotypic domains are consistently included, the genomic layer is often incompletely reported, with limited specification of sequencing modalities, reference genomes, and variant classes. Adherence to standards and ontologies is also inconsistent, with most studies referencing only one or two frameworks, thereby constraining cross-study interoperability.\\u003c/p\\u003e \\u003cp\\u003eDespite these gaps, MDS have shown tangible contributions to diagnostic accuracy, personalized treatment, and knowledge generation, as well as system-level impacts on policy and public health planning. 
Exploratory analyses further suggest that practices such as planned reanalysis, phenotype-genotype integration through ontologies, and explicit capture of structural variants may be more influential for clinical benefit than the sequencing modality alone.\\u003c/p\\u003e \\u003cp\\u003eResearchers, clinicians, and policymakers are encouraged to collaborate to establish and refine standardized genomic MDS. Participation in consensus exercises, such as Delphi rounds, will facilitate the integration of genomic data into healthcare. Such collective efforts aim to translate these findings into actionable frameworks to enhance the care and management of rare diseases.\\u003c/p\\u003e\"},{\"header\":\"Declarations\",\"content\":\"\\u003cp\\u003e\\u003cstrong\\u003eCompeting interests\\u003c/strong\\u003e. The authors declare no competing interests.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eEthics approval\\u003c/strong\\u003e. Not applicable (review paper; no human/animal subjects).\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eUse of AI.\\u003c/strong\\u003e Generative AI tools (e.g., ChatGPT) were used for language refinement; all ideas, analyses, and conclusions are the authors\\u0026rsquo; own.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eClinical trial number\\u003c/strong\\u003e: not applicable (systematic review; no clinical trial was conducted).\\u003c/p\\u003e\\u003ch2\\u003eFunding\\u003c/h2\\u003e \\u003cp\\u003eSupported by the Brazilian National Council for Scientific and Technological Development (CNPq), grant 403244/2024-2 (2025\\u0026ndash;2027).\\u003c/p\\u003e\\u003ch2\\u003eAuthor Contribution\\u003c/h2\\u003e\\u003cp\\u003eFAB conceived the study, designed the review methodology, supervised the screening and data extraction process, conducted the quantitative analyses, and wrote the main manuscript text. NCR contributed to study selection, data extraction, literature organization, and drafting of methods and results. CFL 
contributed to refining the search strategy, interpreting the results, and editing the manuscript. MEG performed screening and data extraction and contributed to the synthesis of results. AH assisted with data extraction, coding validation, and critical manuscript revision. BA contributed to screening, extraction, and preparation of descriptive summaries. TTK contributed to study selection, MMAT quality assessment, and manuscript review. BMO contributed to conceptualization, methodological validation, and critical revision of the manuscript. DA contributed to the analytical strategy, data interpretation, and revision of computational aspects. IS contributed to clinical and genomic interpretation, discussion framing, and manuscript review. SG supported data organization, thematic synthesis, and manuscript editing. TMF supervised the project, contributed to study conceptualization, reviewed and edited the full manuscript, and ensured methodological and scientific integrity. All authors reviewed and approved the final manuscript.\\u003c/p\\u003e\\u003ch2\\u003eData Availability\\u003c/h2\\u003e\\u003cp\\u003eThis study is based solely on data extracted from previously published articles included in the systematic review. No new primary datasets were generated. All coded datasets and analytical workflows used in this review are provided in Supplementary Material 3, which contains the full Jupyter notebook with the data processing steps, statistical analyses, and generation of tables and figures.\\u003c/p\\u003e\"},{\"header\":\"References\",\"content\":\"\\u003col\\u003e\\u003cli\\u003e\\u003cspan\\u003eAustin CP, et al. Future of Rare Diseases Research 2017\\u0026ndash;2027: An IRDiRC Perspective. Clin Transl Sci. 2018;11(1):21\\u0026ndash;7. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1111/cts.12500\\u003c/span\\u003e\\u003cspan address=\\\"10.1111/cts.12500\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eBernardi FA, Mello de Oliveira B, Bettiol Yamada D, Artifon M, Schmidt AM, Scheibe M, Felix V, T. M. The minimum data set for rare diseases: Systematic review. J Med Internet Res. 2023;25:e44641. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.2196/44641\\u003c/span\\u003e\\u003cspan address=\\\"10.2196/44641\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eBernardi F, de Oliveira B, de Moraes JC, Baiochi J, Lima V, Ferraz V, Schwartz I. Developing a Genomic Minimum Data Set for Rare Diseases in Brazil: A Delphi Protocol Approach. Procedia Comput Sci. 2025;256:1294\\u0026ndash;301. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.procs.2025.02.241\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.procs.2025.02.241\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eBirch P, Friedman JM. Utility and limitations of genetic disease databases in clinical genetics research: A neurofibromatosis type 1 database experience. Am J Med Genet Part A. 2004;128A(1):58\\u0026ndash;64. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1002/ajmg.c.30007\\u003c/span\\u003e\\u003cspan address=\\\"10.1002/ajmg.c.30007\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eChoquet R, Maaroufi M, Vandenbussche P, Landais P. A methodology for a minimum data set for rare diseases to support national centers of excellence for healthcare and research. J Am Med Inform Assoc. 2015;22(1):76\\u0026ndash;85. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1136/amiajnl-2014-002794\\u003c/span\\u003e\\u003cspan address=\\\"10.1136/amiajnl-2014-002794\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eCoelho AVC, et al. The Brazilian Rare Genomes Project: Validation of whole genome sequencing for rare diseases diagnosis. Front Mol Biosci. 2022;9:821582. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fmolb.2022.821582\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fmolb.2022.821582\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eDe Antonio M, Dogan C, Hamroun D, et al. The DM-Scope registry: A rare disease innovative framework bridging the gap between research and medical care. Orphanet J Rare Dis. 2019;14:339. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13023-019-1088-3\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13023-019-1088-3\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eDella Casa F, Vitale A, Pereira RM, Guerriero S, Ragab G, Lopalco G, Cantarini L. Development and implementation of the AIDA international registry for patients with undifferentiated systemic autoinflammatory diseases. Front Med. 2022;9:908501. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fmed.2022.908501\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fmed.2022.908501\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eFu MP, Merrill SM, Sharma M, Gibson WT, Turvey SE, Kobor MS. Rare diseases of epigenetic origin: Challenges and opportunities. Front Genet. 2023;14:1113086. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fgene.2023.1113086\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fgene.2023.1113086\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eGaggiano C, Vitale A, Tufan A, Ragab G, Aragona E, Wiesik-Szewczyk E, Cantarini L. The Autoinflammatory Diseases Alliance Registry of monogenic autoinflammatory diseases. Front Med. 2022;9:980679. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fmed.2022.980679\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fmed.2022.980679\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eGlassberg JA, Linton EA, Burson K, Hendershot T, Telfair J, Kanter J, Sickle Cell Disease Implementation Consortium. Publication of data collection forms from NHLBI funded sickle cell disease implementation consortium (SCDIC) registry. Orphanet J Rare Dis. 2020;15(1):178. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13023-020-01457-x\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13023-020-01457-x\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eGroenen KHJ, Jacobsen A, Kersloot MG, dos Santos Vieira B, van Enckevort E, Kaliyaperumal R, Arts DL, \\u0026rsquo;t Hoen PAC, Cornet R, Roos M, Kool S, L. The de novo FAIRification process of a registry for vascular anomalies. Orphanet J Rare Dis. 2021;16(1):376. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13023-021-02004-y\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13023-021-02004-y\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eHiggins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, editors. Cochrane handbook for systematic reviews of interventions. 1st ed. Wiley; 2019. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1002/9781119536604\\u003c/span\\u003e\\u003cspan address=\\\"10.1002/9781119536604\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eHong QN, Pluye P, F\\u0026agrave;bregues S, Bartlett G, Boardman F, Cargo M, Vedel I. (2018). \\u003cem\\u003eMixed Methods Appraisal Tool (MMAT), version 2018: User guide\\u003c/em\\u003e. Canadian Intellectual Property Office, Industry Canada. Retrieved from:\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttp://mixedmethodsappraisaltoolpublic.pbworks.com/\\u003c/span\\u003e\\u003cspan address=\\\"http://mixedmethodsappraisaltoolpublic.pbworks.com/\\\" targettype=\\\"URL\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eThe International Rare Diseases Research Consortium. Policies and guidelines to maximize impact. Eur J Hum Genet. 2017;25(12):1293\\u0026ndash;302. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1038/s41431-017-0008-z\\u003c/span\\u003e\\u003cspan address=\\\"10.1038/s41431-017-0008-z\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eJohnstone DL, Al-Sayed MD, Chakrabarti K, et al. Early infantile epileptic encephalopathy due to biallelic pathogenic variants in PIGQ: Report of seven new cases. Epilepsia. 2020;61(6):e77\\u0026ndash;83. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1002/jimd.12278\\u003c/span\\u003e\\u003cspan address=\\\"10.1002/jimd.12278\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eKent A, Parker AP, Patel A, Wynn SL, Steward CA. Genomics in rare diseases: An overview for the patient, family, and non-specialist healthcare professional. Future Rare Dis. 2023;3(4):FRD56. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.2217/frd-2023-0019\\u003c/span\\u003e\\u003cspan address=\\\"10.2217/frd-2023-0019\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eKim D, Kim S, Seok JM, Shin KJ, Oh E, Jeon MY, Park J, Chang HJ, Youn J, Oh J, Sohn E, Park J, Cho JW, Kim BJ. Establishment of a registry of clinical data and bioresources for rare nervous system diseases. Osong Public Health Res Perspect. 2024;15(2):174\\u0026ndash;81. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.24171/j.phrp.2023.0353\\u003c/span\\u003e\\u003cspan address=\\\"10.24171/j.phrp.2023.0353\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eKohonen-Corish MRJ, Al-Aama JY, Auerbach AD, Axton M, Barash CI, Bernstein I, B\\u0026eacute;roud C, Burn J, Cunningham F, Cutting GR, den Dunnen JT, Greenblatt MS, Kaput J, Katz M, Lindblom A, Macrae F, Maglott D, M\\u0026ouml;slein G, Povey S, Cotton RGH. (2010). How to catch all those mutations\\u0026mdash;the report of the Third Human Variome Project Meeting, UNESCO, Paris, May 2010. 
\\u003cem\\u003eHuman Mutation, 31\\u003c/em\\u003e(12), 1374\\u0026ndash;1381. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1002/humu.21379\\u003c/span\\u003e\\u003cspan address=\\\"10.1002/humu.21379\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eLi\\u0026eacute;vin V, et al. FindZebra online search delving into rare disease case reports using natural language processing. PLOS Digit Health. 2023;2(6):e0000269. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1371/journal.pdig.0000269\\u003c/span\\u003e\\u003cspan address=\\\"10.1371/journal.pdig.0000269\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eMullin AP, Corey D, Turner EC, Liwski R, Olson D, Burton J, Larkindale J. Standardized data structures in rare diseases: CDISC user guides for Duchenne muscular dystrophy and Huntington\\u0026rsquo;s disease. Clin Transl Sci. 2021;14(1):214\\u0026ndash;21. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1111/cts.12845\\u003c/span\\u003e\\u003cspan address=\\\"10.1111/cts.12845\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eOuzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan: A web and mobile app for systematic reviews. Syst Reviews. 2016;5(1):210. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13643-016-0384-4\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13643-016-0384-4\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003ePi\\u0026ntilde;ero J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2019. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1093/nar/gkz1021\\u003c/span\\u003e\\u003cspan address=\\\"10.1093/nar/gkz1021\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003ePintos-Morell G, et al. Analysis of genomics implementation in newborn screening for inherited metabolic disorders: An IRDiRC initiative. Rare Disease Orphan Drugs J. 2024;3(2):19. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.20517/rdodj.2023.52\\u003c/span\\u003e\\u003cspan address=\\\"10.20517/rdodj.2023.52\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRaycheva R, Al-Naemi F, Denecke K, et al. Challenges in mapping European rare disease databases, relevant for ML-based screening technologies. Front Public Health. 2023;11:1154426. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fpubh.2023.1214766\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fpubh.2023.1214766\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eReza M, Hildyard JCW, Kirschner J, et al. 
MRC Centre Neuromuscular Biobank (Newcastle and London): Supporting and facilitating rare and neuromuscular disease research worldwide. Open J Bioresources. 2017;4(1):3. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.nmd.2017.07.001\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.nmd.2017.07.001\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRocha CS, Secolin R, Rodrigues MR, Carvalho BS, Lopes-Cendes I. The Brazilian Initiative on Precision Medicine (BIPMed): Fostering genomic data-sharing of underrepresented populations. NPJ Genomic Med. 2020;5(1):42. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1038/s41525-020-00149-6\\u003c/span\\u003e\\u003cspan address=\\\"10.1038/s41525-020-00149-6\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRubinstein YR, Groft SC, Bartek R, Brown K, Peay H, Ramsey K. Creating a global rare disease patient registry linked to a rare diseases biorepository database: Rare Disease-HUB (RD-HUB). Contemp Clin Trials. 2010;31(5):394\\u0026ndash;404. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.cct.2010.06.007\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.cct.2010.06.007\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRubinstein YR, de la Posada M, Mora M. (2017). Rare disease biospecimens and patient registries: Interoperability for research promotion, a European example: EuroBioBank and SpainRDR-BioNER. In M. Posada de la Paz, S. Taruscio, \\u0026amp; S. C. 
Groft, editors, \\u003cem\\u003eRare diseases epidemiology: Update and overview\\u003c/em\\u003e (pp. 141\\u0026ndash;147). Springer. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1007/978-3-319-67144-4_7\\u003c/span\\u003e\\u003cspan address=\\\"10.1007/978-3-319-67144-4_7\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRuseckaite R, McAllister S, Muir J, Enticott J, Donaldson A, King S. Current state of rare disease registries and databases in Australia: A scoping review. Orphanet J Rare Dis. 2023a;18:220. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13023-023-02823-1\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13023-023-02823-1\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eRuseckaite R, Enticott J, Muir J, McAllister S, Donaldson A, King S. Informing a national rare disease registry strategy in Australia: A mixed methods study. Orphanet J Rare Dis. 2023b;18:162. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s12913-023-10049-x\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s12913-023-10049-x\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eSantoro M, Coi A, Di Lipucci M, Bianucci AM, Gainotti S, Mollo E, Vittozzi L, Taruscio D, Bianchi F. Rare disease registries classification and characterization: A data mining approach. Public Health Genomics. 2015;18(2):113\\u0026ndash;22. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1159/000369993\\u003c/span\\u003e\\u003cspan address=\\\"10.1159/000369993\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eSchmitt T, Poirel HA, Cau\\u0026euml;t E, Delnord M, Van Den Bulcke M. Unlocking the genomic landscape: Results of the Beyond 1 Million Genomes (B1MG) pilot in Belgium towards genomic data infrastructure (GDI). Health Policy. 2024;143:105060. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.healthpol.2024.105060\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.healthpol.2024.105060\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eSequeira M, Almeida JR, Oliveira JL. (2021). A comparative analysis of data platforms for rare diseases. In \\u003cem\\u003e2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS)\\u003c/em\\u003e (pp. 366\\u0026ndash;371). IEEE. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1109/CBMS52027.2021.00041\\u003c/span\\u003e\\u003cspan address=\\\"10.1109/CBMS52027.2021.00041\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eShapiro AD, Soucie JM, Peyvandi F, Aschman DJ, DiMichele DM, European Network Rare Bleeding Disorders Database. Clotting Disorders Working Group, \\u0026amp;. (2011). Knowledge and therapeutic gaps: a public health problem in the rare coagulation disorders population. 
American Journal of Preventive Medicine, 41(6), S324-S331. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.amepre.2011.09.021\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.amepre.2011.09.021\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eStark Z, et al. Integrating genomics into healthcare: A global responsibility. Am J Hum Genet. 2019;104(1):13\\u0026ndash;20. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.ajhg.2018.11.014\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.ajhg.2018.11.014\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eStark Z, Boughtwood T, McClaren BJ, et al. Australian Genomics: Outcomes of a 5-year national program to accelerate the integration of genomics into healthcare. Eur J Hum Genet. 2023;31:489\\u0026ndash;500. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.ajhg.2023.01.018\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.ajhg.2023.01.018\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eTaruscio D, et al. The Undiagnosed Diseases Network International: Five years and more! Mol Genet Metab. 2020;129(4):243\\u0026ndash;54. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1016/j.ymgme.2020.01.004\\u003c/span\\u003e\\u003cspan address=\\\"10.1016/j.ymgme.2020.01.004\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eToubiana L, Ugon A, Giavarini A, Riquier J, Charlet J, Jeunemaitre X, Plouin P-F, Jaulent M-C. A pivot model to set up large scale rare diseases information systems: Application to the Fibromuscular Dysplasia Registry. In: Cornet R, et al. editors. Digital healthcare empowering Europeans. IOS; 2015. pp. 887\\u0026ndash;91. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3233/978-1-61499-512-8-887\\u003c/span\\u003e\\u003cspan address=\\\"10.3233/978-1-61499-512-8-887\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eVisibelli A, Scatena C, Tonarelli A, et al. Computational approaches integrated in a digital ecosystem platform for a rare disease. J Personalized Med. 2022;12(6):1013. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.3389/fmmed.2022.827340\\u003c/span\\u003e\\u003cspan address=\\\"10.3389/fmmed.2022.827340\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eWilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. 
\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1038/sdata.2016.18\\u003c/span\\u003e\\u003cspan address=\\\"10.1038/sdata.2016.18\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e.\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eWood L, Bassez G, Bleyenheuft C, Campbell C, Cossette L, Jimenez-Moreno AC, Dai Y, Dawkins H, D\\u0026iacute;az-Manera J, Dogan C, el Sherif R, Fossati B, Graham C, Hilbert J, Kastreva K, Kimura E, Korngut L, Kostera-Pruszczyk A, Lindberg C, Lindvall B, Luebbe E, Lusakowska A, Mazanec R, Meola G, Orlando L, Takahashi MP, Peric S, Puymirat J, Rakocevic-Stojanovic V, Rodrigues M, Roxburgh R, Schoser B, Segovia S, Shatillo A, Thiele S, Tournev I, van Engelen B, Vohanka S, Lochm\\u0026uuml;ller H. (2018). Eight years after an international workshop on myotonic dystrophy patient registries: Case study of a global collaboration for a rare disease. \\u003cem\\u003eOrphanet Journal of Rare Diseases, 13\\u003c/em\\u003e(1), 155. \\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://doi.org/10.1186/s13023-018-0889-0\\u003c/span\\u003e\\u003cspan address=\\\"10.1186/s13023-018-0889-0\\\" targettype=\\\"DOI\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e \\u003cli\\u003e\\u003cspan\\u003eWorld Health Organization. (2024). \\u003cem\\u003eWHO principles for human genome data: Access, use, and sharing\\u003c/em\\u003e. 
World Health Organization.\\u003cspan class=\\\"ExternalRef\\\"\\u003e\\u003cspan class=\\\"RefSource\\\"\\u003ehttps://cdn.who.int/media/docs/default-source/research-for-health/who-principles-human-genome-data-access--use--and-sharing_public-consultation_8-april.pdf?sfvrsn=f2c7afc7_3\\u003c/span\\u003e\\u003cspan address=\\\"https://cdn.who.int/media/docs/default-source/research-for-health/who-principles-human-genome-data-access--use--and-sharing_public-consultation_8-april.pdf?sfvrsn=f2c7afc7_3\\\" targettype=\\\"URL\\\" class=\\\"RefTarget\\\"\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/span\\u003e\\u003c/li\\u003e\\u003c/ol\\u003e\"}],\"fulltextSource\":\"\",\"fullText\":\"\",\"funders\":[],\"hasAdminPriorityOnWorkflow\":false,\"hasManuscriptDocX\":true,\"hasOptedInToPreprint\":true,\"hasPassedJournalQc\":\"\",\"hasAnyPriority\":false,\"hideJournal\":true,\"highlight\":\"\",\"institution\":\"\",\"isAcceptedByJournal\":false,\"isAuthorSuppliedPdf\":false,\"isDeskRejected\":\"\",\"isHiddenFromSearch\":false,\"isInQc\":false,\"isInWorkflow\":false,\"isPdf\":false,\"isPdfUpToDate\":true,\"isWithdrawnOrRetracted\":false,\"journal\":{\"display\":true,\"email\":\"info@researchsquare.com\",\"identity\":\"researchsquare\",\"isNatureJournal\":false,\"hasQc\":true,\"allowDirectSubmit\":true,\"externalIdentity\":\"\",\"sideBox\":\"\",\"snPcode\":\"\",\"submissionUrl\":\"/submission\",\"title\":\"Research Square\",\"twitterHandle\":\"researchsquare\",\"acdcEnabled\":true,\"dfaEnabled\":false,\"editorialSystem\":\"\",\"reportingPortfolio\":\"\",\"inReviewEnabled\":false,\"inReviewRevisionsEnabled\":true},\"keywords\":\"Rare Disease, Genomic, Minimum dataset\",\"lastPublishedDoi\":\"10.21203/rs.3.rs-8204628/v1\",\"lastPublishedDoiUrl\":\"https://doi.org/10.21203/rs.3.rs-8204628/v1\",\"license\":{\"name\":\"CC BY 4.0\",\"url\":\"https://creativecommons.org/licenses/by/4.0/\"},\"manuscriptAbstract\":\"\\u003ch2\\u003eBackground\\u003c/h2\\u003e \\u003cp\\u003eMinimum 
data sets (MDS) are used to harmonize the capture and exchange of rare-disease information across studies and care settings, but the genomic component of these frameworks is often inconsistently specified. In our sample of included studies (n\\u0026thinsp;=\\u0026thinsp;23), only 2 explicitly reported using Whole-Exome Sequencing (WES) or Whole-Genome Sequencing (WGS), highlighting a persistent gap in genomic method reporting alongside heterogeneity in scope, standards adoption, and reported impacts.\\u003c/p\\u003e\\u003ch2\\u003eMethods\\u003c/h2\\u003e \\u003cp\\u003eWe performed a systematic review (searches through 2024) to identify publications proposing, developing, or applying MDS that included genomic elements for rare diseases. Screening was conducted in two steps: (i) independent title/abstract screening by two reviewer pairs with conflict resolution by a third reviewer, followed by (ii) independent full-text assessment by two reviewers. We extracted study characteristics, MDS domains, intended use context, referenced standards/ontologies, level of genomic reporting, and reported outcomes. Results were summarized with descriptive statistics, Jaccard-based co-occurrence patterns, and exploratory association analyses.\\u003c/p\\u003e\\u003ch2\\u003eResults\\u003c/h2\\u003e \\u003cp\\u003eTwenty-three studies met the inclusion criteria and were mostly produced in Europe and North America. Clinical/phenotypic information was nearly universal (95.7%), whereas genomic data were included in 69.6% of cases and were usually described without specifying the sequencing modality. Most studies targeted biomedical/genomic research (91.3%) and clinical diagnosis/care (69.6%). Standards use was modest (median\\u0026thinsp;=\\u0026thinsp;1 per study), with the most frequent being HPO (26.1%), Orphanet/Orphacode (21.7%), FAIR (17.4%), and ICD (8.7%). 
Reported benefits were more common at the system level (e.g., interoperability or policy-related outputs) than as consistently quantified clinical effects. Exploratory analyses suggested that practices such as planned reanalysis, phenotype\\u0026ndash;genotype linkage, and explicit handling of structural variants may be associated with greater clinical/knowledge gains than the sequencing modality alone, although evidence remained insufficient to draw firm causal conclusions.\\u003c/p\\u003e\\u003ch2\\u003eConclusions\\u003c/h2\\u003e \\u003cp\\u003eRare-disease MDS commonly captures clinical information but often underspecifies core genomic details and inconsistently applies standards, limiting comparability and interoperability. Progress would benefit from a minimal genomic reporting core (sequencing approach, reference genome, variant classes, and analysis/annotation pipeline descriptors) aligned with widely used ontologies and interoperability principles, together with routine inclusion of patient-centered outcomes and biospecimen linkages.\\u003c/p\\u003e\",\"manuscriptTitle\":\"Minimum genomic data sets for rare diseases: A systematic review\",\"msid\":\"\",\"msnumber\":\"\",\"nonDraftVersions\":[{\"code\":1,\"date\":\"2026-02-17 09:24:50\",\"doi\":\"10.21203/rs.3.rs-8204628/v1\",\"editorialEvents\":[{\"type\":\"communityComments\",\"content\":0}],\"status\":\"published\",\"journal\":{\"display\":true,\"email\":\"info@researchsquare.com\",\"identity\":\"researchsquare\",\"isNatureJournal\":false,\"hasQc\":true,\"allowDirectSubmit\":true,\"externalIdentity\":\"\",\"sideBox\":\"\",\"snPcode\":\"\",\"submissionUrl\":\"/submission\",\"title\":\"Research 
Square\",\"twitterHandle\":\"researchsquare\",\"acdcEnabled\":true,\"dfaEnabled\":false,\"editorialSystem\":\"\",\"reportingPortfolio\":\"\",\"inReviewEnabled\":false,\"inReviewRevisionsEnabled\":true}}],\"origin\":\"\",\"ownerIdentity\":\"6071c763-9117-4d65-b07d-2979c5130bfa\",\"owner\":[],\"postedDate\":\"February 17th, 2026\",\"published\":true,\"recentEditorialEvents\":[{\"type\":\"decision\",\"content\":\"Rejected\",\"date\":\"2026-05-08T13:48:51+00:00\",\"index\":\"\",\"fulltext\":\"\"}],\"rejectedJournal\":[],\"revision\":\"\",\"amendment\":\"\",\"status\":\"posted\",\"subjectAreas\":[],\"tags\":[],\"updatedAt\":\"2026-05-08T13:59:24+00:00\",\"versionOfRecord\":[],\"versionCreatedAt\":\"2026-02-17 09:24:50\",\"video\":\"\",\"vorDoi\":\"\",\"vorDoiUrl\":\"\",\"workflowStages\":[]},\"version\":\"v1\",\"identity\":\"rs-8204628\",\"journalConfig\":\"researchsquare\"},\"__N_SSP\":true},\"page\":\"/article/[identity]/[[...version]]\",\"query\":{\"redirect\":\"/article/rs-8204628\",\"identity\":\"rs-8204628\",\"version\":[\"v1\"]},\"buildId\":\"XKTyCvWXoU3ODBz1xrDgd\",\"isFallback\":false,\"isExperimentalCompile\":false,\"dynamicIds\":[84888],\"gssp\":true,\"scriptLoader\":[]}","source_license":"CC-BY-4.0","license_restricted":false}