PubMatcher: a web app to streamline genomic data interpretation with automated bibliographic research

Victor Marin, Hugo Lannes, Victor Dumont, Julien Thévenon, David Baux, and 4 more

This is a preprint posted on Research Square; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7224867/v1
This work is licensed under a CC BY 4.0 License
Status: Published. Journal Publication published 07 Mar, 2026 in European Journal of Human Genetics.
Version 1 posted 11

Abstract

In the era of rapidly accumulating genomic data, largely driven by the broad use of whole-genome sequencing (WGS) in clinical settings, interpreting lesser-known genes with varied phenotypes remains challenging. PubMatcher is a new tool that automates bibliographic research for multiple genes at once and grants quick and easy access to relevant gene information.
It helps users efficiently identify potential genotype-phenotype associations using PubMed complemented by additional data. By significantly reducing analysis time, PubMatcher streamlines the interpretation of novel or under-documented genes. Available to non-commercial users for free, PubMatcher is a user-friendly and efficient solution for researchers, clinical scientists and pathologists working with pangenomic analyses.

Biological sciences/Genetics/Medical genetics
Biological sciences/Genetics/Clinical genetics

Introduction

Genomic sequencing advancements have led to an explosion of data, making the interpretation of variants in lesser-known genes a day-to-day challenge for geneticists. Key gene-phenotype associations often remain underrepresented in widely used human disease databases such as Online Mendelian Inheritance in Man (OMIM) [1]. For example, OMIM may omit some gene-phenotype associations [2] or include them with an emphasis on symptoms different from those observed in some patients. To avoid this issue, PubMed or other databases can be used to find the most relevant scientific publications linking a gene to a specific phenotype. This thorough approach to genomic data interpretation can be time-intensive and potentially less accurate over time. This is especially true for whole-genome sequencing (WGS) analysis, where a significant number of variants located in non-OMIM-morbid genes are selected by classical filters (such as “rare loss-of-function” or “rare homozygous missense for a recessive hypothesis”).

To address these challenges, we developed PubMatcher, a free online tool that automates the retrieval of gene-phenotype associations by querying multiple curated databases and PubMed simultaneously. PubMatcher uniquely supports batch, format-free analysis, significantly reducing the time required to identify candidate genes relevant to a patient’s phenotype.
With a user-friendly, format-free interface, the tool facilitates the exploration of lesser-known or emerging gene-phenotype associations, providing an efficient solution for genomic interpretation workflows.

Materials and Methods

PubMatcher is a web application developed in Node.js [3, 4], a server-side JavaScript runtime environment known for its versatility and efficiency in modern web applications. Two types of input are needed: one or more genes and one or more phenotypes (or relevant keywords) (Fig. 1). The PubMatcher pipeline then queries multiple databases and APIs: gnomAD gene constraint metrics [5], PubMed, UniProt [6], the International Mouse Phenotyping Consortium (IMPC) [7], ClinVar, the Gene Curation Coalition (GenCC), the HUGO Gene Nomenclature Committee, and PanelApp. The results page presents a summary of all the information collected (Fig. 2).

To ensure wide accessibility, PubMatcher is accessed through a web browser at https://pubmatcher.fr, and the code is available under the Massachusetts Institute of Technology (MIT) License on GitHub (https://github.com/victormar1/PubMatcher/). It does not require user registration, in line with most journals' guidelines for software tools. Results are presented in an organized table, where gene-phenotype pairs are listed with key metrics such as constraint scores and publication count. The details of each query are described below.

Gene constraint metrics

For each gene, PubMatcher obtains the following constraint metrics from the gnomAD v2.1 database [8]: pLI (probability of being loss-of-function intolerant), LOEUF (loss-of-function observed/expected upper bound fraction), MOEUF (missense observed/expected upper bound fraction) and missense Z-score. The LOEUF and MOEUF metrics indicate a gene’s tolerance to loss-of-function and missense variants, respectively, helping prioritize genes under selective constraint for clinical relevance.
LOEUF and MOEUF values are highlighted based on constraint levels: dark red for the top 10% most constrained genes (LOEUF ≤ 0.26, MOEUF ≤ 0.58), red for the top 20% (LOEUF ≤ 0.41, MOEUF ≤ 0.70), orange for the top 25% (LOEUF ≤ 0.48, MOEUF ≤ 0.73), and yellow for the top 30% (LOEUF ≤ 0.55, MOEUF ≤ 0.77). Values above these thresholds remain black, as defined in reference [9]. gnomAD v4 constraints can also be displayed by clicking on the constraint values; similar information is shown, and discrepancies between the two versions are highlighted with an exclamation mark.

PubMed

PubMed is a free online database providing access to a vast repository of biomedical research articles maintained by the National Center for Biotechnology Information and represents an “up-to-date” knowledge source for gene-phenotype associations [10]. PubMatcher reports the number of publications retrieved by a query, the title of the first publication in the list, and a link to the query on PubMed and the related research articles. Each PubMed search pairs a gene name with a phenotype, and the gene-phenotype pairs are combined into a single cumulative query per gene. An example query is shown in Fig. 2, which includes five genes and two phenotypes. The PubMed query for each gene follows this pattern: (GENE AND PHENOTYPE_1) OR (GENE AND PHENOTYPE_2). Hovering over the title of the publication displays the titles of other matching publications.

UniProt

The UniProt database [6] provides information about protein functions, which is potentially relevant for genetic interpretation. PubMatcher requests the protein description and biological feature keywords from UniProt through its API.

International Mouse Phenotyping Consortium

The IMPC database [7] provides information about the consequences of gene knockouts in mice, which could suggest a gene's involvement in human diseases. Phenotypes are listed as presented on IMPC, and specific symptoms are displayed on mouseover.
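Two of the rules described above, the constraint highlighting bands and the cumulative PubMed query pattern, are mechanical enough to sketch in code. The Python below is only an illustrative sketch, not PubMatcher's actual implementation (which is written in Node.js). The function names `highlight_color`, `build_pubmed_query` and `pubmed_search_url` are hypothetical, phenotypes are inserted into the query verbatim (no quoting or field tags, which is an assumption), and the link assumes the public PubMed web search address with its `term` parameter.

```python
from urllib.parse import urlencode

# Highlighting bands quoted in the text, ordered from most to least constrained:
# (color, LOEUF upper bound, MOEUF upper bound).
CONSTRAINT_BANDS = [
    ("dark red", 0.26, 0.58),  # top 10% most constrained genes
    ("red", 0.41, 0.70),       # top 20%
    ("orange", 0.48, 0.73),    # top 25%
    ("yellow", 0.55, 0.77),    # top 30%
]


def highlight_color(value: float, metric: str) -> str:
    """Return the highlight color for a LOEUF or MOEUF value (hypothetical helper)."""
    column = {"LOEUF": 1, "MOEUF": 2}[metric]
    for band in CONSTRAINT_BANDS:
        if value <= band[column]:
            return band[0]
    # Values above every threshold are left unhighlighted.
    return "black"


def build_pubmed_query(gene: str, phenotypes: list[str]) -> str:
    """Combine one gene with every phenotype: (GENE AND P1) OR (GENE AND P2)."""
    return " OR ".join(f"({gene} AND {phenotype})" for phenotype in phenotypes)


def pubmed_search_url(gene: str, phenotypes: list[str]) -> str:
    """Build a link to the PubMed web search for the cumulative query."""
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode(
        {"term": build_pubmed_query(gene, phenotypes)}
    )


if __name__ == "__main__":
    print(highlight_color(0.30, "LOEUF"))
    print(build_pubmed_query("MC4R", ["obesity", "diabetes"]))
    print(pubmed_search_url("MC4R", ["obesity", "diabetes"]))
```

With the gene MC4R and the phenotypes "obesity" and "diabetes" from the Fig. 2 example, the query string becomes `(MC4R AND obesity) OR (MC4R AND diabetes)`, and a LOEUF of 0.30 falls in the red band. Checking the bands from most to least constrained means the first match is always the strictest one that applies.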
ClinVar Lookup

PubMatcher integrates data from ClinVar, a public database maintained by the National Center for Biotechnology Information that provides clinically relevant interpretations of genetic variants, including their pathogenicity, molecular consequences, and supporting evidence. For pathogenic and likely pathogenic small nucleotide variants, PubMatcher displays both the number of loss-of-function (LOF) variants (frameshift, nonsense, and canonical splice site alterations) and the number of missense variants. Variants of unknown significance (VUS) are also reported so that no potentially relevant findings are overlooked.

Gene Curation Coalition, PanelApp & OMIM

PubMatcher integrates data from GenCC [11], PanelApp [12], and OMIM [1] to provide comprehensive information on gene-disease associations, supporting rapid and accurate curation of clinically relevant genes. GenCC aggregates gene-disease validity information from multiple expert-curated sources, facilitating the identification of genes with well-established evidence for their role in human diseases. PubMatcher displays the gene status from GenCC. The number of genes listed in both PanelApp UK and PanelApp Australia is reported in the PubMatcher output because of their importance for rapid gene-disease curation. Links are provided for quick access to the relevant entries on the PanelApp websites. OMIM (Online Mendelian Inheritance in Man) is a comprehensive, authoritative resource that catalogs human genes and genetic phenotypes, including their relationships to disease. PubMatcher integrates data from OMIM to indicate whether a gene is associated with a known morbid condition or phenotype.

Relevance of PubMatcher in Human Whole Genome Sequencing analysis

We evaluated the relevance of the PubMatcher tool in WGS analyses of patients with rare diseases performed at the Auragen laboratory in Lyon, France.
This laboratory is part of the French Genomic Medicine 2025 plan, which aims to expand access to genomics in human healthcare [13, 14]. First, we assessed, across 20 trio-based WGS analyses, the proportion of variants retained by an example set of common WGS filters (detailed in Table S1) that were not located in OMIM morbid genes. Then, we present examples of variants revealed by PubMatcher in genes that proved potentially relevant for medical use after analyzing 100 WGS cases.

Whole-genome sequencing was performed following the recommendations of the “France Médecine Génomique 2025” Plan. Genomic DNA extracted from whole blood was sequenced according to standard procedures for a PCR-free (polymerase chain reaction-free) genome on a NovaSeq6000 instrument (Illumina, San Diego, California, USA). Sequencing data were aligned to the GRCh38.p13 full assembly using bwa 0.7+. Variants were called by several algorithms, including GATK 4+, BCFtools 1.10+, Manta 1.6+ and CNVnator 0.4+, and annotated using the Variant Effect Predictor. Detected variants were prioritized using in-house procedures. Further details are available on request at http://www.auragen.fr.

Results

Variants in Non-OMIM Genes Found by Common WGS Filters

PubMatcher is meant to quickly identify gene and phenotype associations using the most up-to-date sources. Although the OMIM database is regularly updated, the most recent phenotype-to-gene associations may be missing, potentially leading to the exclusion of relevant variants. Therefore, we evaluated the proportion of non-OMIM morbid genes in 20 WGS trios, each comprising a patient and their unaffected parents, using a classic filtering strategy (see Table S1 for details of the filters). After applying these filters, the remaining variant counts ranged from 80 to 150 per sample, with a median of 95. Among these, the median proportions of variants mapped to OMIM morbid genes, OMIM non-morbid genes, and non-OMIM genes were 30%, 52%, and 18%, respectively (Fig. 3).
These results confirm a high representation of non-morbid or non-OMIM genes (52% + 18% = 70%) after filtering, underscoring the utility of PubMatcher for efficiently screening them.

Mis- or non-annotated genes with relevant variants in 100 WGS analyses

We present examples of variants found in genes either not annotated in OMIM for the researched disease or with a non-syndromic form not specified in OMIM (Table 1). These relevant variants were identified in 15 out of 100 whole-genome sequences analyzed at the French laboratory Auragen (Lyon, France). The genomes included in this study were selected solely based on their availability as trios and were analyzed in chronological order, starting with the oldest requests, without any other selection criteria. Some of these genomes had never undergone prior genetic testing, while others had previously undergone panel or exome sequencing, which was insufficient for establishing a definitive diagnosis. Phenotypes included genodermatosis, chronic nephropathy, intellectual disability, and red blood cell diseases.

Integrating PubMatcher in Genomic Variant Analysis Workflows

PubMatcher is a tool that can be integrated early in the general workflow of genomic single nucleotide variant analysis. We propose a flowchart for variant interpretation in a large-scale genomic approach (Fig. 4). Starting with a conventional filtering strategy (as described in Table S1), a rapid diagnosis can be made if a causative variant is identified, for example a previously described ClinVar pathogenic variant that matches the patient's clinical presentation. If such a variant is not found, a more thorough variant analysis is required to explore and report relevant genetic variants.
The tool can be used for gene screening across all identified variants, allowing for a quick exploration of the most recent scientific knowledge (via PubMed and PanelApp queries), gene constraint metrics, protein functions (UniProt), and the consequences of mouse knockout models (IMPC). The mode of inheritance based on the family pedigree is also crucial. A recent publication by Chong et al. [9] compiled five key criteria for retaining genes of interest, nearly all of which are integrated into the proposed flowchart that includes PubMatcher, except for gnomAD variant co-occurrence. After analyzing the data in the context of the patients’ phenotypes, some variants may be retained within genes of interest. If the evidence level is sufficient, the variant can be classified and reported (with additional exploration needed if it is a variant of unknown significance). Conversely, if the evidence level is low, a more research-focused approach may be suggested, such as submitting to MatchMaker Exchange [32] (GeneMatcher, etc.) or conducting further fundamental post-genomic investigations.

Discussion

We believe PubMatcher is a significant advancement in clinical genomic research, addressing the need for more efficient interpretation of genomic data. By rapidly identifying relevant gene-phenotype associations, especially in lesser-known genes, PubMatcher increases both the speed and accuracy of genomic analyses. This approach also helps ensure that rare yet important variants are not missed, which is critical for their inclusion in broader research studies; given their rarity, these cases can provide invaluable insights into disease mechanisms and phenotypic diversity. While existing tools already offer some bibliographic functionalities, PubMatcher specifically addresses the need for batch analysis. It is designed to complement widely used tools like VarSome or MobiDetails [33], offering a more streamlined approach for rapid gene-phenotype association.
An important consideration is the inclusion of animal models, such as the mouse model, which provides invaluable insights into gene function and disease relevance due to its genetic similarity to humans. However, mouse models present limitations, including differences in gene expression and phenotypic responses. Expanding to other model organisms, such as zebrafish, could diversify the functional insights available to PubMatcher users, particularly for genes where murine models have limited general data or translational relevance.

The effectiveness of PubMatcher heavily depends on the quality and completeness of its external data sources. Attempts to incorporate alternative sources, such as Google Scholar, resulted in an overwhelming volume of unspecific and irrelevant data, highlighting PubMed as the most reliable and curated source for retrieving relevant literature. Advances in AI-driven text-mining tools, such as PubTator [34], offer promising avenues for improving data retrieval by extracting gene-disease relationships from biomedical literature. These tools could significantly improve the exhaustiveness of PubMatcher’s results by identifying additional relevant publications that might otherwise remain undetected. However, current rate limitations (3 requests per second) within the PubTator API preclude its integration into PubMatcher at this stage.

PubMatcher has demonstrated effectiveness in identifying clinically relevant genes, thereby fulfilling its primary objective. Notably, several geneticists outside the development team have already integrated PubMatcher into their variant interpretation workflows, underscoring its reliability, practical utility, and adaptability in clinical genomics. Further exploration of PubMatcher’s applications in clinical settings could be beneficial.

Another important consideration is the accessibility of the tool.
While the current interface is user-friendly (particularly in terms of input formatting, result clarity, and advanced features upon login, such as input history), further simplifying the user experience and providing enhanced guidance and support would make the tool even more accessible to a wider audience. Integrating artificial intelligence or machine learning could also boost PubMatcher’s capabilities by adding features like gene scoring to rank the matches by their relevance to the phenotype. Ongoing updates, as well as feedback from the user community, will be crucial for the tool’s continued development and for expanding its utility in the field of genomic research.

Conclusion

PubMatcher provides an effective solution for streamlining genomic data interpretation by automating bibliographic research and integrating it seamlessly into genomic interpretation workflows. This approach significantly enhances efficiency, particularly in identifying lesser-known yet clinically relevant gene-phenotype associations. As PubMatcher continues to evolve, improvements in data integration, interface design, and user-driven enhancements will further solidify its role as a valuable tool for both clinical diagnostics and genomic research.

Declarations

Data Availability Statement

The human whole genome sequencing data used in this study were obtained from the French national genomic medicine initiative, Plan France Médecine Génomique 2025 (PFMG2025). These sequencing data were generated and analyzed at the AURAGEN genomic sequencing center. Due to ethical and privacy restrictions, the raw sequencing data are not publicly available; as described in Abadie et al. (2025) [14], information on requesting access to the molecular datasets can be found on the website https://pfmg2025.fr/. Additional data are available from the corresponding author on reasonable request.
Code Availability

The source code for PubMatcher is freely available under the Massachusetts Institute of Technology (MIT) License.
Project name: PubMatcher
Project home page: https://github.com/victormar1/PubMatcher
Operating system(s): Platform independent
Programming language: JavaScript (Node.js & Vue.js)
Other requirements: None
License: Massachusetts Institute of Technology License
Any restrictions to use by non-academics: No specific restrictions, open-source license

Acknowledgements

This research was made possible through access to the data generated by the 2025 French Genomic Medicine Initiative.

Author Contributions Statement

V.M. conceptualized the project, developed the software, performed data analyses, and wrote the main manuscript text and figures. H.L. and V.D. contributed to the development of the software used in the work. J.T. provided data access and scientific guidance, and substantively revised the manuscript. D.B. assisted with software development, provided scientific guidance, and substantively revised the manuscript. A.-F.R. contributed scientific expertise and substantively revised the manuscript. E.L. helped with conceptualization, promoted our work, provided scientific feedback, and substantively revised the manuscript. P.P. contributed to study design, offered scientific input, and substantively revised the manuscript. L.L. co-conceived the project, supervised the research, and co-wrote the manuscript. All authors reviewed and approved the final version of the manuscript and agree to be accountable for the work.

Funding

No financial assistance was received in support of the study.

Ethical Approval

This study involved genomic analyses conducted as part of routine clinical care for patients with rare diseases in France. As such, a clinical trial registration was not required, since all data reported were obtained during standard diagnostic procedures. In accordance with the French Bioethics Law (Law No.
2004-800, dated August 6, 2004), all patients provided written informed consent for diagnostic procedures and were specifically informed that any remaining biological material could be used for research purposes. The retrospective use of these data was approved by the Bordeaux University Hospital under registration number CHUBX2025RE0134.

Competing Interests

No competing interests

References

1. McKusick VA. Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. Johns Hopkins University Press. 1998;12.
2. Shakir A, Ripperger M, Jiang Z, Wierenga KJ. Inferred inheritance of MorbidMap genes without OMIM clinical synopsis. Genet Med. 2018;20:470–3.
3. Holowaychuk TJ. tj/ejs [Internet]. 2024 [cited 22 Apr 2024]. Available from: https://github.com/tj/ejs
4. nodejs/node [Internet]. Node.js; 2024 [cited 22 Apr 2024]. Available from: https://github.com/nodejs/node
5. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43.
6. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research. 2023;51:D523–31.
7. Groza T, Gomez FL, Mashhadi HH, Muñoz-Fuentes V, Gunes O, Wilson R, et al. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Research. 2023;51:D1038–45.
8. Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625:92–100.
9. Chong JX, Berger SI, Baxter S, Smith E, Xiao C, Calame DG, et al. Considerations for reporting variants in novel candidate genes identified during clinical genomic testing. Genetics in Medicine. 2024;26:101199.
10. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022;50:D20–6.
11. The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources - PubMed [Internet]. [cited 9 Jan 2025]. Available from: https://pubmed.ncbi.nlm.nih.gov/35507016/
12. Martin AR, Williams E, Foulger RE, Leigh S, Daugherty LC, Niblock O, et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat Genet. 2019;51:1560–5.
13. Sanlaville D, Vidaud M, Thauvin-Robinet C, Nowak F, Lethimonnier F. [French Genomic Medicine Plan 2025 (PFMG2025): France enters the era of genomic medicine]. Rev Prat. 2021;71:1061–4.
14. Abadie C, Abderrahmane A, Abdous O, Abel C, Ackermann O, Acquaviva C, et al. PFMG2025–integrating genomic medicine into the national healthcare system in France. The Lancet Regional Health – Europe [Internet]. 2025 [cited 13 Mar 2025];50. Available from: https://www.thelancet.com/journals/lanepe/article/PIIS2666-7762(24)00352-1/fulltext
15. Hassan A, Morice-Picard F, Marin V, Lasseaux Robine E, Lebreton L, Davaze-Schneider J. Hypohidrotic ectodermal dysplasia in a family: expanding spectrum of LEF1-related disorders. Clinical and Experimental Dermatology. 2024;llae293.
16. Dufour W, Alawbathani S, Jourdain AS, Asif M, Baujat G, Becker C, et al. Monoallelic and biallelic variants in LEF1 are associated with a new syndrome combining ectodermal dysplasia and limb malformations caused by altered WNT signaling. Genet Med. 2022;24:1708–21.
17. Lévy J, Capri Y, Rachid M, Dupont C, Vermeesch JR, Devriendt K, et al. LEF1 haploinsufficiency causes ectodermal dysplasia. Clinical Genetics. 2020;97:595–600.
18. De Franco E, Wakeling MN, Frew RD, Russ-Silsby J, Peters C, Marks SD, et al. A biallelic loss-of-function PDIA6 variant in a second patient with polycystic kidney disease, infancy-onset diabetes, and microcephaly. Clin Genet. 2022;102:457–8.
19. Al-Fadhli FM, Afqi M, Sairafi MH, Almuntashri M, Alharby E, Alharbi G, et al. Biallelic loss of function variant in the unfolded protein response gene PDIA6 is associated with asphyxiating thoracic dystrophy and neonatal-onset diabetes. Clin Genet. 2021;99:694–703.
20. Münch J, Engesser M, Schönauer R, Hamm JA, Hartig C, Hantmann E, et al. Biallelic pathogenic variants in roundabout guidance receptor 1 associate with syndromic congenital anomalies of the kidney and urinary tract. Kidney Int. 2022;101:1039–53.
21. Christians A, Kesdiren E, Hennies I, Hofmann A, Trowe MO, Brand F, et al. Heterozygous variants in the DVL2 interaction region of DACT1 cause CAKUT and features of Townes-Brocks syndrome 2. Hum Genet. 2023;142:73–88.
22. Yan H, Shi Z, Wu Y, Xiao J, Gu Q, Yang Y, et al. Targeted next generation sequencing in 112 Chinese patients with intellectual disability/developmental delay: novel mutations and candidate gene. BMC Med Genet. 2019;20:80.
23. Ha T, Morgan A, Bartos MN, Beatty K, Cogné B, Braun D, et al. De novo variants predicting haploinsufficiency for DIP2C are associated with expressive speech delay. Am J Med Genet A. 2024;194:e63559.
24. Smits DJ, Schot R, Popescu CA, Dias KR, Ades L, Briere LC, et al. De novo MCM6 variants in neurodevelopmental disorders: a recognizable phenotype related to zinc binding residues. Hum Genet. 2023;142:949–64.
25. Azad P, Caldwell AB, Ramachandran S, Spann NJ, Akbari A, Villafuerte FC, et al. ARID1B, a molecular suppressor of erythropoiesis, is essential for the prevention of Monge’s disease. Exp Mol Med. 2022;54:777–87.
26. Shen Y, Bassett MA, Gurumurthy A, Nar R, Knudson IJ, Guy CR, et al. Identification of a Novel Enhancer/Chromatin Opening Element Associated with High-Level γ-Globin Gene Expression. Mol Cell Biol. 2018;38:e00197-18.
27. Werren EA, Peirent ER, Jantti H, Guxholli A, Srivastava KR, Orenstein N, et al. Biallelic variants in CSMD1 are implicated in a neurodevelopmental disorder with intellectual disability and variable cortical malformations. Cell Death Dis. 2024;15:379.
28. Boonsawat P, Asadollahi R, Niedrist D, Steindl K, Begemann A, Joset P, et al. Deleterious ZNRF3 germline variants cause neurodevelopmental disorders with mirror brain phenotypes via domain-specific effects on Wnt/β-catenin signaling. The American Journal of Human Genetics. 2024;111:1994–2011.
29. Gordon PM, Efthymiou S, Salpietro V, Fielding T, Borgione E, Scuderi C, et al. Human patient SFPQ homozygous mutation is found deleterious for brain and motor development in a zebrafish model [Internet]. bioRxiv; 2020 [cited 3 Nov 2024]. p. 2020.03.18.993634. Available from: https://www.biorxiv.org/content/10.1101/2020.03.18.993634v1
30. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500.
31. Schmid CM, Gregor A, Costain G, Morel CF, Massingham L, Schwab J, et al. LHX2 haploinsufficiency causes a variable neurodevelopmental disorder. Genet Med. 2023;25:100839.
32. Philippakis AA, Azzariti DR, Beltran S, Brookes AJ, Brownstein CA, Brudno M, et al. The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery. Human Mutation. 2015;36:915–21.
33. Baux D, Van Goethem C, Ardouin O, Guignard T, Bergougnoux A, Koenig M, et al. MobiDetails: online DNA variants interpretation. Eur J Hum Genet. 2021;29:356–60.
34. Wei CH, Allot A, Lai PT, Leaman R, Tian S, Luo L, et al. PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Research. 2024;52:W540–6.

Table 1

Table 1 is available in the Supplementary Files section.
Additional Declarations

There is a duality of interest

Supplementary Files

TableS1.xlsx: Table S1: Example Set of Filters for Selecting Genetic Variants of Interest in Whole Genome Sequencing
Table1.xlsx: Table 1

Editorial decision: revise, 07 Nov, 2025
Review #2 received at journal, 30 Oct, 2025
Review #3 received at journal, 30 Oct, 2025
Reviewer #3 agreed at journal, 15 Oct, 2025
Reviewer #2 agreed at journal, 08 Oct, 2025
Review #1 received at journal, 08 Sep, 2025
Reviewer #1 agreed at journal, 19 Aug, 2025
Reviewers invited by journal, 19 Aug, 2025
Submission checks completed at journal, 28 Jul, 2025
First submitted to journal, 27 Jul, 2025
Editor assigned by journal, 27 Jul, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7224867","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":491958076,"identity":"26f38bd5-69eb-45ae-be95-072ec4715ff4","order_by":0,"name":"Victor Marin","email":"","orcid":"","institution":"Service de Biochimie, Groupe Hospitalier Pellegrin, CHU de Bordeaux, France; GCS AURAGEN, 69003 Lyon, France","correspondingAuthor":false,"prefix":"","firstName":"Victor","middleName":"","lastName":"Marin","suffix":""},{"id":491958077,"identity":"05d30f71-61cf-4981-b8c8-267ac37bb984","order_by":1,"name":"Hugo Lannes","email":"","orcid":"","institution":"Independent Researcher","correspondingAuthor":false,"prefix":"","firstName":"Hugo","middleName":"","lastName":"Lannes","suffix":""},{"id":491958078,"identity":"f2c70bca-fab1-48ac-9be0-4c4401563bfa","order_by":2,"name":"Victor Dumont","email":"","orcid":"","institution":"Independent Researcher","correspondingAuthor":false,"prefix":"","firstName":"Victor","middleName":"","lastName":"Dumont","suffix":""},{"id":491958079,"identity":"3026b201-1063-4bfe-86db-0a416fab7354","order_by":3,"name":"Julien Thévenon","email":"","orcid":"https://orcid.org/0000-0001-9271-3961","institution":"GCS AURAGEN, 69003 Lyon, France","correspondingAuthor":false,"prefix":"","firstName":"Julien","middleName":"","lastName":"Thévenon","suffix":""},{"id":491958080,"identity":"4f198bdc-8276-4af5-a745-cd2c275a3878","order_by":4,"name":"David Baux","email":"","orcid":"https://orcid.org/0000-0003-3423-1221","institution":"Molecular Genetics Laboratory, Univ Montpellier, CHU Montpellier, Montpellier, France; 
Institute for Neurosciences of Montpellier (INM), Univ Montpellier, Inserm, Montpellier, France; Montpellier BioInformatique pour le Diagnostic Clinique (MOBIDIC), CHU Montpellier, Montpellier, France","correspondingAuthor":false,"prefix":"","firstName":"David","middleName":"","lastName":"Baux","suffix":""},{"id":491958081,"identity":"a7b3c037-09d3-48d8-8780-d46436325295","order_by":5,"name":"Anne-Françoise Roux","email":"","orcid":"","institution":"Molecular Genetics Laboratory, Univ Montpellier, CHU Montpellier, Montpellier, France; Institute for Neurosciences of Montpellier (INM), Univ Montpellier, Inserm, Montpellier, France; Montpellier BioInformatique pour le Diagnostic Clinique (MOBIDIC), CHU Montpellier, Montpellier, France","correspondingAuthor":false,"prefix":"","firstName":"Anne-Françoise","middleName":"","lastName":"Roux","suffix":""},{"id":491958082,"identity":"d254aa18-c28c-48f5-a44d-687c9a9196dc","order_by":6,"name":"Eulalie Lasseaux","email":"","orcid":"","institution":"GCS AURAGEN, 69003 Lyon, France; Unité d'Oncogénétique, Institut Bergonié, CLCC Bordeaux et Sud-Ouest, Bordeaux, France","correspondingAuthor":false,"prefix":"","firstName":"Eulalie","middleName":"","lastName":"Lasseaux","suffix":""},{"id":491958083,"identity":"0e58e785-545d-4f18-8b66-5f850bebd1d4","order_by":7,"name":"Perrine Pennamen","email":"","orcid":"https://orcid.org/0000-0002-0363-9884","institution":"GCS AURAGEN, 69003 Lyon, France; Molecular Genetics Laboratory, Bordeaux University Hospital, Bordeaux, France","correspondingAuthor":false,"prefix":"","firstName":"Perrine","middleName":"","lastName":"Pennamen","suffix":""},{"id":491958075,"identity":"f6918520-30ef-48b8-bfec-86f2b5f5f86b","order_by":8,"name":"Louis 
Lebreton","email":"","orcid":"https://orcid.org/0000-0002-8583-1326","institution":"Service de Biochimie, Groupe Hospitalier Pellegrin, CHU de Bordeaux, France; GCS AURAGEN, 69003 Lyon, France","correspondingAuthor":true,"prefix":"","firstName":"Louis","middleName":"","lastName":"Lebreton","suffix":""}],"badges":[],"createdAt":"2025-07-27 07:55:09","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7224867/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7224867/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41431-026-02068-z","type":"published","date":"2026-03-07T05:00:00+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":87792679,"identity":"745043a0-58de-4dd8-8358-3676d4407029","added_by":"auto","created_at":"2025-07-29 06:11:02","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":44908,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePubMatcher query page\u003c/strong\u003e. \u003cem\u003eSearch genes either manually or by using the “EXTRACT FROM TEXT” mode, which accepts copy-pasted text containing gene names. One or more phenotypes can be entered manually in the lower box, separated by commas. 
(version: January 2025)\u003c/em\u003e\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/0206d5c7f2a70fa5c3a9a187.png"},{"id":87791986,"identity":"61fb56be-d6b0-410e-8b1d-cfae1a0ca845","added_by":"auto","created_at":"2025-07-29 05:55:02","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":214041,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExample of PubMatcher output\u003c/strong\u003e. (query: genes “\u003cem\u003eDIDO1, MC4R, BRAT1, SLC12A5\u003c/em\u003e” and phenotypes “obesity, diabetes”). Crosses indicate that no information is available for the gene. The “GENE” column contains gene constraint metrics; the “PUBMATCH” column contains the number of publications retrieved from PubMed and the title of the first publication; the “FUNCTION” column contains the UniProt function description of the protein and functional tags; the “PHENOTYPE KO” column contains icons representing symptoms observed in knockout (KO) mice, compiled from the IMPC database; the “CLINVAR LOOKUP” column contains the number and type (missense / loss-of-function) of pathogenic / likely pathogenic variants and of variants of unknown significance; the “STATUS” column contains OMIM, Gene Curation Coalition and PanelApp England / Australia information.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/8ea90b6e18e5a50ff51dda04.png"},{"id":87791982,"identity":"fc7de034-029b-4032-b4e2-f2fbe2f077ba","added_by":"auto","created_at":"2025-07-29 05:55:02","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":36588,"visible":true,"origin":"","legend":"\u003cp\u003eProportions of OMIM statuses for genes with identified variants using common WGS filters. 
Morbid: 30.7% (SD = 5.72), Non-Morbid: 51.3% (SD = 6.11), Non-OMIM: 18% (SD = 5.56).\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/920fc82712a5aa2764ec9e3a.png"},{"id":87792141,"identity":"6e4668cd-e346-46cf-a58e-f8bd1853da7c","added_by":"auto","created_at":"2025-07-29 06:03:02","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":34952,"visible":true,"origin":"","legend":"\u003cp\u003eProposed integration of PubMatcher in interpretation of pangenomic analysis\u003c/p\u003e\n\u003cp\u003eMOI : Mode Of Inheritance; GOI: Gene Of Interest\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/13c25894f73b84f46e685778.png"},{"id":104397482,"identity":"f592052a-f1f2-4d82-bb3f-3be2ca6f64c0","added_by":"auto","created_at":"2026-03-11 11:49:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":906865,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/47753dcb-8586-460c-9395-36c2fa03e9af.pdf"},{"id":87792139,"identity":"1c820e8c-81b6-4b64-974e-db3dcc72fa87","added_by":"auto","created_at":"2025-07-29 06:03:02","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":11351,"visible":true,"origin":"","legend":"Table S1: Example Set of Filters for Selecting Genetic Variants of Interest in Whole Genome Sequencing","description":"","filename":"TableS1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/5a57b19e6ded4d9ca5cd080f.xlsx"},{"id":87791990,"identity":"a7c8b748-4e00-4851-ad4f-2e855326d01a","added_by":"auto","created_at":"2025-07-29 
05:55:02","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":14272,"visible":true,"origin":"","legend":"Table 1","description":"","filename":"Table1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7224867/v1/dc3f133f6c4407171248a888.xlsx"}],"financialInterests":"There is a duality of interest","formattedTitle":"PubMatcher: a web app to streamline genomic data interpretation with automated bibliographic research","fulltext":[{"header":"Introduction","content":"\u003cp\u003eGenomic sequencing advancements have led to an explosion of data, making the interpretation of variants in lesser-known genes a day-to-day challenge for geneticists. Key gene-phenotype associations often remain underrepresented in widely used databases involved in human disease like Online Mendelian Inheritance in Man (OMIM) \u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. For example, OMIM may omit some gene-phenotype associations\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e or include them, but with an emphasis on symptoms different from those observed in some patients. To avoid this issue, PubMed or other databases can be useful to find the most relevant scientific publications regarding the link between a gene and a specific phenotype. This thorough approach to genomic data interpretation can be time-intensive and potentially less accurate over time. 
This is especially true for whole genome sequencing (WGS) analysis, where a significant number of variants located in non-OMIM morbid genes are selected by classical filters (such as \u0026ldquo;rare loss-of-function\u0026rdquo;, \u0026ldquo;rare homozygous missense for a recessive hypothesis\u0026rdquo;).\u003c/p\u003e\u003cp\u003eTo address these challenges, we developed PubMatcher, a free online tool that automates the retrieval of gene-phenotype associations by querying multiple curated databases and PubMed simultaneously. PubMatcher uniquely supports batch format-free analysis, significantly reducing the time required to identify candidate genes relevant to a patient\u0026rsquo;s phenotype. With a user-friendly, format-free interface, the tool facilitates the exploration of lesser-known or emerging gene-phenotype associations, providing an efficient solution for genomic interpretation workflows.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003ePubMatcher is a web application developed in Node.js\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e, a server-side JavaScript runtime environment known for its versatility and efficiency in modern web applications. Two types of inputs are needed: one or more genes and one or more phenotypes (or relevant keywords) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
The PubMatcher pipeline then queries multiple databases and APIs: gnomAD gene constraint metrics\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e, PubMed, UniProt\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, International Mouse Phenotyping Consortium (IMPC)\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e, ClinVar, Gene Curation Coalition (GenCC), HUGO Gene Nomenclature Committee, and PanelApp. The results page presents a summary of all the information collected (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). To ensure wide accessibility, PubMatcher is accessed through a web browser at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmatcher.fr\u003c/span\u003e\u003cspan address=\"https://pubmatcher.fr\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e and the code is available on GitHub under the Massachusetts Institute of Technology (MIT) License (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/victormar1/PubMatcher/\u003c/span\u003e\u003cspan address=\"https://github.com/victormar1/PubMatcher/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). It does not require user registration, in line with most journals' guidelines for software tools.\u003c/p\u003e\u003cp\u003eResults are presented in an organized table format, where gene-phenotype pairs are listed with key metrics, such as constraint scores and publication count. 
The details of each query are described below.\u003c/p\u003e\u003cp\u003e\u003cb\u003eGene constraint metrics\u003c/b\u003e\u003c/p\u003e\u003cp\u003eFor each gene, PubMatcher obtains the following constraint metrics from the gnomAD v2.1 database\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e: pLI (probability of being loss-of-function intolerant), LOEUF (loss-of-function observed/expected upper bound fraction), MOEUF (missense observed/expected upper bound fraction) and missense Z-score. LOEUF and MOEUF metrics indicate a gene\u0026rsquo;s tolerance to loss-of-function and missense variants, respectively, helping prioritize genes under selective constraint for clinical relevance. LOEUF and MOEUF values are highlighted based on constraint levels: dark red for the top 10% most constrained genes (LOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.26, MOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.58), red for the top 20% (LOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.41, MOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.70), orange for the top 25% (LOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.48, MOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.73), and yellow for the top 30% (LOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.55, MOEUF\u0026thinsp;\u0026le;\u0026thinsp;0.77). Values above these thresholds remain black, as defined in reference \u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. 
gnomAD v4 constraint metrics can also be displayed by clicking on the constraint values; the same metrics are shown, and discrepancies between the two versions are flagged with an exclamation mark.\u003c/p\u003e\u003cp\u003e\u003cb\u003ePubMed\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePubMed is a free online database providing access to a vast repository of biomedical research articles maintained by the National Center for Biotechnology Information and represents an \u0026ldquo;up-to-date\u0026rdquo; knowledge source for gene-phenotype associations.\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e PubMatcher reports the number of publications retrieved by a query, the title of the first publication in the list, and a link to the query on PubMed and the related research articles. The PubMed search combines a gene name with a phenotype. When several phenotypes are supplied, the gene-phenotype pairs are combined into a single query per gene. An example query is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, which includes five genes and two phenotypes. The PubMed query for each gene follows this pattern: (GENE \u003cem\u003eAND\u003c/em\u003e PHENOTYPE_1) \u003cem\u003eOR\u003c/em\u003e (GENE \u003cem\u003eAND\u003c/em\u003e PHENOTYPE_2). Hovering over the publication title displays the titles of other matching publications.\u003c/p\u003e\u003cp\u003e\u003cb\u003eUniProt\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThe UniProt database\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e provides information about protein functions, which is potentially relevant for genetic interpretation. 
PubMatcher requests the protein description and biological features keywords from UniProt using API access.\u003c/p\u003e\u003cp\u003e\u003cb\u003eInternational Mouse Phenotyping Consortium\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThe IMPC database\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e provides information about the consequences of gene knockouts in mice, which could suggest a gene's involvement in human diseases. Different phenotypes are listed as presented on IMPC and specific symptoms can be displayed by mouseover.\u003c/p\u003e\u003cp\u003e\u003cb\u003eClinvar Lookup\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePubMatcher integrates data from ClinVar, a public database maintained by the National Center for Biotechnology Information that provides clinically relevant interpretations of genetic variants, including their pathogenicity, molecular consequences, and supporting evidence. For pathogenic and likely pathogenic small nucleotide variants, PubMatcher displays both the number of loss-of-function (LOF) variants\u0026mdash;including frameshift, nonsense, and canonical splice site alterations\u0026mdash;and the number of missense variants. Additionally, VUS are also reported to ensure no potentially relevant findings are overlooked.\u003c/p\u003e\u003cp\u003e\u003cb\u003eGene Curation Coalition, PanelApp \u0026amp; OMIM\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePubMatcher integrates data from GenCC\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e, PanelApp\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, and OMIM\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e to provide comprehensive information on gene-disease associations, ensuring rapid and accurate curation of clinically relevant genes. 
GenCC aggregates gene-disease validity information from multiple expert-curated sources, facilitating the identification of genes with well-established evidence for their role in human diseases. PubMatcher displays the gene status from GenCC. The number of genes listed in both PanelApp UK and PanelApp Australia is reported in the PubMatcher output because of the significance of these resources in fast gene-disease curation. Links are provided for quick access to the relevant entries on the PanelApp websites. OMIM (Online Mendelian Inheritance in Man) is a comprehensive, authoritative resource that catalogs human genes and genetic phenotypes, including their relationships to disease. PubMatcher integrates data from OMIM to indicate whether a gene is associated with a known morbid condition or phenotype.\u003c/p\u003e\u003cp\u003e\u003cb\u003eRelevance of PubMatcher in Human Whole Genome Sequencing analysis\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe evaluated the relevance of the PubMatcher tool in WGS analyses of patients with rare diseases performed at the Auragen laboratory in Lyon, France. This laboratory is part of the French 2025 genomic project, which aims to expand genomic access in human healthcare \u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. First, across 20 trio-based WGS analyses, we assessed the proportion of variants retained by an example set of common WGS filters (detailed in Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e) that were not located in OMIM morbid genes. 
Then, we present examples of variants revealed by PubMatcher in genes that proved potentially relevant for medical use after analyzing 100 WGS cases.\u003c/p\u003e\u003cp\u003eWhole genome sequencing was performed following the recommendations of \u0026ldquo;France M\u0026eacute;decine G\u0026eacute;nomique 2025\u0026rdquo; Plan. Genomic DNA extracted from whole blood was sequenced according to standard procedures for a Polymerase Chain Reaction-Free genome on a NovaSeq6000 instrument (Illumina, San Diego, California, USA). Sequencing data were aligned to the GRCh38p13 full assembly using bwa 0.7+. Variants were called by several algorithms including GATK4+, Bcftools1.10+, Manta1.6+, CNVnator0.4+, and annotated using the variant effect predictor. Detected variants were prioritized using in-house procedures. Further details are available on request on \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.auragen.fr\u003c/span\u003e\u003cspan address=\"http://www.auragen.fr\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cb\u003eVariants in Non-OMIM Genes Found by Common WGS Filters\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePubMatcher is meant to quickly identify gene and phenotype associations using the most up-to-date sources. Although the OMIM database is regularly updated, the most recent phenotype-to-gene associations may be missing, potentially leading to the exclusion of relevant variants. Therefore, we evaluated the proportion of non-OMIM morbid genes in 20 WGS trios of a patient and their unaffected parents, using a classic filtering strategy (see Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e for filters\u0026rsquo; details).\u003c/p\u003e\u003cp\u003eAfter applying these filters, the remaining variant counts ranged from 80 to 150 per sample, with a median of 95. 
Among these, the median proportions of variants mapped to OMIM morbid genes, OMIM non-morbid genes, and non-OMIM genes were 30%, 52%, and 18%, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). These results confirm a high representation of non-morbid or non-OMIM genes (70%) post-filtering, underscoring the utility of PubMatcher for efficiently screening them.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003eMis- or non-annotated genes with relevant variants in 100 WGS analyses\u003c/b\u003e\u003c/p\u003e\u003cp\u003eWe present examples of variants found in genes either not annotated in OMIM for the disease under investigation or with a non-syndromic form not specified in OMIM (Table\u0026nbsp;1). These relevant variants were identified in 15 out of 100 whole-genome sequences analyzed at the French laboratory Auragen (Lyon, France). The genomes included in this study were selected solely based on their availability as trios and were analyzed in chronological order, starting with the oldest requests, without any other selection criteria. Some of these patients had never undergone prior genetic testing, while others had previously undergone panel or exome sequencing, which was insufficient for establishing a definitive diagnosis. Phenotypes included genodermatosis, chronic nephropathy, intellectual disability, or red blood cell diseases.\u003c/p\u003e\u003cp\u003e\u003cb\u003eIntegrating PubMatcher in Genomic Variant Analysis Workflows\u003c/b\u003e\u003c/p\u003e\u003cp\u003ePubMatcher is a tool that can be integrated early in the general workflow of genomic single nucleotide variant analysis. 
We propose a flowchart for variant interpretation in a large-scale genomic approach (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eStarting with a conventional filtering strategy (as described in Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), a rapid diagnosis can be made if a causative variant is identified, for example, a previously described ClinVar pathogenic variant that matches the patient's clinical presentation. If such a variant is not found, a more thorough variant analysis is required to explore and report relevant genetic variants.\u003c/p\u003e\u003cp\u003eThe tool can be used for gene screening across all identified variants, allowing for a quick exploration of the most recent scientific knowledge (via PubMed and PanelApp queries), gene constraint metrics, protein functions (UniProt), and the consequences of mouse knockout models (IMPC). The mode of inheritance based on the family pedigree is also crucial. A recent publication from Chong et al. \u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e compiled five key criteria for retaining genes of interest, nearly all of which are integrated into the proposed flowchart that includes PubMatcher, except for gnomAD variant co-occurrence.\u003c/p\u003e\u003cp\u003eAfter analyzing the data in the context of the patients\u0026rsquo; phenotypes, some variants may be retained within genes of interest. If the evidence level is sufficient, the variant can be classified and reported (with additional exploration needed if it is a variant of unknown significance). Conversely, if the evidence level is low, a more research-focused approach, such as submitting the case to Matchmaker Exchange\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e (GeneMatcher, etc.) 
or conducting further fundamental post-genomic investigations, may be suggested.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe believe PubMatcher is a significant advancement in clinical genomic research, addressing the need for more efficient interpretation of genomic data. By rapidly identifying relevant gene-phenotype associations\u0026mdash;especially in lesser-known genes\u0026mdash;PubMatcher increases both the speed and accuracy of genomic analyses. This approach also helps ensure that rare yet important variants are not missed, which is critical for their inclusion in broader research studies; given their rarity, these cases can provide invaluable insights into disease mechanisms and phenotypic diversity. While existing tools already offer some bibliographic functionalities, PubMatcher specifically addresses the need for batch analysis. It is designed to complement widely used tools like VarSome or MobiDetails\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e, offering a more streamlined approach for rapid gene-phenotype association.\u003c/p\u003e\u003cp\u003eAn important consideration is the inclusion of animal models, such as the mouse model, which provides invaluable insights into gene function and disease relevance due to its genetic similarity to humans. However, mouse models present limitations, including differences in gene expression and phenotypic responses. Expanding to other model organisms, such as zebrafish, could diversify the functional insights available to PubMatcher users, particularly for genes where murine models have limited general data or translational relevance.\u003c/p\u003e\u003cp\u003eThe effectiveness of PubMatcher heavily depends on the quality and completeness of its external data sources. 
Attempts to incorporate alternative sources, such as Google Scholar, resulted in an overwhelming volume of unspecific and irrelevant data, highlighting PubMed as the most reliable and curated source for retrieving relevant literature. Advances in AI-driven text-mining tools, such as PubTator \u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e, offer promising avenues for improving data retrieval by extracting gene-disease relationships from biomedical literature. These tools could significantly enhance the exhaustivity of PubMatcher\u0026rsquo;s results by identifying additional relevant publications that might otherwise remain undetected. However, current rate limitations (3 requests per second) within the PubTator API preclude its integration into PubMatcher at this stage.\u003c/p\u003e\u003cp\u003ePubMatcher has demonstrated effectiveness in identifying clinically relevant genes, thereby fulfilling its primary objective. Notably, several geneticists outside the development team have already integrated PubMatcher into their variant interpretation workflows, underscoring its reliability and practical utility and adaptability in clinical genomics. Further exploration of PubMatcher\u0026rsquo;s applications in clinical settings could be beneficial. Another important consideration is the accessibility of the tool. While the current interface is user-friendly\u0026mdash;particularly in terms of input formatting, result clarity, and advanced features upon login (such as input history)\u0026mdash;further simplifying the user experience and providing enhanced guidance and support would make the tool even more accessible to a wider audience.\u003c/p\u003e\u003cp\u003eIntegrating artificial intelligence or machine learning could also boost PubMatcher\u0026rsquo;s capabilities by adding features like gene scoring to rank the matches by their relevancy to the phenotype. 
Ongoing updates, as well as feedback from the user community, will be crucial for the tool\u0026rsquo;s continued development and for expanding its utility in the field of genomic research.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003ePubMatcher provides an effective solution for streamlining genomic data interpretation by automating bibliographic research and integrating it seamlessly into genomic interpretation workflows. This approach significantly enhances efficiency, particularly in identifying lesser-known yet clinically relevant gene-phenotype associations. As PubMatcher continues to evolve, improvements in data integration, interface design, and user-driven enhancements will further solidify its role as a valuable tool for both clinical diagnostics and genomic research.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe human whole genome sequencing data used in this study were obtained from the French national genomic medicine initiative, Plan France M\u0026eacute;decine G\u0026eacute;nomique 2025 (PFMG2025). These sequencing data were generated and analyzed at the AURAGEN genomic sequencing center.\u003c/p\u003e\n\u003cp\u003eDue to ethical and privacy restrictions, the raw sequencing data are not publicly available. As described in Abadie et al. 
(2025)\u003csup\u003e14\u003c/sup\u003e, information on requesting access to the molecular datasets can be found on the website: https://pfmg2025.fr/\u003c/p\u003e\n\u003cp\u003eAdditional data are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe source code for PubMatcher is freely available under the Massachusetts Institute of Technology (MIT) License.\u003c/p\u003e\n\u003cp\u003eProject name: PubMatcher\u003c/p\u003e\n\u003cp\u003eProject home page: https://github.com/victormar1/PubMatcher\u003c/p\u003e\n\u003cp\u003eOperating system(s): Platform independent\u003c/p\u003e\n\u003cp\u003eProgramming language: JavaScript (Node.js \u0026amp; Vue.js)\u003c/p\u003e\n\u003cp\u003eOther requirements: None\u003c/p\u003e\n\u003cp\u003eLicense: Massachusetts Institute of Technology License\u003c/p\u003e\n\u003cp\u003eAny restrictions to use by non-academics: No specific restrictions, open-source license\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was made possible through access to the data generated by the 2025 French Genomic Medicine Initiative.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eV.M. conceptualized the project, developed the software, performed data analyses, and wrote the main manuscript text and figures. H.L. and V.D. contributed to the development of the software used in the work. J.T. provided data access and scientific guidance, and has substantively revised the manuscript. D.B. assisted with software development, provided scientific guidance, and has substantively revised the manuscript. A.-F.R. contributed scientific expertise and has substantively revised the manuscript. E.L. 
helped with conceptualization, promoted our work, provided scientific feedback, and has substantively revised the manuscript. P.P. contributed to study design, offered scientific input, and has substantively revised the manuscript. L.L. co-conceived the project, supervised the research, and co-wrote the manuscript.\u003c/p\u003e\n\u003cp\u003eAll authors reviewed and approved the final version of the manuscript and agree to be accountable for the work.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo financial assistance was received in support of the study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study involved genomic analyses conducted as part of routine clinical care for patients with rare diseases in France. As such, a clinical trial registration was not required, since all data reported were obtained during standard diagnostic procedures. In accordance with the French Bioethics Law (Law No. 2004-800, dated August 6, 2004), all patients provided written informed consent for diagnostic procedures and were specifically informed that any remaining biological material could be used for research purposes. The retrospective use of these data was approved by the Bordeaux University Hospital under registration number \u003cstrong\u003eCHUBX2025RE0134\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo competing interests\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eMcKusick VA. Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. Johns Hopkins University Press. 1998;12.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShakir A, Ripperger M, Jiang Z, Wierenga KJ. Inferred inheritance of MorbidMap genes without OMIM clinical synopsis. Genet Med. 
2018;20:470\u0026ndash;3.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHolowaychuk TJ. tj/ejs [Internet]. 2024 [cited 22 Apr 2024]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/tj/ejs\u003c/span\u003e\u003cspan address=\"https://github.com/tj/ejs\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003enodejs/node [Internet]. Node.js; 2024 [cited 22 Apr 2024]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/nodejs/node\u003c/span\u003e\u003cspan address=\"https://github.com/nodejs/node\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKarczewski KJ, Francioli LC, Tiao G, Cummings BB, Alf\u0026ouml;ldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434\u0026ndash;43.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThe UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research. 2023;51:D523\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGroza T, Gomez FL, Mashhadi HH, Mu\u0026ntilde;oz-Fuentes V, Gunes O, Wilson R, et al. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Research. 2023;51:D1038\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625:92\u0026ndash;100.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChong JX, Berger SI, Baxter S, Smith E, Xiao C, Calame DG, et al. 
Considerations for reporting variants in novel candidate genes identified during clinical genomic testing. Genetics in Medicine. 2024;26:101199.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022;50:D20\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThe Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources - PubMed [Internet]. [cited 9 Jan 2025]. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/35507016/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/35507016/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMartin AR, Williams E, Foulger RE, Leigh S, Daugherty LC, Niblock O, et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat Genet. 2019;51:1560\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSanlaville D, Vidaud M, Thauvin-Robinet C, Nowak F, Lethimonnier F. [French Genomic Medicine Plan 2025 (PFMG2025): France enters the era of genomic medicine]. Rev Prat. 2021;71:1061\u0026ndash;4.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAbadie C, Abderrahmane A, Abdous O, Abel C, Ackermann O, Acquaviva C, et al. PFMG2025\u0026ndash;integrating genomic medicine into the national healthcare system in France. The Lancet Regional Health \u0026ndash; Europe [Internet]. 2025 [cited 13 Mar 2025];50. 
Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.thelancet.com/journals/lanepe/article/PIIS2666-7762(\u003c/span\u003e\u003cspan address=\"https://www.thelancet.com/journals/lanepe/article/PIIS2666-7762(\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e24)00352-1/fulltext\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHassan A, Morice-Picard F, Marin V, Lasseaux Robine E, Lebreton L, Davaze-Schneider J. Hypohidrotic ectodermal dysplasia in a family: expanding spectrum of LEF1-related disorders. Clinical and Experimental Dermatology. 2024;llae293.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDufour W, Alawbathani S, Jourdain AS, Asif M, Baujat G, Becker C, et al. Monoallelic and biallelic variants in LEF1 are associated with a new syndrome combining ectodermal dysplasia and limb malformations caused by altered WNT signaling. Genet Med. 2022;24:1708\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eL\u0026eacute;vy J, Capri Y, Rachid M, Dupont C, Vermeesch JR, Devriendt K, et al. LEF1 haploinsufficiency causes ectodermal dysplasia. Clinical Genetics. 2020;97:595\u0026ndash;600.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Franco E, Wakeling MN, Frew RD, Russ-Silsby J, Peters C, Marks SD, et al. A biallelic loss-of-function PDIA6 variant in a second patient with polycystic kidney disease, infancy-onset diabetes, and microcephaly. Clin Genet. 2022;102:457\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAl-Fadhli FM, Afqi M, Sairafi MH, Almuntashri M, Alharby E, Alharbi G, et al. Biallelic loss of function variant in the unfolded protein response gene PDIA6 is associated with asphyxiating thoracic dystrophy and neonatal-onset diabetes. Clin Genet. 
2021;99:694\u0026ndash;703.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eM\u0026uuml;nch J, Engesser M, Sch\u0026ouml;nauer R, Hamm JA, Hartig C, Hantmann E, et al. Biallelic pathogenic variants in roundabout guidance receptor 1 associate with syndromic congenital anomalies of the kidney and urinary tract. Kidney Int. 2022;101:1039\u0026ndash;53.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChristians A, Kesdiren E, Hennies I, Hofmann A, Trowe MO, Brand F, et al. Heterozygous variants in the DVL2 interaction region of DACT1 cause CAKUT and features of Townes-Brocks syndrome 2. Hum Genet. 2023;142:73\u0026ndash;88.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYan H, Shi Z, Wu Y, Xiao J, Gu Q, Yang Y, et al. Targeted next generation sequencing in 112 Chinese patients with intellectual disability/developmental delay: novel mutations and candidate gene. BMC Med Genet. 2019;20:80.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHa T, Morgan A, Bartos MN, Beatty K, Cogn\u0026eacute; B, Braun D, et al. De novo variants predicting haploinsufficiency for DIP2C are associated with expressive speech delay. Am J Med Genet A. 2024;194:e63559.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSmits DJ, Schot R, Popescu CA, Dias KR, Ades L, Briere LC, et al. De novo MCM6 variants in neurodevelopmental disorders: a recognizable phenotype related to zinc binding residues. Hum Genet. 2023;142:949\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAzad P, Caldwell AB, Ramachandran S, Spann NJ, Akbari A, Villafuerte FC, et al. ARID1B, a molecular suppressor of erythropoiesis, is essential for the prevention of Monge\u0026rsquo;s disease. Exp Mol Med. 2022;54:777\u0026ndash;87.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShen Y, Bassett MA, Gurumurthy A, Nar R, Knudson IJ, Guy CR, et al. 
Identification of a Novel Enhancer/Chromatin Opening Element Associated with High-Level γ-Globin Gene Expression. Mol Cell Biol. 2018;38:e00197-18.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWerren EA, Peirent ER, Jantti H, Guxholli A, Srivastava KR, Orenstein N, et al. Biallelic variants in CSMD1 are implicated in a neurodevelopmental disorder with intellectual disability and variable cortical malformations. Cell Death Dis. 2024;15:379.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBoonsawat P, Asadollahi R, Niedrist D, Steindl K, Begemann A, Joset P, et al. Deleterious \u003cem\u003eZNRF3\u003c/em\u003e germline variants cause neurodevelopmental disorders with mirror brain phenotypes via domain-specific effects on Wnt/β-catenin signaling. The American Journal of Human Genetics. 2024;111:1994\u0026ndash;2011.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGordon PM, Efthymiou S, Salpietro V, Fielding T, Borgione E, Scuderi C, et al. Human patient SFPQ homozygous mutation is found deleterious for brain and motor development in a zebrafish model [Internet]. bioRxiv; 2020 [cited 3 Nov 2024]. p. 2020.03.18.993634. Available from: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.biorxiv.org/content/\u003c/span\u003e\u003cspan address=\"https://www.biorxiv.org/content/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2020.03.18.993634v1\u003c/span\u003e\u003cspan address=\"10.1101/2020.03.18.993634v1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAbramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 
2024;630:493\u0026ndash;500.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchmid CM, Gregor A, Costain G, Morel CF, Massingham L, Schwab J, et al. LHX2 haploinsufficiency causes a variable neurodevelopmental disorder. Genet Med. 2023;25:100839.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePhilippakis AA, Azzariti DR, Beltran S, Brookes AJ, Brownstein CA, Brudno M, et al. The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery. Human Mutation. 2015;36:915\u0026ndash;21.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBaux D, Van Goethem C, Ardouin O, Guignard T, Bergougnoux A, Koenig M, et al. MobiDetails: online DNA variants interpretation. Eur J Hum Genet. 2021;29:356\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWei CH, Allot A, Lai PT, Leaman R, Tian S, Luo L, et al. PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Research. 2024;52:W540\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Table 1","content":"\u003cp\u003eTable 1 is available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"european-journal-of-human-genetics","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ejhg","sideBox":"Learn more about [European Journal of Human Genetics](http://www.nature.com/ejhg/)","snPcode":"41431","submissionUrl":"https://mts-ejhg.nature.com/cgi-bin/main.plex","title":"European Journal of Human Genetics","twitterHandle":"@ejhg_journal","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7224867/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7224867/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIn the era of rapidly accumulating genomic data, largely driven by the broad use of whole-genome sequencing (WGS) in clinical settings, interpreting lesser-known genes with varied phenotypes remains challenging. PubMatcher is a new tool that automates bibliographic research for multiple genes at once and grants quick and easy access to relevant gene information. It helps users efficiently identify potential genotype-phenotype associations using PubMed complemented by additional data. By significantly reducing analysis time, PubMatcher streamlines the interpretation of novel or under-documented genes. 
Available to non-commercial users for free, PubMatcher is a user-friendly and efficient solution for researchers, clinical scientists and pathologists working with pangenomics analyses.\u003c/p\u003e","manuscriptTitle":"PubMatcher: a web app to streamline genomic data interpretation with automated bibliographic research","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-07-29 05:54:57","doi":"10.21203/rs.3.rs-7224867/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"revise","date":"2025-11-07T10:47:10+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"This content is not available.","date":"2025-10-30T11:43:05+00:00","index":2,"fulltext":"This content is not available."},{"type":"editorInvitedReview","content":"This content is not available.","date":"2025-10-30T10:43:39+00:00","index":3,"fulltext":"This content is not available."},{"type":"reviewerAgreed","content":"This content is not available.","date":"2025-10-15T06:26:35+00:00","index":3,"fulltext":"This content is not available."},{"type":"reviewerAgreed","content":"This content is not available.","date":"2025-10-08T09:53:05+00:00","index":2,"fulltext":"This content is not available."},{"type":"editorInvitedReview","content":"This content is not available.","date":"2025-09-09T00:44:03+00:00","index":1,"fulltext":"This content is not available."},{"type":"reviewerAgreed","content":"This content is not available.","date":"2025-08-19T13:46:16+00:00","index":1,"fulltext":"This content is not available."},{"type":"reviewersInvited","content":"","date":"2025-08-19T13:20:16+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-07-28T11:58:23+00:00","index":"","fulltext":""},{"type":"submitted","content":"European Journal of Human 
Genetics","date":"2025-07-27T07:53:53+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-07-27T07:53:53+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"european-journal-of-human-genetics","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ejhg","sideBox":"Learn more about [European Journal of Human Genetics](http://www.nature.com/ejhg/)","snPcode":"41431","submissionUrl":"https://mts-ejhg.nature.com/cgi-bin/main.plex","title":"European Journal of Human Genetics","twitterHandle":"@ejhg_journal","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"000f6187-12cc-4b08-96a0-c67621ea287a","owner":[],"postedDate":"July 29th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":52229883,"name":"Biological sciences/Genetics/Medical genetics"},{"id":52229884,"name":"Biological sciences/Genetics/Clinical genetics"}],"tags":[],"updatedAt":"2026-03-08T07:05:30+00:00","versionOfRecord":{"articleIdentity":"rs-7224867","link":"https://doi.org/10.1038/s41431-026-02068-z","journal":{"identity":"european-journal-of-human-genetics","isVorOnly":false,"title":"European Journal of Human Genetics"},"publishedOn":"2026-03-07 05:00:00","publishedOnDateReadable":"March 7th, 2026"},"versionCreatedAt":"2025-07-29 05:54:57","video":"","vorDoi":"10.1038/s41431-026-02068-z","vorDoiUrl":"https://doi.org/10.1038/s41431-026-02068-z","workflowStages":[]},"version":"v1","identity":"rs-7224867","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7224867","identity":"rs-7224867","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}