Advancing Chlamydia trachomatis genomic surveillance and research with a novel core-genome MLST (cgMLST) approach
Research Square, Research Article

Zohra Lodhia*, Verónica Mixão*, Joana Isidro, Rita Ferreira, Dora Cordeiro, Cristina Correia, Inês João, João Paulo Gomes, Maria José Borrego, Vítor Borges

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7743240/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted, Version 1

Abstract
Chlamydia trachomatis is the most common sexually transmitted bacterial infection, with an estimated 129 million new cases annually. Its classification has traditionally relied on ompA-genotyping, but whole-genome sequencing (WGS) offers transformative resolution for studying evolution, transmission dynamics and epidemiological patterns. Yet WGS-based surveillance of C. trachomatis remains very limited by technical challenges and by the lack of standardized typing frameworks.
Core-genome multilocus sequence typing (cgMLST) is a scalable and portable approach widely applied to bacterial pathogens, but it remains little explored for C. trachomatis. In this context, we compiled and curated the largest C. trachomatis genome dataset to date (1230 samples from 26 countries), including publicly available and newly generated assemblies, to develop a novel cgMLST schema optimized for standardized local deployment. Supported by existing (e.g., ReporTree) and newly developed bioinformatic resources, the extensive cgMLST analyses performed in this study enabled an in-depth and unprecedented exploration of the global phylogenomic diversity and recombination-driven evolution of C. trachomatis. The novel cgMLST schema (n = 846 loci) robustly recapitulated the four major evolutionary lineages of C. trachomatis and showed high congruence with core-SNP approaches, while resolving intra-lineage genogroup diversity at high resolution and detecting recombination mosaicisms. It also efficiently captured the clonal expansion of epidemiologically relevant strains, including the epidemic lymphogranuloma venereum (LGV) “L2b” strain and the emergent L4 strains, further supporting its robustness for contemporary transmission and outbreak monitoring. By rapidly linking loci/alleles to specific phylogenomic/phenotypic traits, the novel cgMLST approach not only elucidated the genome-wide recombination landscape of C. trachomatis (e.g., through straightforward detection of major genotype-lineage incongruences), but also identified lineage-specific alleles (and disrupted loci) with potential diagnostic and/or functional relevance. Finally, to further advance C. trachomatis genomic surveillance and research, the novel schema is released (https://doi.org/10.5281/zenodo.17177579) together with a hierarchical cgMLST-based nomenclature that supports harmonized genogroup tracking across laboratories and countries.
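As a rough illustration of how cgMLST allele profiles become threshold-based clusters, the sketch below counts allelic differences between profiles and groups samples by single-linkage clustering at a chosen distance threshold, the clustering the supplementary benchmarking (Supplementary figure S2) attributes to ReporTree. It is a toy written for this explanation, not ReporTree's code or the released schema: the profile layout (one integer allele ID per locus, None for an uncalled locus), the rule of skipping loci missing in either profile, and the sample names and data are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Optional

# Hypothetical profile layout: one allele ID per locus, None where the locus was not called.
Profile = List[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of loci with different allele IDs, skipping loci missing in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Single-linkage clusters: two samples are linked if their distance is <= threshold,
    and clusters are the connected components of those links (found with union-find)."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Deterministic output: members sorted, clusters ordered by their first member.
    return sorted(sorted(members) for members in clusters.values())


def partitions_per_threshold(profiles: Dict[str, Profile], thresholds: List[int]) -> Dict[int, int]:
    """Number of partitions (clusters plus singletons) at each distance threshold."""
    return {t: len(single_linkage(profiles, t)) for t in thresholds}


# Toy five-locus profiles (hypothetical data, not from the study).
profiles = {
    "s1": [1, 1, 1, 1, 1],
    "s2": [1, 1, 1, 1, 2],
    "s3": [1, 2, 2, 1, 2],
    "s4": [3, 3, 3, 3, None],
}

print(single_linkage(profiles, 2))                          # [['s1', 's2', 's3'], ['s4']]
print(partitions_per_threshold(profiles, [0, 1, 2, 3, 4]))  # {0: 4, 1: 3, 2: 2, 3: 2, 4: 1}
```

Single linkage lets clusters chain: at threshold 2, s1 and s3 (three differences apart) share a cluster because each is within two differences of s2. Sweeping the threshold and counting partitions, as in Supplementary figure S2C, is then a simple loop.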
In summary, this work delivers both an expanded global C. trachomatis genomic resource and a robust cgMLST framework, with immediate utility for research and for standardized, high-resolution, genome-scale routine surveillance.

*Zohra Lodhia and Verónica Mixão contributed equally to this work.

Keywords: Infectious Diseases; Molecular Epidemiology; Molecular Genetics; Chlamydia trachomatis; Whole-Genome Sequencing; cgMLST; genomic surveillance

Supplementary Files

Supplementaryfigureslegends.docx

SupplementaryfigureS1.tiff
Supplementary figure S1. Distribution of input parameters and sequencing outcomes in the targeted enrichment WGS of C. trachomatis. A) Distribution of C. trachomatis genome copies (left boxplot, n=835) and dsDNA amount (right boxplot, n=246) in the targeted enrichment WGS input volume (7 µL), with dots colored according to WGS selection status. WGS-selected samples (n=148) are shown in green (genome copies) and orange (DNA amount), and non-selected samples in grey. The red dashed lines mark the thresholds of 10^4 genome copies and 10 ng (both in 7 µL) used to prioritize samples for sequencing. B) Number of C. trachomatis genome copies in the WGS input volume and the corresponding percentage of “on-target” reads, in log10 scale, for the 148 samples selected for WGS. Each point represents one sample submitted for targeted enrichment WGS, colored dark blue when it passed the inclusion criteria for the final curated genome dataset and light blue otherwise. “On-target” reads were defined as QC-passed reads classified as Chlamydiales, i.e., the reads used for C. trachomatis genome assembly (see Materials and Methods).

SupplementaryfigureS2.tiff
Supplementary figure S2. Evaluation and benchmarking of the novel C. trachomatis cgMLST schema. A) Distribution of the number of unique alleles per locus of the cgMLST schema.
B) Comparison of the distribution of the percentage of loci called per sample by the novel cgMLST schema (blue) and by the local deployment of the PubMLST schema with the chewBBACA allele caller (orange), using the global C. trachomatis dataset (n=1230). C) Number of partitions (clusters and singletons) obtained at each possible distance threshold with the novel cgMLST schema and with the local deployment of the PubMLST schema with chewBBACA [56], both using ReporTree single-linkage hierarchical clustering [82]. D) Heatmap of the congruence score (0, no congruence; 3, absolute congruence) for pairwise comparisons between all possible threshold combinations of the two approaches analyzed in C. E) Number of partitions at each possible distance threshold with the novel cgMLST schema, using ReporTree single-linkage hierarchical clustering and the GrapeTree MSTreeV2 algorithm [82,83]. F) Heatmap of the congruence score for pairwise comparisons between all possible threshold combinations of the two approaches analyzed in E. G) Number of partitions at each possible distance threshold with the novel cgMLST schema and with the curated core-genome alignment of the 53 complete assemblies, both using ReporTree single-linkage hierarchical clustering [82]. H) Heatmap of the congruence score (0, no congruence; 3, absolute congruence) for pairwise comparisons between all possible threshold combinations of the two approaches analyzed in G. I) Number of partitions at each possible distance threshold with the novel cgMLST schema and with the Parsnp [89] alignment of the global dataset, both using ReporTree single-linkage hierarchical clustering [82]. J) Heatmap of the congruence score for pairwise comparisons between all possible threshold combinations of the two approaches analyzed in I.

SupplementaryfigureS3.tiff
Supplementary figure S3. Neighbor-joining phylogenetic tree of C.
trachomatis ompA alleles identified by cgMLST in the studied genome dataset. Nodes are colored by ompA genotype. Supplementary file S4 summarizes the distribution of ompA alleles across lineages, and Supplementary file S2 details the ompA alleles identified in individual genomes. Allele 112, identified in a single genome, was excluded because cgMLST inferred a shorter CDS for it. The scale bar represents the number of substitutions per site.

SupplementaryFigureS4.tiff
Supplementary figure S4. Core-SNP maximum-likelihood phylogenetic tree of the epidemic L2b C. trachomatis strain. The tree was reconstructed from a core alignment comprising 156 SNVs using IQ-TREE v2.1.4 [68] and rooted on the earliest available L2b sequences, from the 1980s (ERR348840 and ERR348841). *All recombination-driven SNVs within the previously identified ompA-containing recombinant region of the L2b/D-Da strain (NC_010280; ~positions 55,221–59,461) [25,27] were excluded prior to phylogenetic analysis. Edges are colored by ompA-genotype and nodes by LGV ompA subvariant (‘new’ indicates additional subvariants not included in a recent compilation [48]). The scale bar represents the number of SNPs.

SupplementaryfigureS5.tiff
Supplementary figure S5. Tanglegram comparing cgMLST-based single-linkage hierarchical clustering trees built from the whole cgMLST schema (left) and from plasmid-only loci (right). The connecting lines linking the same isolate in both trees are colored by lineage. Distances are based on allelic differences.

SupplementaryfigureS6.tiff
Supplementary figure S6. Sankey plot illustrating the relationships between Chlamydia trachomatis classifications at four levels: ompA class (historical “disease groups” inferred from the traditional ompA classification*), ompA-genotype, cgMLST lineage, and MLST-derived sequence type (ST).
Each node represents a category at a given level, and the width of the connecting flows is proportional to the number of isolates sharing the corresponding combination of categories. *Grouping into “ompA_class” was based strictly on the ompA sequence, so the recombinant L2/L2b–D/Da strain [43,45] was classified as “genital”, because most of its ompA sequence (~75%) derives from a genital genotype (D/Da). Supplementary files S2 and S4 provide the source data.

SupplementaryfileS1.xlsx
SupplementaryfileS2.xlsx
SupplementaryfileS3.xlsx
SupplementaryfileS4.xlsx
SupplementaryfileS5.xlsx
SupplementaryfileS6.xlsx
SupplementaryfileS7.zip
SupplementaryfileS8.xlsx
SupplementaryfileS9.xlsx
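The Supplementary figure S2 legend scores partition congruence from 0 (no congruence) to 3 (absolute congruence) without defining the score. Assuming it follows the definition used in earlier ReporTree-based pipeline-congruence studies, namely the sum of the adjusted Rand index and the two directional adjusted Wallace coefficients (each at most 1), it could be computed as in the minimal sketch below. The partition format (sample name mapped to cluster label) and the example partitions are hypothetical, and degenerate cases return NaN here rather than following any particular tool's convention; the study's Methods remain the authority on the exact score.

```python
from collections import Counter
from math import comb, nan
from typing import Dict, Hashable, Tuple

Partition = Dict[str, Hashable]  # hypothetical format: sample name -> cluster label


def _pairs_within(sizes: Counter) -> int:
    """Number of unordered pairs that fall inside the same group."""
    return sum(comb(n, 2) for n in sizes.values())


def _pair_counts(part_a: Partition, part_b: Partition) -> Tuple[int, int, int, int]:
    """Pairs co-clustered in A, in B, in both, and the total number of pairs."""
    if set(part_a) != set(part_b) or len(part_a) < 2:
        raise ValueError("partitions must cover the same samples (at least two)")
    samples = sorted(part_a)
    pairs_a = _pairs_within(Counter(part_a[s] for s in samples))
    pairs_b = _pairs_within(Counter(part_b[s] for s in samples))
    pairs_ab = _pairs_within(Counter((part_a[s], part_b[s]) for s in samples))
    return pairs_a, pairs_b, pairs_ab, comb(len(samples), 2)


def adjusted_rand(part_a: Partition, part_b: Partition) -> float:
    """Adjusted Rand index computed from pair counts (symmetric in A and B)."""
    pairs_a, pairs_b, pairs_ab, total = _pair_counts(part_a, part_b)
    expected = pairs_a * pairs_b / total
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return nan  # degenerate, e.g. both partitions are all singletons
    return (pairs_ab - expected) / (maximum - expected)


def adjusted_wallace(part_a: Partition, part_b: Partition) -> float:
    """AW(A->B): share of pairs grouped in A that are also grouped in B, chance-corrected."""
    pairs_a, pairs_b, pairs_ab, total = _pair_counts(part_a, part_b)
    independent = pairs_b / total  # chance that a random pair is grouped together in B
    if pairs_a == 0 or independent == 1:
        return nan  # undefined: A groups no pairs, or B is a single cluster
    return (pairs_ab / pairs_a - independent) / (1 - independent)


def congruence_score(part_a: Partition, part_b: Partition) -> float:
    """Assumed definition: ARI + AW(A->B) + AW(B->A); 3 means identical partitions."""
    return (adjusted_rand(part_a, part_b)
            + adjusted_wallace(part_a, part_b)
            + adjusted_wallace(part_b, part_a))


finer = {"s1": 1, "s2": 1, "s3": 2, "s4": 3}    # hypothetical partition at a low threshold
coarser = {"s1": 1, "s2": 1, "s3": 1, "s4": 2}  # hypothetical partition at a higher threshold

print(congruence_score(finer, finer))    # 3.0
print(adjusted_wallace(finer, coarser))  # 1.0
print(adjusted_wallace(coarser, finer))  # ~0.2
print(congruence_score(finer, coarser))  # ~1.533
```

In this toy example AW(finer to coarser) is 1 because every pair grouped at the finer threshold stays grouped at the coarser one, while the reverse direction is low because the coarser partition merges clusters the finer one keeps apart. Computing the score for every threshold pair of two methods yields a heatmap like those in panels D, F, H and J.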
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7743240","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":522340589,"identity":"e5752a10-6571-4819-a0bf-7e3f534c7ecd","order_by":0,"name":"Zohra Lohdia*","email":"","orcid":"https://orcid.org/0000-0002-9477-9646","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Zohra","middleName":"","lastName":"Lohdia*","suffix":""},{"id":522348415,"identity":"02af9711-faad-47d5-ba02-bf183b42ba45","order_by":1,"name":"Verónica Mixão*","email":"","orcid":"https://orcid.org/0000-0001-6669-0161","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Verónica","middleName":"","lastName":"Mixão*","suffix":""},{"id":522348416,"identity":"e531c501-a4f9-403a-a042-370fefbc0e0f","order_by":2,"name":"Joana Isidro","email":"","orcid":"https://orcid.org/0000-0002-8529-9878","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Joana","middleName":"","lastName":"Isidro","suffix":""},{"id":522348417,"identity":"dd8ec448-18d1-41d4-a0e4-4c51fefcfd5a","order_by":3,"name":"Rita Ferreira","email":"","orcid":"https://orcid.org/0000-0002-0016-7543","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, 
Portugal","correspondingAuthor":false,"prefix":"","firstName":"Rita","middleName":"","lastName":"Ferreira","suffix":""},{"id":522348418,"identity":"f45411b3-a63b-4fa8-b75e-f75dfba92b98","order_by":4,"name":"Dora Cordeiro","email":"","orcid":"","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Dora","middleName":"","lastName":"Cordeiro","suffix":""},{"id":522348419,"identity":"bad28e87-9855-48b8-8d91-8967351a0277","order_by":5,"name":"Cristina Correia","email":"","orcid":"","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Cristina","middleName":"","lastName":"Correia","suffix":""},{"id":522348420,"identity":"eb6bcbde-751d-496e-bd5c-251be7ce31c0","order_by":6,"name":"Inês João","email":"","orcid":"","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"Inês","middleName":"","lastName":"João","suffix":""},{"id":522348421,"identity":"5d6c2f4c-d4d6-4081-bbbe-0e7c59f1dbdf","order_by":7,"name":"João Paulo Gomes","email":"","orcid":"https://orcid.org/0000-0002-2697-2399","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":false,"prefix":"","firstName":"João","middleName":"Paulo","lastName":"Gomes","suffix":""},{"id":522348422,"identity":"8890cdd6-7e32-4a86-bcc5-aea3c9f39103","order_by":8,"name":"Maria José 
Borrego","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABQUlEQVRIie2RMUvDQBTHX4zE5SRrQgr9CieBSKDYr2I4iMsNBZcMEiNCXIqu/RjtNwgc2OW0a6AZGoRMDi04RCxirg1C09BZ8H7Lu3fHj/+7OwCJ5A+CE2Uhalj3PVTvb4vRorgJXIpK6t7fV5pav1a8aNuz3eM2xUnAy8sAiG695PlHMOuc62xxlA4y6D7ws3QJWbivEBtxCM3HK9vu8DlyRz5WKS4Ac2q7IyhaUnxLiYFgrmmWGc8RTkEo7HYM1LEQsKaCK8X8/AZvzLWTLzN+RXg2XQoFuk/vjrVuU4594zTaKJqyihOEE7pJAUirFGhRmMZs9GwQc6ipFnBSDUYHTCg4La7dIS7MqKFM7+/y8qYX6qgKKYOLfjXY5I2uxWBkkpZBpjdSQP19/2pd/yMkO5c9hFK2bB5WJBKJ5D/wA7VGcNzYEX9EAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0000-0002-8604-8800","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":true,"prefix":"","firstName":"Maria","middleName":"José","lastName":"Borrego","suffix":""},{"id":522348423,"identity":"8c85a1f4-0287-461f-8da7-538d2ad96af3","order_by":9,"name":"Vítor Borges","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABIklEQVRIie3QvWqEQBAH8JUBm3hsq1zwXmFFuKS44KsowlVCTHfVIaS9JO2mzgsY8gIeU9gIaS1sRNhasLkihHhfxKCmDmT/sAs7sz92WEJkZP5i4LAnx0PNSPfojhD4vqPwPhliHQLaz9YwYSndVitSmFdPL9gswsKh0YWHIVnPWOoncLfrEwSwMiLsy1wspwETHk8mMXKCVpwJF3j/FeMeVCMi6HE9m0PA0CV7MvlMlDgPGGi/kvemuWbozPZEI2snzm/rIULhTOiGTNs5lfhIwGtfISPEtiImbF1X58aGofeKB4L+cyYY8mWPqHRbltGqMHWKVb37QMdMH96adrCbx9Qvq3DRI6d/a5d+HgE6DRwBpxGTgSIM1GRkZGT+Xb4AP9dr9Imkak4AAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0003-3767-2209","institution":"National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal","correspondingAuthor":true,"prefix":"","firstName":"Vítor","middleName":"","lastName":"Borges","suffix":""}],"badges":[],"createdAt":"2025-09-29 
14:44:22","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-7743240/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7743240/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":92584666,"identity":"5b8b0b70-f29f-487b-b4e0-33ebe18e082f","added_by":"auto","created_at":"2025-10-01 10:14:08","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2938513,"visible":true,"origin":"","legend":"","description":"","filename":"Ctrachomatis20250929preprint.docx","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/d572ddd7f1e4cf55f1adaed0.docx"},{"id":92584664,"identity":"dfff759c-0f92-477e-8757-d768ffecdf1f","added_by":"auto","created_at":"2025-10-01 10:14:08","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":342,"visible":true,"origin":"","legend":"","description":"","filename":"rs7743240.json","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/4bf384b470f40fad8a630e79.json"},{"id":92583347,"identity":"351ad0d4-ccd6-4f6f-856c-1052d3f721ce","added_by":"auto","created_at":"2025-10-01 10:06:08","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":280373,"visible":true,"origin":"","legend":"","description":"","filename":"rs77432400enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/a6064194660f1d8e178a19fe.xml"},{"id":92583348,"identity":"5149d12b-0a9a-4f76-a794-54613c9a2715","added_by":"auto","created_at":"2025-10-01 
10:06:08","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1004093,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/02faa22e677eb7089f646791.png"},{"id":92585006,"identity":"4995e363-2e95-4754-866e-789c8bb71063","added_by":"auto","created_at":"2025-10-01 10:22:08","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":479220,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/e8035fff256d259eb6625111.png"},{"id":92583352,"identity":"e479c748-4fd1-494d-b5a4-099a51ef8338","added_by":"auto","created_at":"2025-10-01 10:06:08","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1005361,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/b68496f2e8f4162d71b24ebe.png"},{"id":92585008,"identity":"66c3f8ea-ef8a-449d-8f2f-7df869fb8cbc","added_by":"auto","created_at":"2025-10-01 10:22:09","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":357573,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/b552a04c42b17a9446785f91.png"},{"id":92583354,"identity":"830097d2-5e55-48a5-a499-7c36c6c9ec37","added_by":"auto","created_at":"2025-10-01 
10:06:08","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":145837,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/7de862cb1e47ab9e254dfdf2.png"},{"id":92585007,"identity":"20aff544-410c-4e55-bf5c-06c2d93c3042","added_by":"auto","created_at":"2025-10-01 10:22:08","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":118022,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/5e5f6eb2a023ae1b7669a2e4.png"},{"id":92583366,"identity":"d7518c66-8294-42a6-ab5a-9ca16e1eff2a","added_by":"auto","created_at":"2025-10-01 10:06:09","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":116275,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/89040a6f229dccfb16e1a4b1.png"},{"id":92583363,"identity":"ded9243e-a5fa-4460-b328-e070ff0068e4","added_by":"auto","created_at":"2025-10-01 10:06:09","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":77911,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/0459a1013fac44a502210495.png"},{"id":92584671,"identity":"9fabcfd5-a8aa-4de0-ae01-15b8ccd609ef","added_by":"auto","created_at":"2025-10-01 
10:14:09","extension":"xml","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":278428,"visible":true,"origin":"","legend":"","description":"","filename":"rs77432400structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/80ffb4844278a9853a19e8a1.xml"},{"id":92583371,"identity":"4c7a24a1-ad2a-47a2-a685-8b0e2ded0966","added_by":"auto","created_at":"2025-10-01 10:06:09","extension":"html","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":299697,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/f8f388e7fec472095abcad2c.html"},{"id":92944999,"identity":"41be9ec8-5ddd-4f08-b833-461ddd0e76dd","added_by":"auto","created_at":"2025-10-07 12:30:19","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1309633,"visible":true,"origin":"","legend":"","description":"","filename":"Ctrachomatis20250929preprint.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1_covered_91c5d4d3-aebf-43bf-b3f5-bb75ccebd303.pdf"},{"id":92583342,"identity":"ced902ea-b783-49b2-9f1f-203758072c12","added_by":"auto","created_at":"2025-10-01 10:06:08","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":16441,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfigureslegends.docx","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/90298508771aa2a0ede77bbd.docx"},{"id":92583358,"identity":"1de50d15-5b43-4f08-9ca5-88590a50e9e0","added_by":"auto","created_at":"2025-10-01 10:06:08","extension":"tiff","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":28122934,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary figure S1.\u003c/strong\u003e \u003cstrong\u003eDistribution of input parameters and sequencing 
outcomes in the targeted enrichment WGS of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eC. trachomatis\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e. A) \u003c/strong\u003eDistribution of \u003cem\u003eC. trachomatis\u003c/em\u003e genome copies (left boxplot, n=835) and dsDNA amount (right boxplot, n=246) in the target enrichment WGS input volume (7 µL), with dots colored according to WGS selection status. WGS selected samples (n=148) are represented in green and orange for genome copies and DNA amount, respectively, and non-selected samples in grey. The red dashed lines represent the thresholds of 10\u003csup\u003e4\u003c/sup\u003e genome copies and 10 ng (both in 7 µL), used to prioritize samples for sequencing.\u003cstrong\u003e B)\u003c/strong\u003e Distribution of the number of \u003cem\u003eC. trachomatis\u003c/em\u003e copies in the WGS input volume and the corresponding percentage of “on-target” reads in log\u003csub\u003e10\u003c/sub\u003e scale for the 148 isolates selected for WGS. Each point represents one sample submitted for targeted enrichment WGS, being colored in dark blue when passing the inclusion criteria for the final curated genome dataset and in light blue when not passing these criteria. “On-target” reads were assumed as QC-passed reads classified as \u003cem\u003eChlamydiales, i.e.\u003c/em\u003e, reads used for \u003cem\u003eC. trachomatis\u003c/em\u003e genome assembly\u003cem\u003e \u003c/em\u003e(see Materials and Methods).\u003c/p\u003e","description":"","filename":"SupplementaryfigureS1.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/51b9826b57222b2c8762e338.tiff"},{"id":92583361,"identity":"3d27fb57-a47a-4047-aa9f-a39ecb48d8ba","added_by":"auto","created_at":"2025-10-01 10:06:09","extension":"tiff","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":34654006,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary figure S2. 
Evaluation and benchmarking of the novel \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eC. trachomatis \u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003ecgMLST schema. A) \u003c/strong\u003eDistribution of the number of unique alleles per locus of the cgMLST schema. \u003cstrong\u003eB) \u003c/strong\u003eComparison of the distribution of the percentage of loci called per sample in the novel cgMLST schema (blue) and the local deployment of the PubMLST schema with chewBBACA allele caller (orange) using the global \u003cem\u003eC. trachomatis \u003c/em\u003edataset (n=1230). \u003cstrong\u003eC)\u003c/strong\u003e Number of partitions (clusters and singletons) obtained at each possible distance threshold level in the novel cgMLST schema and the local deployment of the PubMLST schema with chewBBACA \u003csup\u003e56\u003c/sup\u003e, both using ReporTree single-linkage hierarchical clustering \u003csup\u003e82\u003c/sup\u003e. \u003cstrong\u003eD)\u003c/strong\u003e Heatmap of the congruence score (0 - no congruence; 3 - absolute congruence) obtained for the pairwise comparisons performed between all possible threshold combinations of the two approaches analysed in C. \u003cstrong\u003eE)\u003c/strong\u003e Number of partitions at each possible distance threshold level in the novel cgMLST schema using ReporTree single-linkage hierarchical clustering and GrapeTree MSTreeV2 algorithm \u003csup\u003e82,83\u003c/sup\u003e. \u003cstrong\u003eF)\u003c/strong\u003e Heatmap of the congruence score obtained for the pairwise comparisons performed between all possible threshold combinations of the two approaches analysed in E. \u003cstrong\u003eG)\u003c/strong\u003e Number of partitions at each possible distance threshold level in the novel cgMLST schema and the curated core genome alignment of the 53 complete assemblies, both cases using ReporTree single-linkage hierarchical clustering \u003csup\u003e82\u003c/sup\u003e. 
\u003cstrong\u003eH)\u003c/strong\u003e Heatmap of the congruence score (0 - no congruence; 3 - absolute congruence) obtained in the pairwise comparisons performed between all possible threshold combinations of the two approaches analysed in G. \u003cstrong\u003eI)\u003c/strong\u003e Number of partitions at each possible distance threshold level in the novel cgMLST schema and the Parsnp \u003csup\u003e89\u003c/sup\u003e alignment of the global dataset, both using ReporTree single-linkage hierarchical clustering \u003csup\u003e82\u003c/sup\u003e. \u003cstrong\u003eJ)\u003c/strong\u003e Heatmap of the congruence score obtained for the pairwise comparisons performed between all possible threshold combinations of the two approaches analysed in I.\u003c/p\u003e","description":"","filename":"SupplementaryfigureS2.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7743240/v1/774093abbc49c53aac01c16c.tiff"},{"id":92584673,"identity":"673936a5-1dae-4308-8134-41d8bcaaba0d","added_by":"auto","created_at":"2025-10-01 10:14:09","extension":"tiff","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":26115990,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary figure S3. Neighbor-joining phylogenetic tree of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eC. trachomatis\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eompA\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003ealleles identified by cgMLST in the studied genome dataset. \u003c/strong\u003eNodes are coloured by the \u003cem\u003eompA \u003c/em\u003egenotype. Supplementary file S4 provides a summary of \u003cem\u003eompA \u003c/em\u003eallele distributions across lineages, and Supplementary file S2 details the \u003cem\u003eompA \u003c/em\u003ealleles identified in individual genomes. Allele 112, which was identified in a single genome, was excluded, as a shorter CDS was inferred by cgMLST. 
Scale bar represents the number of substitutions per site.

Supplementary figure S4. Core-SNP maximum-likelihood phylogenetic tree of the epidemic L2b C. trachomatis strain. The tree was reconstructed from a core alignment comprising 156 SNVs using IQ-TREE v2.1.4 (68) and rooted on the earliest available L2b sequences, from the 1980s (ERR348840 and ERR348841). *All recombination-driven SNVs within the previously identified ompA-containing recombinant region of the L2b/D-Da strain (NC_010280; approximately positions 55,221–59,461) (25, 27) were excluded prior to phylogenetic analysis. Edges are coloured by ompA-genotype and nodes by LGV ompA subvariant; "new" denotes additional subvariants not included in a recent compilation (48). Scale bar represents the number of SNPs.

Supplementary figure S5. Tanglegram comparing cgMLST-based single-linkage hierarchical clustering trees built from the whole cgMLST schema (left) and from plasmid-only loci (right). Branches, drawn as connectors linking the same isolate in both trees, are coloured by lineage. Distances are based on allelic differences.

Supplementary figure S6. Sankey plot illustrating the relationships between Chlamydia trachomatis classifications at four levels: ompA class (historical "disease groups" inferred from the traditional ompA classification*), ompA-genotype, cgMLST lineage, and MLST-derived sequence type (ST). Each node represents a category within a given level, and the width of each connecting flow is proportional to the number of isolates sharing the corresponding combination of categories. *Grouping in "ompA_class" was based strictly on the ompA sequence, so the recombinant L2/L2b–D/Da strain (43, 45) was classified as "genital", as most of its sequence (~75%) derives from a genital genotype (D/Da). Supplementary files S2 and S4 provide the source data.
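Supplementary figure S5 compares trees built from allelic differences over the whole schema and over plasmid-only loci. The Python sketch below illustrates that comparison under stated assumptions; it is not the authors' or ReporTree's implementation. It assumes allele profiles held as plain dictionaries (isolate to locus to allele ID), skips loci missing in either isolate when counting differences (one common convention, assumed here), and returns the flat partition obtained by cutting a single-linkage tree at a chosen allele-difference threshold, which equals the connected components of the graph linking isolates within that threshold. All isolate names, locus names (including the stand-in "plasmid" loci p1 and p2) and allele IDs are hypothetical toy data.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

# Toy allele profile: locus name -> allele ID, with None marking a missing call.
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile, loci: List[str]) -> int:
    """Count loci where both isolates have an allele call and the calls differ.

    Loci missing in either profile are skipped (an assumed convention).
    """
    return sum(
        1
        for locus in loci
        if a.get(locus) is not None
        and b.get(locus) is not None
        and a[locus] != b[locus]
    )


def single_linkage_clusters(
    profiles: Dict[str, Profile], loci: List[str], threshold: int
) -> List[Set[str]]:
    """Cut a single-linkage tree at `threshold` allelic differences.

    Two isolates share a cluster if a chain of pairwise distances
    <= threshold connects them; union-find tracks the merges.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(profiles, 2):
        if allelic_distance(profiles[x], profiles[y], loci) <= threshold:
            parent[find(x)] = find(y)

    groups: Dict[str, Set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their sorted member names for a stable output order.
    return sorted(groups.values(), key=sorted)


# Hypothetical toy data: c1-c3 stand in for chromosomal loci, p1-p2 for plasmid loci.
profiles: Dict[str, Profile] = {
    "iso1": {"c1": 1, "c2": 1, "c3": 1, "p1": 1, "p2": 1},
    "iso2": {"c1": 1, "c2": 2, "c3": 1, "p1": 1, "p2": 1},
    "iso3": {"c1": 2, "c2": 3, "c3": 2, "p1": 1, "p2": None},
    "iso4": {"c1": 3, "c2": 3, "c3": 2, "p1": 2, "p2": 2},
}
all_loci = ["c1", "c2", "c3", "p1", "p2"]
plasmid_loci = ["p1", "p2"]

if __name__ == "__main__":
    whole = single_linkage_clusters(profiles, all_loci, threshold=1)
    plasmid = single_linkage_clusters(profiles, plasmid_loci, threshold=0)
    print([sorted(c) for c in whole])    # [['iso1', 'iso2'], ['iso3'], ['iso4']]
    print([sorted(c) for c in plasmid])  # [['iso1', 'iso2', 'iso3'], ['iso4']]
```

With this toy data, the whole-schema partition at threshold 1 separates iso3 from iso1 and iso2, whereas the plasmid-only partition at threshold 0 groups all three together: the kind of discordance a tanglegram makes visible. Union-find keeps the sketch dependency-free; to obtain full dendrograms like those drawn in a tanglegram, `scipy.cluster.hierarchy.linkage` with `method="single"` on a condensed distance matrix is a standard alternative.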
Institution: National Institute of Health Doutor Ricardo Jorge (INSA), Portugal
Keywords: Chlamydia trachomatis, Whole-Genome Sequencing, cgMLST, genomic surveillance

In summary, this work delivers both an expanded global C. trachomatis genomic resource and a robust cgMLST framework, with immediate utility for research and standardized, high-resolution genome-scale routine surveillance.

*Zohra Lodhia & Verónica Mixão contributed equally to this work.

Posted: October 1st, 2025 (version 1)

Subject areas: Infectious Diseases, Molecular Epidemiology, Molecular Genetics