Benchmark of open-access star-allele callers to accurately assess haplotypes and phenotypes in pharmacogenetic studies

preprint OA: closed CC-BY-4.0
Research Article

Marc Gros La Faige, Emmanuelle Génin, Anthony Herzig

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7924414/v2
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 2)

Abstract

Genetic polymorphisms are common in pharmacogenes and sometimes have important implications for drug metabolism. Inferring the correct enzyme phenotype from genetic data is therefore a crucial step toward personalized medicine. Many bioinformatics star-allele callers have been developed to identify star alleles and their associated phenotypes, each with its own method and limitations.

Despite the important benchmarks published so far, the performance of these tools has not yet been fully explored with respect to parameters such as the type of genetic data provided as input or the ancestry of the individuals. Hence, we provide a multi-gene, multi-data-type comparison of the accuracy of four commonly used, open-access star-allele callers: PyPGx, ursaPGx, PharmCAT and Aldy. We found that PyPGx and Aldy perform better overall than the others. Moreover, using imputation or low-pass sequencing data can improve the accuracy of star-allele callers compared with SNP-chip genotyping data alone. Finally, we observed that the concordance between star-allele callers is highly dependent on population ancestry. Our study provides new recommendations on which algorithm clinicians and researchers should use, depending on the pharmacogene and the type of data they have access to.

Subject areas
Biological sciences/Genetics/Genomics/Pharmacogenomics
Biological sciences/Computational biology and bioinformatics/Predictive medicine

Additional Declarations
The authors declare no competing interests.

Supplementary Files
Table1.xlsx (Table 1)
Table2.xlsx (Table 2)
TableS1.xlsx (Table S1)
TableS3.xlsx (Table S3)
supplementaryinformationfinal2.docx
© Research Square 2026 | ISSN 2693-5015 (online)

Author details
Marc Gros La Faige (corresponding author), INSERM UMR 1078, ORCID: https://orcid.org/0009-0002-4252-733X
Emmanuelle Génin, Inserm UMR1078, ORCID: https://orcid.org/0000-0003-4117-2813
Anthony Herzig

Version history
Version 1 posted 2025-11-06 (https://doi.org/10.21203/rs.3.rs-7924414/v1)
Version 2 posted February 18th, 2026 (https://doi.org/10.21203/rs.3.rs-7924414/v2)

Figure legends

Figure 1. Accuracy of star-allele callers with the GeT-RM consensus for diplotypes.
a: Upset plot of star-allele caller output for diplotypes. True positives (i.e., concordant with GeT-RM consensus) are represented in blue, tools using VCF files only are represented in green and tools using VCF and BAM files are represented in red.
b: Concordance of each pipeline with GeT-RM consensus for each gene for diplotypes. The average concordance is represented in red.
c: Concordance of VCF-only pipelines (green) and VCF & BAM pipelines (red) with GeT-RM for diplotypes. One diplotype is considered concordant in one group if all the pipelines of the group are consistent with GeT-RM. A McNemar’s test is used to compare the accuracy of each group.
d: Pairwise concordance between each pipeline for each gene for diplotypes.

Figure 2. Concordance of star-allele callers between genotyping, low-pass, imputation and WGS data with GeT-RM for diplotype and for each pipeline.

Figure 3. Boxplot of concordance of star-allele callers between themselves per geographic regions using WGS data of 1KGP project. Each dot represents one gene. The average concordance per regions is represented by a red line.
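
The legend of Figure 1c describes a group-level comparison: a diplotype counts as concordant for a group only if every pipeline in that group agrees with GeT-RM, and the two groups are then compared with McNemar's test. The following is a minimal sketch of that idea, not the authors' code. The sample records, the tool labels (`tool_a`, `tool_b`, `tool_c`), the order-insensitive diplotype comparison and the choice of the exact (binomial) form of McNemar's test are all assumptions made for illustration.

```python
from math import comb


def normalize(diplotype: str) -> tuple:
    """Order-insensitive form, so '*2/*1' and '*1/*2' compare equal (assumption)."""
    return tuple(sorted(diplotype.split("/")))


def group_concordant(calls_by_tool: dict, truth: str) -> bool:
    """True only if every tool in the group matches the reference diplotype."""
    ref = normalize(truth)
    return all(normalize(call) == ref for call in calls_by_tool.values())


def discordant_counts(samples: list) -> tuple:
    """Count paired disagreements between the two groups.

    b: VCF-only group concordant, VCF+BAM group not.
    c: VCF+BAM group concordant, VCF-only group not.
    """
    b = c = 0
    for s in samples:
        vcf_ok = group_concordant(s["vcf_only"], s["truth"])
        bam_ok = group_concordant(s["vcf_bam"], s["truth"])
        if vcf_ok and not bam_ok:
            b += 1
        elif bam_ok and not vcf_ok:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy records (hypothetical, not data from the paper).
samples = [
    {"truth": "*1/*2", "vcf_only": {"tool_a": "*2/*1", "tool_b": "*1/*2"},
     "vcf_bam": {"tool_c": "*1/*2"}},
    {"truth": "*1/*5", "vcf_only": {"tool_a": "*1/*1", "tool_b": "*1/*5"},
     "vcf_bam": {"tool_c": "*1/*5"}},
    {"truth": "*4/*4", "vcf_only": {"tool_a": "*4/*4", "tool_b": "*4/*4"},
     "vcf_bam": {"tool_c": "*1/*4"}},
    {"truth": "*1/*10", "vcf_only": {"tool_a": "*1/*1", "tool_b": "*1/*1"},
     "vcf_bam": {"tool_c": "*1/*10"}},
]

b, c = discordant_counts(samples)
p_value = mcnemar_exact(b, c)
print(f"discordant pairs: b={b}, c={c}, exact McNemar p={p_value:.3f}")
```

Only the discordant pairs enter the test, which is why samples where both groups are right (or both wrong) do not affect the p-value. With large counts a chi-squared approximation is often used instead; the exact form is chosen here only because it needs nothing beyond the standard library.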


Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-20T11:00:21.680559+00:00
License: CC-BY-4.0