An Advanced Entropy Approach for Minimizing False Discoveries in Imputation-Based Association Analyses

Research Article (Research Square preprint)
Zhihui Zhang, Dakai Zhu, Xiangjun Xiao, Christopher I. Amos
This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-8264218/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Revision. Version 1 posted December 17th, 2025.

Abstract
Genotype imputation is a cornerstone of modern genetic studies. By inferring untyped variants from reference panels, it increases the resolution of genome-wide association studies (GWAS), fine mapping, and polygenic risk score estimation. Imputation produces probabilistic genotypes, each carrying an inherent degree of uncertainty.
However, conventional downstream analyses often ignore this uncertainty and instead use allelic dosages (expected allele counts computed from the genotype probabilities) as proxies. This practice can be misleading: distinct genotype probability distributions can yield identical dosages despite very different confidence levels, which may introduce bias and inflate false discoveries. To address this limitation, we introduce an entropy-weighted association method that explicitly quantifies imputation uncertainty using Shannon entropy. The entropy values enter the association model as observation-level weights, so that the model accounts for the reliability of each imputed genotype. In simulation studies, this approach substantially reduces false positives, especially when genotypic uncertainty is pronounced. Our findings highlight the importance of modeling imputation uncertainty and offer a framework that improves the robustness of GWAS and other analyses that depend on genotype imputation.

Keywords: Genotype imputation, GWAS power and FDR, Entropy weighting, Dosage-based methods

Additional Declarations
No competing interests reported.
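To make the abstract's central point concrete, the sketch below computes dosages and Shannon entropies for three genotype posteriors that all give a dosage of 1 but differ in certainty, and then fits a weighted least-squares association using entropy-derived weights. It is a minimal illustration under stated assumptions, not the authors' implementation: the posterior ordering (AA, Aa, aa), the weight mapping w = 1 - H/ln 3 (1 for a certain call, 0 for a uniform posterior), the linear model for a quantitative trait, the normal approximation used for the p-value, and all function names are hypothetical choices.

```python
import math


def dosage(p):
    """Expected count of allele 'a' from posteriors p = (Pr(AA), Pr(Aa), Pr(aa))."""
    return p[1] + 2.0 * p[2]


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_weight(p):
    """HYPOTHETICAL weight: 1 for a certain call, 0 for a uniform posterior.

    Clamped at 0 so floating-point rounding cannot produce a negative weight.
    """
    return max(0.0, 1.0 - shannon_entropy(p) / math.log(3))


def weighted_assoc(probs, y, weights):
    """Weighted least squares of y on dosage with an intercept.

    Returns (beta, se, p). The two-sided p-value uses a normal
    approximation to the test statistic (an assumption for brevity).
    """
    x = [dosage(p) for p in probs]
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    # Residual variance under Var(e_i) = sigma^2 / w_i, with n - 2 degrees of freedom
    rss = sum(w * (yi - alpha - beta * xi) ** 2 for w, xi, yi in zip(weights, x, y))
    sigma2 = rss / (len(y) - 2)
    se = math.sqrt(sigma2 / sxx)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


if __name__ == "__main__":
    patterns = {
        "confident het": (0.0, 1.0, 0.0),
        "hom-hom split": (0.5, 0.0, 0.5),
        "uniform": (1 / 3, 1 / 3, 1 / 3),
    }
    for name, p in patterns.items():
        print(f"{name:14s} dosage={dosage(p):.2f} "
              f"H={shannon_entropy(p):.3f} w={entropy_weight(p):.3f}")
```

With all weights equal to 1 the fit reduces to ordinary least squares on dosage, i.e. the conventional dosage-based test; giving high-entropy genotypes smaller weights reduces their influence on the slope estimate and its standard error.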
Supplementary Files: sup1.png

Editorial history (Version 1):
Editorial decision: Revision requested, 06 Feb, 2026
Reviews received at journal: 28 Dec, 2025
Reviews received at journal: 26 Dec, 2025
Reviewers agreed at journal: 15 Dec, 2025
Reviewers agreed at journal: 15 Dec, 2025
Reviewers agreed at journal: 14 Dec, 2025
Reviewers invited by journal: 12 Dec, 2025
Editor assigned by journal: 10 Dec, 2025
Editor invited by journal: 09 Dec, 2025
Submission checks completed at journal: 05 Dec, 2025
First submitted to journal: 05 Dec, 2025
Author affiliations: Zhihui Zhang, Dakai Zhu, Xiangjun Xiao, and Christopher I. Amos (corresponding author), Baylor College of Medicine.
Figures

Figure 1. Illustrative Workflow Diagram of Conducted Simulations in the Study

Figure 2. Comparison of Simulated Masking Ratios and Imputation Performance Across Genotyping Densities
(A) Simulated masking ratios used in this study compared with the SNP densities of widely used genotyping and sequencing platforms, based on chromosome 1. Each point represents a platform, annotated with its masking ratio (relative to the 1000 Genomes reference) and its SNP density in SNPs per megabase. The highest density corresponds to the unmasked 1000G reference (~24,884 SNPs/Mb), while sparse consumer genotyping arrays such as 23andMe (~189 SNPs/Mb) and AncestryDNA (~120 SNPs/Mb) correspond to masking levels above 99%. Mid-range platforms such as the UK Biobank Axiom array (~249 SNPs/Mb, about 1% of the reference density) align with ~99% masking, and low-pass WGS (0.4×–0.6×) ranges from ~964 to ~1,506 SNPs/Mb.
(B) Relationship between genotype sparsity and imputation quality, quantified by the distribution of allelic R² values. SNPs with high imputation quality (R² > 0.5) in a dense input dataset (17,500 SNPs/Mb) were identified, and their R² values were then evaluated in a sparser dataset with substantially lower input density. The comparison reveals a downward shift in imputation quality under sparse conditions: many previously high-quality SNPs show reduced R² values, and a notable proportion have R² = 0.

Figure 3. Distribution of diverse posterior probability sets of genotypes
This figure shows how genotype posterior probability distributions differ between datasets generated at two input SNP densities: ~200 SNPs/Mb (left) and ~50 SNPs/Mb (right). We focus on SNPs with dosages between 0.9 and 1.1. For each SNP, genotype posterior probabilities Pr(AA, Aa, aa) were grouped into discrete probability patterns and aggregated across all individuals from 10 independent simulation replicates.
From top to bottom, the probability patterns are ordered by decreasing heterozygous posterior probability Pr(Aa). Circle color indicates the frequency of each probability pattern, while circle size reflects the magnitude of the posterior probabilities and their associated entropy-based weights (red circles in the rightmost column; values rounded to one decimal place).
The fully confident heterozygous pattern (0, 1, 0), shown at the top of the left and middle panels, is excluded from the frequency summaries to highlight the diversity of less certain genotype calls.

Figure 4. Comparison of p-value Discordance
Panels A and B show the relationship between imputed p-values (x-axis: P_method) and the discordance between imputed and true p-values (y-axis: -log10(P_method/P_nomiss)). Panel A shows results from the dosage-based method and Panel B from the entropy-weighted method. Panels C and D show the distribution of p-value discordance stratified by imputation quality (R²) for the dosage and entropy-weighted methods, respectively. Data were generated from simulations with a SNP density of approximately 2,500 SNPs/Mb (90% masking relative to the full 1000 Genomes reference panel). Positive discordance values (P_method < P_nomiss) indicate that imputed p-values are more significant than those from the "true" (non-masked) dataset.

Figure 5. Discordance of p-values in an extremely sparse dataset
Hexagonal bins represent the discordance between anticipated p-values and dosage p-values (P_dosage) in panels A, C, and E, and weighted p-values (P_weight) in panels B, D, and F. The p-values in this figure were adjusted using the false discovery rate (FDR) method, and only data points with imputed p-values < 0.05 (P_dosage in A, C, and E; P_weight in B, D, and F) are shown. The color of each hexagonal bin reflects the range of discordance values it contains. From top to bottom, the panels plot discordance against P_dosage (A) or P_weight (B), against minor allele frequency (MAF) (C, D), and against the imputation quality metric R² (E, F). Density plots superimposed on the upper part of panels A–D show the distribution of positive discordance values (imputed p-values more significant than anticipated) along the x-axis, grouped by level of discordance. Red horizontal lines mark zero discordance, and blue lines in E and F are best linear fits that illustrate the overall trend of the data points. Results shown in this figure are from input data with an approximate SNP density of 50 SNPs/Mb.

Figure 6. Comparison of Power and False Discovery Rate Across Genotype Masking Rates
This figure evaluates the performance of the dosage-based (dos) and entropy-weighted (wei) association methods under varying levels of genotype masking. In the top row, line plots show the power (recall rate) of each method to detect 100 simulated causal variants as a function of SNP density. Power is expressed as the percentage of true causal variants identified at an FDR-adjusted significance threshold of p < 0.05.
The bottom row shows the composition of significant SNPs identified by each method at different SNP densities. Stacked bar plots show the proportion of significant SNPs corresponding to true positives (TPR, causal variants) and false positives (FDR, non-causal variants).

Figure 7. Signal Discordance and Recovery Rates Across SNP Density Levels Under Extreme Sparsity
The p-values obtained from analyses of complete genotype data (P_nomiss), dosage-based imputed data (P_dosage), and entropy-weighted imputed data (P_weight) were adjusted using the Benjamini–Hochberg procedure to control for multiple testing. SNPs identified as significant (adjusted P_nomiss < 0.05) in the complete genotype dataset formed the reference signal set. Panel A shows the signal discordance rate, defined as the proportion of significant SNPs (adjusted P < 0.05) identified in the imputed datasets (dosage and entropy methods) that were not part of the reference signal set. Panel B shows the signal recovery rate, defined as the proportion of the reference signal set successfully recovered in each imputed dataset.
Results are summarized across simulations spanning multiple SNP densities (200 to 50 SNPs/Mb), corresponding to different genotype masking scenarios.

Submitted to: BMC Genomics
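The evaluation quantities defined in the legends of Figures 4, 6, and 7 can be written down compactly. The sketch below is an illustrative re-implementation under stated assumptions, not the authors' code: it applies a plain-Python Benjamini–Hochberg adjustment, computes the p-value discordance -log10(P_method/P_nomiss) from Figure 4, and derives the signal discordance and recovery rates (Figure 7) and the power and share of non-causal significant SNPs (Figure 6) at an adjusted threshold of 0.05. The function names, the representation of SNPs as index sets, the toy p-values, and returning NaN for empty denominators are hypothetical choices.

```python
import math


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def p_discordance(p_method, p_nomiss):
    """-log10(P_method / P_nomiss); positive when the imputed p-value is smaller."""
    return -math.log10(p_method / p_nomiss)


def significant(pvals, alpha=0.05):
    """Indices of SNPs whose BH-adjusted p-value is below alpha."""
    return {i for i, q in enumerate(bh_adjust(pvals)) if q < alpha}


def _ratio(num, den):
    # NaN when the denominator set is empty (hypothetical convention)
    return num / den if den else float("nan")


def signal_rates(p_nomiss, p_imputed, alpha=0.05):
    """(signal discordance rate, signal recovery rate) as defined for Figure 7."""
    reference = significant(p_nomiss, alpha)
    found = significant(p_imputed, alpha)
    discordance = _ratio(len(found - reference), len(found))
    recovery = _ratio(len(found & reference), len(reference))
    return discordance, recovery


def power_and_false_share(p_imputed, causal, alpha=0.05):
    """(power over causal SNPs, share of significant SNPs that are non-causal)."""
    found = significant(p_imputed, alpha)
    power = _ratio(len(found & causal), len(causal))
    false_share = _ratio(len(found - causal), len(found))
    return power, false_share


if __name__ == "__main__":
    p_nomiss = [1e-6, 1e-5, 0.2, 0.6, 0.9]   # complete-data p-values (toy values)
    p_imputed = [1e-6, 0.3, 1e-5, 0.6, 0.9]  # imputed-data p-values (toy values)
    print(signal_rates(p_nomiss, p_imputed))
    print(power_and_false_share(p_imputed, causal={0, 1}))
```

With these toy values, SNP 1 is lost after imputation and SNP 2 becomes a spurious signal, so the signal discordance and recovery rates both come out at 0.5.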
