Genome Interpretation of Peruvian Inca Empire Descendants Reveals Actionable Insights

Manuel Corpas, Segun Fatumo, Cesar Sanchez, Omar Caceres, Carlos Padilla, Julio Valdivia-Silva, Heinner Guio

Research Square preprint (Article). This is a preprint; it has not been peer reviewed by a journal. Status: Under Review. Version 1.
https://doi.org/10.21203/rs.3.rs-6362225/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Latin American populations remain vastly underrepresented in genomic references, limiting their benefit from precision medicine. Here, we analyzed data from 1,149 individuals in 30 Peruvian populations, including 17 Indigenous and 13 mestizo groups, using deep whole-genome sequencing (n = 150) and high-density SNP array genotyping (n = 873). Over 1.6 million variants (~13%) were novel relative to major databases, underscoring uncharted genetic diversity.
We identified 1,210 high-impact variants (e.g., stop-gained, splice-site) with allele frequencies below 1% globally, 94% of which were unannotated in ClinVar. Pharmacogenomic profiling of 56 drug-response genes revealed distinct allele frequency patterns, including clinically actionable variants in CYP2D6, CYP2C19, UGT1A1, and G6PD, emphasizing the need for population-specific guidelines. These findings highlight how expanding genomic data from understudied groups can improve risk prediction, enable targeted screening, and ultimately foster more equitable genome medicine throughout Latin America.

Subjects: Biological sciences/Computational biology and bioinformatics/Genome informatics; Biological sciences/Genetics/Clinical genetics; Biological sciences/Genetics/Genomics/Pharmacogenomics; Biological sciences/Genetics/Genomics/Medical genomics

Keywords: Genome Medicine, Ancestry, Andes, Indigenous Genomics, Amazon, Latin America, Genomic Diversity, Genetic Epidemiology, Equity, Inclusion, Healthcare Disparities, Pharmacogenomics, Precision Medicine

Introduction

Genomic research has predominantly focused on individuals of European descent, leaving Latin American populations largely absent from reference datasets [1–5]. Although Latin America and the Caribbean encompass over 8% of the global population, only ~1.3% of all human genomic data originates from these communities [6–8]. This underrepresentation creates a stark imbalance: variant interpretation, genetic risk scores, and drug-response predictions, developed primarily with European data, may not apply accurately to Latin American groups, limiting the reach of precision medicine advances in these settings [8].

The Andean region of South America, once core to the Inca Empire (Fig. 1A), embodies a particularly rich yet understudied genetic heritage.
Over centuries, diverse Indigenous groups experienced admixture with European and African lineages, while the upheavals of colonization introduced additional demographic bottlenecks [9–12]. Historical and cultural factors further shaped population structure: some high-altitude communities evolved robust adaptations to chronic hypoxia, whereas more isolated Amazonian groups retained distinct gene pools [13–16].

Here, we present a genomic profiling of 30 populations (Fig. 1B) across coastal, highland, and Amazonian Peru, comprising 1,149 individuals, including 150 high-coverage (35×) whole-genome and 873 high-density SNP array samples. Unlike previous studies that typically survey a single Indigenous group or rely on generic “Latin American” labels, this cohort spans 17 Indigenous communities and 13 mestizo populations, each with distinct admixture histories. By integrating deep genomic sampling with population-level substructure analyses, we delineate how genetic diversity is partitioned across Peru’s ecologically diverse landscapes. This approach sheds light on a previously unexplored wealth of novel and high-impact variants, providing an essential foundation for more accurate disease risk assessments, personalized drug therapies, and equitable applications of genomic medicine in these underserved populations.

Results

Whole-Genome Mutational Landscape

We performed high-coverage (35×) whole-genome sequencing on 150 individuals from seven Peruvian populations (Matsés, Uros, Chopccas, Moches, Iquitos, Cusco, and Trujillo). Initial variant calling yielded ~13.0 million single nucleotide polymorphisms (SNPs) and indels, with 4.1 million singleton SNPs, reflecting historical isolation and unique demographic events (Fig. 2A). Notably, 12.7% (1,638,862) of these variants were absent from dbSNP build 154 [17], underscoring substantial uncharted diversity in these groups.
After excluding multi-allelic sites and artifacts, we retained 12,897,474 high-quality biallelic SNPs for subsequent analyses. Matsés and Uros displayed the fewest SNPs per individual, likely reflecting limited gene flow, while Cusco and Iquitos exhibited higher SNP counts consistent with their more cosmopolitan histories (Fig. 2B). Anthropological records corroborate that the Uros have remained genetically and culturally isolated, occupying remote islands on Lake Titicaca [18]. Such endogamy may have reduced overall polymorphism levels and increased the proportion of rare alleles [19].

High-Impact Variant Discovery

To prioritize functionally disruptive variants, we retained only HIGH-impact variants with allele frequency < 0.01 in gnomAD [20], leaving 1,210 high-impact SNVs (Table 1). These included 611 premature stop-gain lesions and 514 splice-site disruptions (287 donor and 227 acceptor), all predicted to affect gene function significantly. Of the 1,210 variants, 93.9% were absent from ClinVar [21], highlighting a critical gap in current clinical variant classification databases. Together, these data reveal a high burden of unannotated putative loss-of-function alleles in Peruvian populations.

Table 1. Overview of single-nucleotide variants (SNVs) identified from 150 high-coverage (35×) Peruvian genomes. A total of 12,905,020 SNVs were detected, of which 12,897,474 were biallelic. Approximately 12.7% (1,638,862) were novel, having no prior record in public databases. Among these, 1,210 rare (AF in gnomAD < 0.01) variants were classified as HIGH-impact by Ensembl VEP, defined as lesions likely to truncate or abolish gene function. The most frequent were stop-gained (n = 611), followed by splice-site disruptions (donor, n = 287; acceptor, n = 227). Fewer start-lost (n = 58) and stop-lost (n = 27) variants were observed, yet they remain potentially disruptive by altering canonical translation signals.
| Variant category | N variants |
|---|---|
| SNVs processed | 12,905,020 |
| Biallelic SNVs | 12,897,474 |
| Novel SNVs (count, %) | 1,638,862 (12.7%) |
| Rare High Impact (< 0.01 gnomAD AF with protein truncation, loss of function, or triggering nonsense-mediated decay) | 1,210 |
| Stop Gained (Rare High Impact) | 611 |
| Splice Donor (Rare High Impact) | 287 |
| Splice Acceptor (Rare High Impact) | 227 |
| Start Lost (Rare High Impact) | 58 |
| Stop Lost (Rare High Impact) | 27 |

Interpretation of High-Impact Rare Variants in Peruvian Populations

To further characterize disruptive alleles, we focused on biallelic variants with allele frequencies ≥ 0.1 in any of the seven Peruvian subpopulations and < 0.01 in gnomAD. This filter yielded 27 distinct high-impact variants in genes spanning immune regulation, metabolism, and protein homeostasis (Supplementary Materials). Several splice-site variants in diverse genes (e.g., FAM166A, LIN37, PSRC1) appeared at or near fixation (allele frequency > 0.95) in all subpopulations, suggesting these alleles may trace back to early founder events predating population divergence. In silico predictions (CADD_PHRED ≥ 14) imply deleterious impacts, but further transcript-level studies are needed to confirm whether they cause truncated proteins or novel isoforms.

Other variants exhibited more localized distributions, highlighting potential regional adaptations or gene flow patterns. For instance, a stop-gain in UBE2NL was relatively common in highland and Amazonian communities (allele frequency 0.53–0.71), yet much lower (0.30–0.44) on the northern coast (Moches, Trujillo). This enzyme is linked to protein turnover of immune factors [22], raising questions about whether local pathogen pressures or demographic bottlenecks shaped its current frequency. In the Matsés (an Amazonian group), a C5orf20 nonsense mutation reached 0.79 allele frequency, possibly reflecting the group’s long-term geographic isolation and small effective population size.
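The two-sided frequency filter described in this section (allele frequency ≥ 0.1 in at least one of the seven populations and < 0.01 in gnomAD) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the population keys, the toy frequencies, and the choice to treat a variant missing from gnomAD as frequency 0 are all assumptions made for the example.

```python
from typing import Dict, List, Optional

POP_AF_MIN = 0.1       # threshold from the text: >= 0.1 in any population
GNOMAD_AF_MAX = 0.01   # threshold from the text: < 0.01 in gnomAD


def is_population_enriched(pop_af: Dict[str, float],
                           gnomad_af: Optional[float],
                           pop_af_min: float = POP_AF_MIN,
                           gnomad_af_max: float = GNOMAD_AF_MAX) -> bool:
    """True when a variant is common in some population but rare in gnomAD."""
    # Assumption: a variant absent from gnomAD is treated as frequency 0.
    reference_af = 0.0 if gnomad_af is None else gnomad_af
    common_somewhere = any(af >= pop_af_min for af in pop_af.values())
    return common_somewhere and reference_af < gnomad_af_max


def filter_enriched(variants: List[dict]) -> List[dict]:
    """Keep only the records that pass both frequency conditions."""
    return [v for v in variants
            if is_population_enriched(v["pop_af"], v["gnomad_af"])]


# Toy records with made-up frequencies (hypothetical, for illustration only).
toy_variants = [
    {"id": "toy_1", "pop_af": {"pop_a": 0.45, "pop_b": 0.02}, "gnomad_af": 0.001},
    {"id": "toy_2", "pop_af": {"pop_a": 0.05, "pop_b": 0.08}, "gnomad_af": 0.0005},
    {"id": "toy_3", "pop_af": {"pop_a": 0.30, "pop_b": 0.25}, "gnomad_af": 0.20},
    {"id": "toy_4", "pop_af": {"pop_a": 0.12, "pop_b": 0.00}, "gnomad_af": None},
]
enriched_ids = [v["id"] for v in filter_enriched(toy_variants)]
print(enriched_ids)
```

In this toy set, only the records that are locally common and globally rare survive; a variant that is common everywhere (toy_3) or rare everywhere (toy_2) is dropped.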
A damaging HLA-DQB1 stop-gain (allele frequency 0.45) emerged in the high-altitude Uros population, yet was absent or rare elsewhere. Disruption of HLA-DQB1 might alter antigen presentation, potentially influencing infection risks or autoimmune tendencies in this lake-dwelling community [23]. Likewise, SEL1L3 start-lost variants (allele frequency up to 0.23) were frequent among non-admixed groups such as Matsés and Uros but nearly absent in urban mestizos, possibly affecting endoplasmic reticulum-related immune processes [24]. Together, these findings underscore the distinct variant landscapes in isolated populations and the corresponding need for tailored disease screening and immunogenetic research.

Pharmacogenomic Analysis

Next, we assessed 56 known pharmacogenes using PyPGx [25,26] to infer star-allele haplotypes, diplotypes, and predicted metabolizer phenotypes. Across the seven populations, 24 genes (42.9%) showed significant inter-population frequency differences (p < 0.05), reflecting localized founder effects and distinct admixture histories. Genes such as CYP2C19, CYP2B6, and UGT1A1 exhibited clinically actionable variation, often at frequencies divergent from global references. Using the FDA Table of Pharmacogenetic Associations [27], we identified 73 gene–drug recommendations triggered by specific variants (Fig. 3). Notably, CYP2C19 poor metabolizers were more prevalent than expected, which may inform antiplatelet (e.g., clopidogrel) or antidepressant dosing [28] in certain Peruvian subgroups. Such findings underscore the need to integrate population-specific genotype data into prescribing guidelines, potentially reducing adverse events and improving therapeutic efficacy.

Genetic Diversity and Structure

After merging 150 WGS samples with 722 array-based genotypes and removing closely related individuals (n = 736 retained: 109 WGS and 627 array samples), we conducted principal component analysis (PCA) using ~0.94 million intersecting variants.
The first two PCs separated individuals largely by Andean high-altitude vs. Amazonian lowland ancestry and revealed African admixture in coastal mestizo groups (Fig. 4A). Highland populations, particularly Chopccas and Cusco participants, formed a cohesive cluster consistent with a shared Inca-lineage background. Some sub-clustering also reflected language groups (Quechua vs. Aymara) (Fig. 4B). Further fine-scale structure emerged on PCs 3 and 4, highlighting additional subpopulations within the highlands and Amazon (Fig. 4C). A global PCA with Simons Genome Diversity Project samples [29] placed these Indigenous Peruvians close to other Native American groups, while Afro-Peruvians clustered near African references (Fig. 4D). These patterns align with historical migrations, underscoring that even within “mestizo” classifications there is marked ancestral heterogeneity that may carry health and pharmacogenomic implications.

Discussion

Our comprehensive genomic profiling of multiple Peruvian populations provides a unique window into the clinical and evolutionary significance of underexplored genetic variation in Latin America. By characterizing whole-genome sequences and high-density SNP array data across 30 diverse groups, we uncovered a high proportion of rare and novel alleles, many of which are unknown or poorly annotated in existing clinical databases. These findings address a critical gap in genomic references, given that Latin American populations represent nearly 10% of the global population yet historically account for only around 1% of the data in large consortia [3,6,8]. The skewed Eurocentric focus of current databases limits diagnostic accuracy, disease-risk stratification, and pharmacogenomic utility for Indigenous and admixed communities. Our study highlights how more inclusive population references can advance precision medicine by improving variant interpretation and tailoring clinical interventions to the genetic architecture of understudied groups.
A major unmet clinical need involves identifying and validating high-impact variants relevant to both infectious and non-communicable diseases. In our cohort, roughly one-eighth of all single nucleotide polymorphisms (SNPs) were missing from dbSNP, underscoring the extent of “hidden” diversity that standard resources overlook. Even more striking, 94% of the 1,210 rare high-impact variants (gnomAD allele frequency < 0.01) had no entry in ClinVar, an essential database for translating genetic findings into clinical decision-making. This gap poses real dangers for patient care: healthcare practitioners relying on incomplete references may misclassify or completely miss variants that significantly contribute to disease risk, immune function, or drug metabolism in patients of Latin American ancestry. By systematically cataloging these unknown or poorly annotated sites, our work offers a foundation to refine carrier screening, diagnostic interpretations, and population-based risk predictions.

Within the immune-related genes, we observed multiple high-frequency truncating or splice-disrupting variants that appear to be geographically clustered, especially in more isolated communities such as the Uros, Matsés, and certain Andean highlanders. These populations exhibited notable allelic patterns in key loci involved in antiviral defense (UBE2NL, NLRP8, and others) and antigen presentation (e.g., HLA-DQB1). Although functional assays are needed to clarify their exact impact on infection susceptibility or vaccine response, the enrichment of disruptive alleles suggests possible historical adaptations to local pathogen pressures. In a public health context, these variants may help explain population-specific immune profiles and disease burdens [30,31]. Targeted screening or tailored vaccination schedules could eventually be considered, particularly in communities historically prone to certain infections or inflammatory conditions.
Our pharmacogenomic analysis points to similarly important clinical implications. We assessed 56 pharmacogenes with known relevance to drug metabolism and toxicity risk, including CYP2C19, CYP2D6, UGT1A1, and DPYD. For nearly half of these loci, we detected significant genotype frequency differences among the seven populations with high-coverage whole-genome data. In some subgroups, “poor metabolizer” phenotypes were more common than global references would predict, indicating a heightened risk of adverse drug reactions if standard dosing is applied without genetic considerations. For instance, CYP2C19 poor metabolizers might face higher failure rates or elevated side effects when prescribed antiplatelet agents like clopidogrel at standard doses. Incorporating these data into local prescribing guidelines could enhance therapeutic safety and efficacy, particularly for individuals carrying variants at elevated frequencies that have been overlooked in mainstream references.

From a broader perspective, the high levels of admixture in coastal populations and the unique bottlenecks in Lake Titicaca or rainforest communities highlight why uniform “Latino” or “Hispanic” categories can be misleading in a precision medicine context. Within-country differences in genetic structure can be more pronounced than aggregated continental labels suggest. Our principal component analyses and population-specific variant frequencies underscore how local genealogical histories shape allele distributions relevant to immune function and drug response. This complex heterogeneity signals a need for region- or group-specific reference panels and a commitment to collecting data at finer geographic and cultural scales.

Despite these advances, there are several key limitations to note. First, our study design is cross-sectional and focused on genetic variation without direct measurements of clinical outcomes or phenotypic traits.
While we can highlight potential disease susceptibilities or pharmacogenomic risks, we cannot confirm whether these variants translate to real-world differences in infection rates, autoimmune conditions, or adverse drug events. Prospective studies correlating genotypes with health records or clinical trials would be essential to establish clinical utility definitively. Second, many of the variants we prioritized for their presumed high impact remain unvalidated experimentally. Computational tools like CADD [32] and VEP [33] can flag a variant as disruptive, but functional assays, ranging from in vitro gene expression to protein-domain mapping, are crucial to ascertain the actual magnitude of phenotypic change. Third, though we cover 30 populations, Peru is home to many additional Indigenous groups with distinct ancestries and cultural practices [34]. Our current sample may not fully capture the entire spectrum of Peruvian genomic diversity. Finally, while we engaged with communities and sought to respect data sovereignty, expanding these efforts toward longitudinal sampling and stronger local research capacity remains a work in progress.

Future research efforts should emphasize bridging genomic findings with public health and clinical applications. A logical step is to initiate community-led epidemiological cohorts in which identified immune and pharmacogenomic variants are systematically evaluated for associations with infection outcomes, disease phenotypes, and therapy responses. Further, building specialized reference panels for the most isolated or genetically distinct Peruvian groups could minimize false negatives in diagnostic labs and guide dose modifications for critical medications. On a functional level, collaborating with molecular biology laboratories to perform targeted assays on top candidate variants (e.g., HLA-DQB1 truncations) would clarify mechanistic underpinnings of immune or metabolic phenotypes.
Equally important, sustained capacity-building within Peruvian institutions and continued dialogue with local communities will help ensure that genomic research benefits are realized where they are most needed. By integrating these steps (functional validation, clinical correlations, expanded sampling, and robust partnerships), we can accelerate equitable precision medicine not only across Peru but also in other underserved Latin American regions.

Online Methods

Ethics and Consent

All sample collection and data analyses were conducted in accordance with the Declaration of Helsinki for medical research involving human subjects and were approved by the Instituto Nacional de Salud del Perú (approval no. OI-003-11 and no. OI-087-13). Prior to initiating fieldwork, researchers spent one month preparing communication materials in Spanish and local Indigenous languages. These included a brochure describing the project, a poster explaining informed consent, and radio and television announcements. The final decision to participate was made collectively by each community. Potential participants were identified based on inclusion criteria, and written informed consent was obtained in the presence of a translator and two local witnesses. Subjects could withdraw at any time and request destruction of their samples or data. In total, 22 individuals declined participation and 2 withdrew after enrollment.

Population Selection and Sample Collection

A stakeholder panel comprising local community representatives and the Peruvian Ministry of Culture oversaw the selection of 30 target populations, chosen to reflect Peru’s coastal, Andean, and Amazonian diversity. Inclusion criteria for “native individuals” required that parents and grandparents be born in the same community, have a mother tongue other than Spanish, and have a cultural legacy with minimal outside admixture. This process yielded 17 Indigenous and 13 mestizo populations (1,149 participants total).
Of these, 723 were genotyped on a 2.5M Illumina array, and 150 were selected for high-coverage (35×) whole-genome sequencing (WGS).

Genotyping, Whole Genome Sequencing, and Variant Calling

Genomic DNA was extracted from peripheral blood samples. SNP arrays (2.5M Illumina) were processed per the manufacturer’s protocol. For the 150 WGS samples (Matsés, Uros, Chopccas, Moches, Iquitos, Cusco, Trujillo), libraries were prepared for Illumina HiSeq X10 (35× coverage). Reads were aligned to hg19 using BWA-MEM [35], and duplicates were marked with Picard [36]. Variant calling was performed jointly with GATK [37] UnifiedGenotyper for biallelic single nucleotide variants (SNVs) across the nuclear and mitochondrial genomes. Low-confidence (LowQual) calls and sites with Phred < 20 were excluded.

Kinship Analysis

To identify and remove closely related individuals, we retained 12,897,474 variants from the WGS set and estimated kinship coefficients with PLINK (v1.9) [38]. Any pair with PI_HAT > 0.95 was flagged as a duplicate pair. We then used the --remove option to keep only one representative per pair, yielding 109 unrelated WGS individuals from the original 150.

Merging Array and WGS Samples, Identity-by-Descent Filtering

We merged the 150 WGS samples with 722 array-based genotypes to form a single dataset of overlapping variants (~936k). To remove duplicate or closely related individuals, we again applied PLINK (v1.9) with the --genome flag. Samples in pairs exceeding the PI_HAT > 0.95 threshold were excluded, resulting in a final curated set of 109 WGS and 627 array samples (736 total).

Variant Effect Predictor (VEP) Analysis

To characterize coding variants and predict functional impacts, we used the Ensembl Variant Effect Predictor (VEP) [33] on the final WGS dataset. Only biallelic SNVs passing quality filters were included. We further focused on “HIGH-impact” variants (stop-gained, frameshift, splice donor/acceptor, etc.) by retaining only those VEP annotations with strong evidence for loss of function.
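The HIGH-impact selection described above can be sketched as a small tally over annotated records. This is an illustrative sketch under stated assumptions, not the authors' code: the record layout (`consequence`, `gnomad_af`) is hypothetical, consequences are assumed to be "&"-joined Sequence Ontology terms as in VEP's VCF output, only the five SNV classes listed in Table 1 are checked, and a missing gnomAD frequency is treated as 0.

```python
from collections import Counter
from typing import Iterable, List

# Sequence Ontology terms for the HIGH-impact SNV classes reported in Table 1.
HIGH_IMPACT_TERMS = (
    "stop_gained",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "start_lost",
    "stop_lost",
)


def high_impact_terms(consequence: str) -> List[str]:
    """Return the HIGH-impact terms present in an '&'-joined consequence string."""
    present = set(consequence.split("&"))
    return [term for term in HIGH_IMPACT_TERMS if term in present]


def tally_rare_high_impact(records: Iterable[dict],
                           gnomad_af_max: float = 0.01) -> Counter:
    """Count rare variants by the first HIGH-impact term they carry."""
    counts: Counter = Counter()
    for rec in records:
        terms = high_impact_terms(rec["consequence"])
        gnomad_af = rec.get("gnomad_af") or 0.0  # assumption: missing means 0
        if terms and gnomad_af < gnomad_af_max:
            counts[terms[0]] += 1
    return counts


# Toy annotated records (hypothetical values, for illustration only).
toy_records = [
    {"consequence": "stop_gained", "gnomad_af": 0.0001},
    {"consequence": "splice_donor_variant&intron_variant", "gnomad_af": None},
    {"consequence": "missense_variant", "gnomad_af": 0.0},
    {"consequence": "stop_gained&splice_region_variant", "gnomad_af": 0.05},
]
toy_counts = tally_rare_high_impact(toy_records)

# Consistency check on the category counts reported in Table 1.
table_1 = {"stop_gained": 611, "splice_donor_variant": 287,
           "splice_acceptor_variant": 227, "start_lost": 58, "stop_lost": 27}
table_1_total = sum(table_1.values())
```

The five category counts in Table 1 add up to the 1,210 rare high-impact variants reported, which is what `table_1_total` reproduces.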
Potential confounding factors (multi-allelic splitting, ambiguous alternate alleles, and partial annotation mismatches) were reconciled as described in the Supplementary Methods.

Allele Count and Frequency Analysis

For each variant, we calculated the frequency of alternative alleles in each of the seven WGS populations. Variants with minor allele frequency (MAF) 10 and required at least five allele counts in one population. Comparisons with gnomAD [20] AMR and Fisher’s exact tests were performed to identify population-specific deviations of clinical or evolutionary interest.

Pharmacogenomic Annotation and PyPGx Integration

Fifty-six pharmacogenes were studied using PyPGx [25], which infers star-allele diplotypes and predicted metabolizer phenotypes. We matched these phenotypes to FDA-labeled gene–drug pairs (Table of Pharmacogenetic Associations) [27], employing both direct category overlap (e.g., “poor metabolizer”) and a fallback fuzzy matching (threshold 85% similarity). This approach minimized spurious mappings and enabled personalized gene–drug interaction assessments.

Genotype Frequency Analysis

We computed genotype frequencies within and across populations for the curated pharmacogenes to capture inter-individual variability. Three metrics were derived for each genotype: absolute count, fractional frequency, and percentage representation. Comparisons among populations highlight potential founder effects or selective pressures relevant to drug metabolism.

Principal Component Analysis (PCA)

We performed PCA with PLINK (v1.9) using ~0.94 million intersecting variants to visualize genetic relationships among the 736 unrelated individuals (109 WGS + 627 array samples). Eigenvalues and eigenvectors were computed from the genetic relationship matrix. Additional reference samples (Simons Genome Diversity Project [29]) were merged for global contextualization (Supplementary Methods).
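The two-stage phenotype matching described under Pharmacogenomic Annotation (direct category overlap first, then a fuzzy fallback at 85% similarity) can be sketched as below. The text does not say which similarity measure or library was used, so this sketch assumes Python's standard-library `difflib.SequenceMatcher` ratio as a stand-in; the function names and the example labels are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional

FUZZY_THRESHOLD = 0.85  # 85% similarity, as stated in the text


def _normalize(label: str) -> str:
    """Lowercase and trim a phenotype label before comparison."""
    return label.strip().lower()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def match_phenotype(predicted: str,
                    reference_labels: List[str],
                    threshold: float = FUZZY_THRESHOLD) -> Optional[str]:
    """Return the matching reference label, or None when nothing is close enough."""
    # Stage 1: direct category overlap after normalization.
    for label in reference_labels:
        if _normalize(label) == _normalize(predicted):
            return label
    # Stage 2: fuzzy fallback, accepting only the best candidate above threshold.
    best = max(reference_labels,
               key=lambda label: similarity(predicted, label),
               default=None)
    if best is not None and similarity(predicted, best) >= threshold:
        return best
    return None


# Hypothetical reference categories, for illustration only.
labels = ["poor metabolizer", "intermediate metabolizer"]
direct = match_phenotype("Poor Metabolizer", labels)
fuzzy = match_phenotype("poor metabolizers", labels)
unmatched = match_phenotype("normal function", labels)
```

Returning `None` rather than the nearest label when the threshold is not met is the design choice that keeps spurious mappings out of the gene–drug table.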
Data Availability and Code Access

All WGS and SNP array data generated by the Peruvian Genome Project are deposited at the European Genome-phenome Archive (EGA) under accession numbers EGAD00010001958, EGAD00010001990 (WGS) and EGAD00010001991, EGAD00010001992 (array). Requests for controlled access must be submitted to our Data Access Committee, which reviews proposals for consistency with informed consent and community agreements. The data processing pipelines, including scripts and configuration files, are available on GitHub at https://github.com/manuelcorpas/15-PERU.

Community-Led Return of Results

Per a community-led framework, each group could request aggregate variant frequencies and consult with local healthcare professionals about individual results. Data sovereignty remains with Indigenous communities; secondary analyses require approval from both the institutional review board and community leaders. By respecting self-determination in data usage, we aim to foster equitable collaboration and avoid exploitative research practices.

References

1. Lemke, A. A. et al. Addressing underrepresentation in genomics research through community engagement. Am J Hum Genet 109, 1563–1571 (2022).
2. Samarasinghe, S. R. et al. Mapping the Pharmacogenetic Landscape in a Ugandan Population: Implications for Personalized Medicine in an Underrepresented Population. Clin. Pharmacol. Ther. (2024).
3. Corpas, M. et al. Addressing Ancestry and Sex Bias in Pharmacogenomics. Annu. Rev. Pharmacol. Toxicol. 64, (2024).
4. Guio, H. et al. The Peruvian Genome Project: expanding the global pool of genome diversity from South America. medRxiv 2024–05 (2024).
5. Harris, D. N. et al. Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. Proc. Natl. Acad. Sci. U. S. A. 115, E6526–E6535 (2018).
6. Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat Med 28, 243–250 (2022).
7. Geneticists Drive Inclusive Research. Chan Zuckerberg Initiative https://chanzuckerberg.com/blog/indigenous-latin-american-representation-genomics/.
8. Corpas, M. et al. Bridging genomics’ greatest challenge: The diversity gap. Cell Genomics 5, (2025).
9. De Oliveira, T. C., Secolin, R. & Lopes-Cendes, I. A review of ancestrality and admixture in Latin America and the Caribbean focusing on native American and African descendant populations. Front. Genet. 14, (2023).
10. Andean peoples | Pre-Columbian Cultures, Indigenous Tribes & History | Britannica. https://www.britannica.com/topic/Andean-peoples (2025).
11. Peru - Spanish Conquest, Inca Empire, Andes | Britannica. https://www.britannica.com/place/Peru/Discovery-and-exploration-by-Europeans (2025).
12. Lindo, J. et al. A time transect of exomes from a Native American population before and after European contact. Nat. Commun. 7, 13175 (2016).
13. Barbieri, C. et al. The Current Genomic Landscape of Western South America: Andes, Amazonia, and Pacific Coast. Mol. Biol. Evol. 36, 2698–2713 (2019).
14. Borda, V. et al. The genetic structure and adaptation of Andean highlanders and Amazonians are influenced by the interplay between geography and culture. Proc. Natl. Acad. Sci. 117, 32557–32565 (2020).
15. Julian, C. G. & Moore, L. G. Human Genetic Adaptation to High Altitude: Evidence from the Andes. Genes 10, 150 (2019).
16. Caro-Consuegra, R. et al. Uncovering Signals of Positive Selection in Peruvian Populations from Three Ecological Regions. Mol. Biol. Evol. 39, msac158 (2022).
17. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
18. Kent, M. The importance of being Uros: Indigenous identity politics in the genomic age. Soc. Stud. Sci. 43, 534–556 (2013).
19. Prohaska, A. et al. Human Disease Variation in the Light of Population Genomics. Cell 177, 115–131 (2019).
20. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
21. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
22. Lee, M. J. et al. UBE2N is essential for maintenance of skin homeostasis and suppression of inflammation. bioRxiv 2023.12.01.569631 (2023) doi:10.1101/2023.12.01.569631.
23. Arnaiz-Villena, A. et al. HLA genes in Uros from Titikaka Lake, Peru: origin and relationship with other Amerindians and worldwide populations. Int. J. Immunogenet. 36, 159–167 (2009).
24. Correa-Medero, L. O. et al. ER-associated degradation adapter Sel1L is required for CD8+ T cell function and memory formation following acute viral infection. Cell Rep. 43, 114156 (2024).
25. Lee, S. ‘Steven’. sbslee/pypgx. (2025).
26. Lee, S., Shin, J.-Y., Kwon, N.-J., Kim, C. & Seo, J.-S. ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation. PLOS ONE 17, e0272129 (2022).
27. Center for Devices and Radiological Health. Table of Pharmacogenetic Associations. FDA (2022).
28. Nguyen, A. B., Cavallari, L. H., Rossi, J. S., Stouffer, G. A. & Lee, C. R. Evaluation of race and ethnicity disparities in outcome studies of CYP2C19 genotype-guided antiplatelet therapy. Front. Cardiovasc. Med. 9, 991646 (2022).
29. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. https://www.nature.com/articles/nature18964.
30. Accinelli, R. A. & Leon-Abarca, J. A. At High Altitude COVID-19 Is Less Frequent: The Experience of Peru. Arch. Bronconeumol. 56, 760–761 (2020).
31. Gao, W. et al. The Deubiquitinase USP29 Promotes SARS-CoV-2 Virulence by Preventing Proteasome Degradation of ORF9b. mBio 13, e0130022 (2022).
32. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
33. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
34. Indigenous peoples of Peru. Wikipedia (2025).
35. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
36. Picard Tools - By Broad Institute. https://broadinstitute.github.io/picard/.
37. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
38. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

Additional Declarations

There is a potential competing interest. MC is currently associated with Cambridge Precision Medicine Limited.

Supplementary Files

SUPPLTABLE1Consequenceranking.xlsx: Ranking used to select variants with different consequences
SUPPLTABLE2pgxphenotypesbypopulation.csv: Allele frequency patterns for each observed pharmacogenomic star (*) allele
SUPPLTABLE3pgxfdamatches.csv: Summary of all FDA matched recommendations for each individual
SupplementaryMaterialsv2.docx: Supplementary Materials
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-6362225","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":450255139,"identity":"c2e905d5-830e-4efe-ae3e-46ae1baa647d","order_by":0,"name":"Manuel Corpas","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABB0lEQVRIiWNgGAWjYBADOQhlgCQkQUCLMRAzNoC08BCrJbEBrIWBCC387b3PHvz4sy19w/H2549uFNyxt2fvfcDwo4YhcWYDdi0SZ46bG/a23c7dcOaMYXOOwbPEHp7jBow9xxgSZ+OwxUAijU2Ct+F27rYbOYxALYcTeCTSGBh4GxgS5+HRIvnnz+10s/vPH4K02PPIP2Ng/EtAizQP2+0EsxsMIIcdZuyRYGNgBtmCy2ESZ46xG8u23TbcfybHcDZQS2LPmTSGwzLHJIxxeZ+/vY3t4Zs/t+Ul248/+Jzz57A9e/sxxodvamxkZxzAYQ0DAxum0AECEYlFyygYBaNgFIwCZAAAh2pbedVjI90AAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0002-4417-1018","institution":"University of Westminster","correspondingAuthor":true,"prefix":"","firstName":"Manuel","middleName":"","lastName":"Corpas","suffix":""},{"id":450255140,"identity":"5c42411f-7a79-4559-b49f-7ebf45c19d99","order_by":1,"name":"Segun Fatumo","email":"","orcid":"","institution":"Queen Mary University of London","correspondingAuthor":false,"prefix":"","firstName":"Segun","middleName":"","lastName":"Fatumo","suffix":""},{"id":450255141,"identity":"75c6a866-a19c-49c6-98e1-3be640d88483","order_by":2,"name":"Cesar Sanchez","email":"","orcid":"","institution":"Instituto Nacional de Salud","correspondingAuthor":false,"prefix":"","firstName":"Cesar","middleName":"","lastName":"Sanchez","suffix":""},{"id":450255142,"identity":"cceadca2-385b-4789-8700-680f53111fad","order_by":3,"name":"Omar 
Caceres","email":"","orcid":"","institution":"Instituto Nacional de Salud","correspondingAuthor":false,"prefix":"","firstName":"Omar","middleName":"","lastName":"Caceres","suffix":""},{"id":450255143,"identity":"e97ca2d5-d202-4d7d-8a43-90b9bb494fa1","order_by":4,"name":"Carlos Padilla","email":"","orcid":"","institution":"Instituto Nacional de Peru","correspondingAuthor":false,"prefix":"","firstName":"Carlos","middleName":"","lastName":"Padilla","suffix":""},{"id":450255144,"identity":"4693c2df-abeb-4b34-99be-bdd9512499a4","order_by":5,"name":"Julio Valdivia-Silva","email":"","orcid":"https://orcid.org/0000-0002-7061-3756","institution":"UTEC","correspondingAuthor":false,"prefix":"","firstName":"Julio","middleName":"","lastName":"Valdivia-Silva","suffix":""},{"id":450255145,"identity":"ba2086cd-dc8a-481c-ad0c-6471ed8b524e","order_by":6,"name":"Heinner Guio","email":"","orcid":"","institution":"INBIOMEDIC","correspondingAuthor":false,"prefix":"","firstName":"Heinner","middleName":"","lastName":"Guio","suffix":""}],"badges":[],"createdAt":"2025-04-02 14:31:49","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6362225/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6362225/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":88304994,"identity":"254d19d7-bfb3-4c3f-b427-748c54111001","added_by":"auto","created_at":"2025-08-05 05:38:38","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":510652,"visible":true,"origin":"","legend":"\u003cp\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Illustration of the maximum territorial expansion of the Inca Empire circa 1525. (\u003cstrong\u003eB\u003c/strong\u003e) Locations of the 30 Peruvian populations included in this genetic analysis. 
Blue circles represent population sampling sites where each population typically resides (some circles are overlapping).\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/6ab4943b2c9b3bbf6112464d.png"},{"id":88302896,"identity":"e56aeb8b-70f0-4f1a-9950-0c78f6083d44","added_by":"auto","created_at":"2025-08-05 05:06:38","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":92239,"visible":true,"origin":"","legend":"\u003cp\u003e(A) Boxplot illustrating the distribution of SNP counts per individual across the seven Peruvian populations (Matzes, Uros, Chopccas, Moches, Trujillo, Cusco, and Iquitos). Boxes span the interquartile range, horizontal lines mark median values, and whiskers show the non‐outlier range. Notably, Matzes and Uros have the lowest SNP counts, while Cusco and Iquitos sit at the higher end, consistent with each group’s historical patterns of isolation or admixture.\u003cstrong\u003e \u003c/strong\u003e(B) Bar chart displaying the total number of private variants unique to each population. Matzes again exhibits the fewest private variants, whereas Trujillo, Cusco, and Iquitos contain the highest counts.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/67717ab7c0d96ec3142bc88a.png"},{"id":88302897,"identity":"e7802e02-af40-45e0-a935-c5911bbafc13","added_by":"auto","created_at":"2025-08-05 05:06:38","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":56739,"visible":true,"origin":"","legend":"\u003cp\u003eTotal FDA Recommendations by Population in our sequenced dataset. Each horizontal bar represents the total number of matched pharmacogenomic recommendations in that population. 
The y-axis lists the populations in descending order of recommendation count, while the x-axis indicates how many gene‐drug warnings or dosing guidelines from the FDA were triggered by individuals in each group.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/7b8cb403b78cdc7bf19346a1.png"},{"id":88305103,"identity":"0a372114-36f8-4c1e-8036-7816dba9cad7","added_by":"auto","created_at":"2025-08-05 05:38:48","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":765575,"visible":true,"origin":"","legend":"\u003cp\u003ePrincipal Component Analysis (PCA) of Peruvian Populations after removing related individuals (746 samples). (\u003cstrong\u003eA\u003c/strong\u003e) PCA of ARRAY \u0026amp; WGS samples shows genetic structure across Peruvian populations. Andean highlanders (e.g., Cusco, Chopccas) form a tight cluster, Amazonian groups (e.g., Matzes, Awajun) are distinct, and coastal mestizos (e.g., Lima, Trujillo) form a gradient of admixture. Afro-Peruvian individuals cluster separately due to African ancestry. (\u003cstrong\u003eB\u003c/strong\u003e) PCA of common variants confirms population structure, with indigenous Andean and Amazonian groups remaining distinct while mestizos form a continuum. (\u003cstrong\u003eC\u003c/strong\u003e) PCA with PC3 vs. PC2 highlights finer genetic substructure, distinguishing Aymara- and Quechua-speaking groups and showing Amazonian genetic diversity. (\u003cstrong\u003eD\u003c/strong\u003e) PCA with Simons Genome Diversity Project (SGDP) data contextualizes Peruvians globally. 
Indigenous Peruvians cluster with Native American groups, while Afro-Peruvians align with African populations, reflecting historical migrations.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/1c53db11f29760afed227bd5.png"},{"id":88506580,"identity":"dc634812-54d2-4093-ba19-884864e02b26","added_by":"auto","created_at":"2025-08-07 07:32:59","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2122348,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/6c6b968b-6383-4e1c-80e6-5f14401814e6.pdf"},{"id":88302899,"identity":"465a90bc-2f45-47ba-9a79-47da57b4d1c6","added_by":"auto","created_at":"2025-08-05 05:06:38","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":10430,"visible":true,"origin":"","legend":"Ranking used to select variants with different consequences","description":"","filename":"SUPPLTABLE1Consequenceranking.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/49ab3455e80feccb6b6e2924.xlsx"},{"id":88305104,"identity":"6df05cc9-0397-4b0e-a325-1e734d6fe19e","added_by":"auto","created_at":"2025-08-05 05:38:50","extension":"csv","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":56396,"visible":true,"origin":"","legend":"Allele frequency patterns for each observed pharmacogenomic star (*) allele","description":"","filename":"SUPPLTABLE2pgxphenotypesbypopulation.csv","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/67f2635fa51162de2377216a.csv"},{"id":88505309,"identity":"a28a5789-0bee-4993-a86f-79681fadbf34","added_by":"auto","created_at":"2025-08-07 07:23:29","extension":"csv","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":11359,"visible":true,"origin":"","legend":"Summary of all FDA matched 
recommendations for each individual","description":"","filename":"SUPPLTABLE3pgxfdamatches.csv","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/b6b67a94c8b9db9fe6b6278d.csv"},{"id":88302919,"identity":"f79f4558-7108-464c-8201-7534dee523f3","added_by":"auto","created_at":"2025-08-05 05:06:39","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":1693136,"visible":true,"origin":"","legend":"Supplementary Materials","description":"","filename":"SupplementaryMaterialsv2.docx","url":"https://assets-eu.researchsquare.com/files/rs-6362225/v1/c562587b67d6ed4e94517c0b.docx"}],"financialInterests":"\u003cb\u003eYes\u003c/b\u003e there is potential Competing Interest.\nMC is currently associated to Cambridge Precision Medicine Limited.","formattedTitle":"Genome Interpretation of Peruvian Inca Empire Descendants Reveals Actionable Insights","fulltext":[{"header":"Introduction","content":"\u003cp\u003eGenomic research has predominantly focused on individuals of European descent, leaving Latin American populations largely absent from reference datasets \u003csup\u003e1–5\u003c/sup\u003e. Despite Latin America and the Caribbean encompassing over 8% of the global population, only ~1.3% of all human genomic data originates from these communities \u003csup\u003e6–8\u003c/sup\u003e. This underrepresentation creates a stark imbalance: variant interpretation, genetic risk scores, and drug‐response predictions, developed primarily with European data, may not apply accurately to Latin American groups, limiting the reach of precision medicine advances in these settings \u003csup\u003e8\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eThe Andean region of South America—once core to the Inca Empire (\u003cstrong\u003eFig. 1A\u003c/strong\u003e)—embodies a particularly rich yet understudied genetic heritage. 
Over centuries, diverse Indigenous groups experienced admixture with European and African lineages, while the upheavals of colonization introduced additional demographic bottlenecks \u003csup\u003e9–12\u003c/sup\u003e. Historical and cultural factors further shaped population structure: some high‐altitude communities evolved robust adaptations to chronic hypoxia, whereas more isolated Amazonian groups retained distinct gene pools \u003csup\u003e13–16\u003c/sup\u003e.\u003c/p\u003e\n\u003cp\u003eHere, we present a genomic profiling of 30 populations (\u003cstrong\u003eFig. 1B\u003c/strong\u003e) across coastal, highland, and Amazonian Peru, comprising 1,149 individuals, including 150 high‐coverage (35×) whole‐genome and 873 high‐density SNP array samples. Unlike previous studies that typically survey a single Indigenous group or rely on generic “Latin American” labels, this cohort captures the unprecedented breadth of 17 Indigenous communities and 13 mestizo populations, each with distinct admixture histories. By integrating deep genomic sampling with population‐level substructure analyses, we delineate how genetic diversity is partitioned across Peru’s ecologically diverse landscapes. This approach sheds light on a previously unexplored wealth of novel and high‐impact variants—providing an essential foundation for more accurate disease risk assessments, personalized drug therapies, and equitable applications of genomic medicine in these underserved populations.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003eWhole-Genome Mutational Landscape\u003c/h2\u003e \u003cp\u003eWe performed high-coverage (35\u0026times;) whole-genome sequencing on 150 individuals from seven Peruvian populations (Matzes, Uros, Chopccas, Moches, Iquitos, Cusco, and Trujillo). 
Initial variant calling yielded\u0026thinsp;~\u0026thinsp;13.0\u0026nbsp;million single nucleotide polymorphisms (SNPs) and indels, with 4.1\u0026nbsp;million singleton SNPs, reflecting historical isolation and unique demographic events (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Notably, 12.7% (1,638,862) of these variants were absent from dbSNP build 154 \u003csup\u003e17\u003c/sup\u003e, underscoring substantial uncharted diversity in these groups.\u003c/p\u003e \u003cp\u003eAfter excluding multi-allelic sites and artifacts, we retained 12,897,474 high-quality biallelic SNPs for subsequent analyses. Matzes and Uros displayed the fewest SNPs per individual, likely reflecting limited gene flow, while Cusco and Iquitos exhibited higher SNP counts consistent with their more cosmopolitan histories (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Anthropological records corroborate that the Uros have remained genetically and culturally isolated, occupying remote islands on Lake Titicaca \u003csup\u003e18\u003c/sup\u003e. Such endogamy may have reduced overall polymorphism levels and increased the proportion of rare alleles \u003csup\u003e19\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eHigh-Impact Variant Discovery\u003c/h2\u003e \u003cp\u003eTo prioritize functionally disruptive variants, we retained those classified as HIGH-impact by Ensembl VEP with allele frequency\u0026thinsp;\u0026lt;\u0026thinsp;0.01 in gnomAD \u003csup\u003e20\u003c/sup\u003e, yielding 1,210 high-impact SNVs (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). These included 611 premature stop-gain lesions and 514 splice-site disruptions, all predicted to affect gene function significantly. 
Of these, 93.9% were absent from ClinVar \u003csup\u003e21\u003c/sup\u003e, highlighting a critical gap in current clinical variant classification databases. Together, these data reveal a high burden of unannotated putative loss-of-function alleles in Peruvian populations.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eOverview of single-nucleotide variants (SNVs) identified from 150 high‐coverage (35\u0026times;) Peruvian genomes. A total of 12,905,020 SNVs were detected, of which 12,897,474 were biallelic. Approximately 12.7% (1,638,862) were novel, having no prior record in public databases. Among these, 1,210 rare (AF in gnomAD\u0026thinsp;\u0026lt;\u0026thinsp;0.01) variants were classified as HIGH‐impact by Ensembl VEP\u0026mdash;defined as lesions likely to truncate or abolish gene function\u0026mdash;of which the most frequent were stop‐gained (n\u0026thinsp;=\u0026thinsp;611), followed by splice‐site (donor / acceptor) disruptions (n\u0026thinsp;=\u0026thinsp;287, 227, respectively). 
Fewer start‐lost (n\u0026thinsp;=\u0026thinsp;58) and stop‐lost (n\u0026thinsp;=\u0026thinsp;27) variants were observed, yet they remain potentially disruptive by altering canonical translation signals.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariant Category\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eN variants\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSNVs processed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12,905,020\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBiallelic SNVs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12,897,474\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNovel SNVs\u003c/p\u003e \u003cp\u003e%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1,638,862\u003c/p\u003e \u003cp\u003e12.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRare High Impact\u003c/p\u003e \u003cp\u003e(\u0026lt;\u0026thinsp;0.01 gnomAD AF with protein truncation, loss of function or triggering nonsense mediated decay)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1,210\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eStop Gained (Rare High Impact)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e611\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSplice Donor (Rare High Impact)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e287\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSplice Acceptor (Rare High Impact)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e227\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStart Lost (Rare High Impact)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e58\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStop Lost (Rare High Impact)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eInterpretation of High-Impact Rare Variants in Peruvian Populations\u003c/h3\u003e\n\u003cp\u003eTo further characterize disruptive alleles, we focused on biallelic variants with allele frequencies\u0026thinsp;\u0026ge;\u0026thinsp;0.1 in any of the seven Peruvian subpopulations and \u0026lt;\u0026thinsp;0.01 in gnomAD. This filter yielded 27 distinct high-impact variants in genes spanning immune regulation, metabolism, and protein homeostasis (\u003cb\u003eSupplementary Materials\u003c/b\u003e). 
Several splice‐site variants in diverse genes (e.g., \u003cem\u003eFAM166A\u003c/em\u003e, \u003cem\u003eLIN37\u003c/em\u003e, \u003cem\u003ePSRC1\u003c/em\u003e) appeared at or near fixation (allele frequency\u0026thinsp;\u0026gt;\u0026thinsp;0.95) in all subpopulations, suggesting these alleles may trace back to early founder events predating population divergence. In silico predictions (CADD_PHRED\u0026thinsp;\u0026ge;\u0026thinsp;14) imply deleterious impacts, but further transcript‐level studies are needed to confirm whether they cause truncated proteins or novel isoforms.\u003c/p\u003e \u003cp\u003eOther variants exhibited more localized distributions, highlighting potential regional adaptations or gene flow patterns. For instance, a stop-gain in \u003cem\u003eUBE2NL\u003c/em\u003e was relatively common in highland and Amazonian communities (allele frequency 0.53\u0026ndash;0.71), yet much lower (0.30\u0026ndash;0.44) on the northern coast (Moches, Trujillo). This enzyme is linked to protein turnover of immune factors \u003csup\u003e22\u003c/sup\u003e, raising questions about whether local pathogen pressures or demographic bottlenecks shaped its current frequency. In the Matzes (an Amazonian group), a \u003cem\u003eC5orf20\u003c/em\u003e nonsense mutation reached 0.79 allele frequency, possibly reflecting the group\u0026rsquo;s long‐term geographic isolation and small effective population size.\u003c/p\u003e \u003cp\u003eA damaging \u003cem\u003eHLA-DQB1\u003c/em\u003e stop‐gain (allele frequency 0.45) emerged in the high‐altitude Uros population, yet was absent or rare elsewhere. Disruption of \u003cem\u003eHLA‐DQB1\u003c/em\u003e might alter antigen presentation, potentially influencing infection risks or autoimmune tendencies in this lake‐dwelling community \u003csup\u003e23\u003c/sup\u003e. 
Likewise, \u003cem\u003eSEL1L3\u003c/em\u003e start‐lost variants (allele frequency up to 0.23) were frequent among non‐admixed groups such as Matzes and Uros but nearly absent in urban mestizos, possibly affecting endoplasmic reticulum‐related immune processes \u003csup\u003e24\u003c/sup\u003e. Together, these findings underscore the distinct variant landscapes in isolated populations and the corresponding need for tailored disease screening and immunogenetic research.\u003c/p\u003e\n\u003ch3\u003ePharmacogenomic Analysis\u003c/h3\u003e\n\u003cp\u003eNext, we assessed 56 known pharmacogenes using PyPGx \u003csup\u003e25,26\u003c/sup\u003e, to infer star-allele haplotypes, diplotypes, and predicted metabolizer phenotypes. Across the seven populations, 24 genes (42.9%) showed significant inter-population frequency differences (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05), reflecting localized founder effects and distinct admixture histories. Genes such as \u003cem\u003eCYP2C19\u003c/em\u003e, \u003cem\u003eCYP2B6\u003c/em\u003e, and \u003cem\u003eUGT1A1\u003c/em\u003e exhibited clinically actionable variation, often at frequencies divergent from global references.\u003c/p\u003e \u003cp\u003eUsing the FDA Table of Pharmacogenetic Associations \u003csup\u003e27\u003c/sup\u003e, we identified 73 gene\u0026ndash;drug recommendations triggered by specific variants (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Notably, \u003cem\u003eCYP2C19\u003c/em\u003e poor metabolizers were more prevalent than expected, which may inform antiplatelet (e.g., clopidogrel) or antidepressant dosing \u003csup\u003e28\u003c/sup\u003e in certain Peruvian subgroups. 
Such findings underscore the need to integrate population-specific genotype data into prescribing guidelines, potentially reducing adverse events and improving therapeutic efficacy.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eGenetic Diversity and Structure\u003c/h3\u003e\n\u003cp\u003eAfter merging 150 WGS samples with 722 array-based genotypes and removing closely related individuals (n\u0026thinsp;=\u0026thinsp;746 final), we conducted principal component analysis (PCA) using\u0026thinsp;~\u0026thinsp;0.94\u0026nbsp;million intersecting variants. The first two PCs separated individuals largely by Andean high-altitude vs. Amazonian lowland ancestry and revealed African admixture in coastal mestizo groups (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). Highland populations, particularly Chopccas and Cusco participants, formed a cohesive cluster consistent with a shared Inca-lineage background. Some sub-clustering also reflected language groups (Quechua vs. Aymara) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eB).\u003c/p\u003e \u003cp\u003eFurther fine-scale structure emerged on PCs 3 and 4, highlighting additional subpopulations within the highlands and Amazon (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). A global PCA with Simons Genome Diversity Project samples \u003csup\u003e29\u003c/sup\u003e placed these Indigenous Peruvians close to other Native American groups, while Afro-Peruvians clustered near African references (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003eD). 
These patterns align with historical migrations\u0026mdash;underscoring that even within \u0026ldquo;mestizo\u0026rdquo; classifications, there is marked ancestral heterogeneity that may carry health and pharmacogenomic implications.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur comprehensive genomic profiling of multiple Peruvian populations provides a unique window into the clinical and evolutionary significance of underexplored genetic variation in Latin America. By characterizing whole-genome sequences and high-density SNP array data across 30 diverse groups, we uncovered a high proportion of rare and novel alleles, many of which are unknown or poorly annotated in existing clinical databases. These findings address a critical gap in genomic references, given that Latin American populations represent nearly 10% of the global population yet historically account for only around 1% of the data in large consortia \u003csup\u003e3,6,8\u003c/sup\u003e. The skewed Eurocentric focus of current databases limits diagnostic accuracy, disease-risk stratification, and pharmacogenomic utility for Indigenous and admixed communities. Our study highlights how more inclusive population references can advance precision medicine by improving variant interpretation and tailoring clinical interventions to the genetic architecture of understudied groups.\u003c/p\u003e \u003cp\u003eA major unmet clinical need involves identifying and validating high-impact variants relevant to both infectious and non-communicable diseases. In our cohort, roughly one-eighth of all single nucleotide polymorphisms (SNPs) were missing from dbSNP, underscoring the extent of “hidden” diversity that standard resources overlook. Even more striking, 94% of the rare high-impact variants that exceeded 10% allele frequency in our Peruvian populations had no entry in ClinVar, an essential database for translating genetic findings into clinical decision-making. 
This gap poses real dangers for patient care: healthcare practitioners relying on incomplete references may misclassify or completely miss variants that significantly contribute to disease risk, immune function, or drug metabolism in patients of Latin American ancestry. By systematically cataloging these unknown or poorly annotated sites, our work offers a foundation to refine carrier screening, diagnostic interpretations, and population-based risk predictions.\u003c/p\u003e \u003cp\u003eWithin the immune-related genes, we observed multiple high-frequency truncating or splice-disrupting variants that appear to be geographically clustered—especially in more isolated communities such as the Uros, Matzes, and certain Andean highlanders. These populations exhibited notable allelic patterns in key loci involved in antiviral defense (\u003cem\u003eUBE2NL\u003c/em\u003e, \u003cem\u003eNLRP8\u003c/em\u003e, and others) and antigen presentation (e.g., \u003cem\u003eHLA-DQB1\u003c/em\u003e). Although functional assays are needed to clarify their exact impact on infection susceptibility or vaccine response, the enrichment of disruptive alleles suggests possible historical adaptations to local pathogen pressures. In a public health context, these variants may help explain population-specific immune profiles and disease burdens \u003csup\u003e30,31\u003c/sup\u003e. Targeted screening or tailored vaccination schedules could eventually be considered, particularly in communities historically prone to certain infections or inflammatory conditions.\u003c/p\u003e \u003cp\u003eOur pharmacogenomic analysis points to similarly important clinical implications. We assessed 56 pharmacogenes with known relevance to drug metabolism and toxicity risk—genes such as \u003cem\u003eCYP2C19\u003c/em\u003e, \u003cem\u003eCYP2D6\u003c/em\u003e, \u003cem\u003eUGT1A1\u003c/em\u003e, and \u003cem\u003eDPYD\u003c/em\u003e. 
For nearly half of these loci, we detected significant genotype frequency differences among the seven populations with high-coverage whole-genome data. In some subgroups, “poor metabolizer” phenotypes were more common than global references would predict, indicating a heightened risk of adverse drug reactions if standard dosing is applied without genetic considerations. For instance, \u003cem\u003eCYP2C19\u003c/em\u003e poor metabolizers might face higher failure rates or elevated side effects when prescribed antiplatelet agents like clopidogrel at standard doses. Incorporating these data into local prescribing guidelines could enhance therapeutic safety and efficacy, particularly for individuals carrying variants at elevated frequencies that have been overlooked in mainstream references.\u003c/p\u003e \u003cp\u003eFrom a broader perspective, the high levels of admixture in coastal populations and the unique bottlenecks in Titicaca Lake or rainforest communities highlight why uniform “Latino” or “Hispanic” categories can be misleading in a precision medicine context. Within-country differences in genetic structure can be more pronounced than aggregated continental labels suggest. Our principal component analyses and population-specific variant frequencies underscore how local genealogical histories shape allele distributions relevant to immune function and drug response. This complex heterogeneity signals a need for region- or group-specific reference panels and a commitment to collecting data at finer geographic and cultural scales.\u003c/p\u003e \u003cp\u003eDespite these advances, there are several key limitations to note. First, our study design is cross-sectional and focused on genetic variation without direct measurements of clinical outcomes or phenotypic traits. 
While we can highlight potential disease susceptibilities or pharmacogenomic risks, we cannot confirm whether these variants translate to real-world differences in infection rates, autoimmune conditions, or adverse drug events. Prospective studies correlating genotypes with health records or clinical trials would be essential to establish clinical utility definitively. Second, many of the variants we prioritized for their presumed high impact remain unvalidated experimentally. Computational tools like CADD \u003csup\u003e32\u003c/sup\u003e and VEP \u003csup\u003e33\u003c/sup\u003e can flag a variant as disruptive, but functional assays—ranging from in vitro gene expression to protein-domain mapping—are crucial to ascertain the actual magnitude of phenotypic change. Third, though we cover 30 populations, Peru is home to many additional Indigenous groups with distinct ancestries and cultural practices \u003csup\u003e34\u003c/sup\u003e. Our current sample may not fully capture the entire spectrum of Peruvian genomic diversity. Finally, while we engaged with communities and sought to respect data sovereignty, expanding these efforts toward longitudinal sampling and stronger local research capacity remains a work in progress.\u003c/p\u003e \u003cp\u003eFuture research efforts should emphasize bridging genomic findings with public health and clinical applications. A logical step is to initiate community-led epidemiological cohorts in which identified immune and pharmacogenomic variants are systematically evaluated for associations with infection outcomes, disease phenotypes, and therapy responses. Further, building specialized reference panels for the most isolated or genetically distinct Peruvian groups could minimize false negatives in diagnostic labs and guide dose modifications for critical medications. 
On a functional level, collaborating with molecular biology laboratories to perform targeted assays on top candidate variants (e.g., \u003cem\u003eHLA-DQB1\u003c/em\u003e truncations) would clarify mechanistic underpinnings of immune or metabolic phenotypes. Equally important, sustained capacity-building within Peruvian institutions and continued dialogue with local communities will help ensure that genomic research benefits are realized where they are most needed. By integrating these steps—functional validation, clinical correlations, expanded sampling, and robust partnerships—we can accelerate equitable precision medicine across not only Peru but other underserved Latin American regions.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e \u003c/div\u003e \u003c/div\u003e\n\n "},{"header":"Online Methods","content":"\u003ch2\u003eEthics and Consent\u003c/h2\u003e\u003cp\u003eAll sample collection and data analyses were conducted in accordance with the Declaration of Helsinki for medical research involving human subjects and were approved by the Instituto Nacional de Salud del Perú (approval no. OI-003-11 and no. OI-087-13). Prior to initiating fieldwork, researchers spent one month preparing communication materials in Spanish and local Indigenous languages. These included a brochure describing the project, a poster explaining informed consent, and radio and television announcements. The final decision to participate was made collectively by each community. Potential participants were identified based on inclusion criteria, and written informed consent was obtained in the presence of a translator and two local witnesses. Subjects could withdraw at any time and request destruction of their samples or data. 
In total, 22 individuals declined participation and 2 withdrew after enrollment.\u003c/p\u003e\u003ch3\u003ePopulation Selection and Sample Collection\u003c/h3\u003e\u003cp\u003eA stakeholder panel comprising local community representatives and the Peruvian Ministry of Culture oversaw the selection of 30 target populations, chosen to reflect Peru’s coastal, Andean, and Amazonian diversity. Inclusion criteria for “native individuals” required that parents and grandparents be born in the same community, have a mother tongue other than Spanish, and have a cultural legacy with minimal outside admixture. This process yielded 17 Indigenous and 13 mestizo populations (1,149 participants total). Of these, 723 were genotyped on a 2.5M Illumina array, and 150 were selected for high-coverage (35×) whole-genome sequencing (WGS).\u003c/p\u003e\u003ch2\u003eGenotyping, Whole Genome Sequencing, and Variant Calling\u003c/h2\u003e\u003cp\u003eGenomic DNA was extracted from peripheral blood samples. SNP arrays (2.5M Illumina) were processed per the manufacturer’s protocol. For the 150 WGS samples (Matsés, Uros, Chopccas, Moches, Iquitos, Cusco, Trujillo), libraries were prepared for Illumina HiSeq X10 (35× coverage). Reads were aligned to hg19 using BWA-MEM \u003csup\u003e35\u003c/sup\u003e, and duplicates were marked with Picard \u003csup\u003e36\u003c/sup\u003e. Variant calling was performed jointly with GATK \u003csup\u003e37\u003c/sup\u003e UnifiedGenotyper for biallelic single nucleotide variants (SNVs) across the nuclear and mitochondrial genomes. Low-confidence (LowQual) calls and sites with Phred \u0026lt; 20 were excluded.\u003c/p\u003e\u003ch2\u003eKinship Analysis\u003c/h2\u003e\u003cp\u003eTo identify and remove closely related individuals, we retained 12,897,474 variants from the WGS set and estimated kinship coefficients with PLINK (v1.9) \u003csup\u003e38\u003c/sup\u003e. Any pair with PI_HAT \u0026gt; 0.95 was flagged as a duplicate. 
We then used the --remove option to keep only one representative per pair, yielding 109 unrelated WGS individuals from the original 150.\u003c/p\u003e\u003ch2\u003eMerging Array and WGS Samples, Identity-by-Descent Filtering\u003c/h2\u003e\u003cp\u003eWe merged the 150 WGS samples with 722 array-based genotypes to form a single dataset of overlapping variants (~936k). To remove duplicate or closely related individuals, we again applied PLINK (v1.9) with the --genome flag. Samples with PI_HAT \u0026gt; 0.95 were excluded, resulting in a final curated set of 109 WGS and 627 array samples (736 total).\u003c/p\u003e\u003ch2\u003eVariant Effect Predictor (VEP) Analysis\u003c/h2\u003e\u003cp\u003eTo characterize coding variants and predict functional impacts, we used the Ensembl Variant Effect Predictor (VEP) \u003csup\u003e33\u003c/sup\u003e on the final WGS dataset. Only biallelic SNVs passing quality filters were included. We further focused on “HIGH-impact” variants (stop-gained, frameshift, splice donor/acceptor, etc.) by uploading only those VEP annotations with strong evidence for loss of function. Potential confounding factors (multi-allelic splitting, ambiguous alternate alleles, and partial annotation mismatches) were reconciled as described in the Supplementary Methods.\u003c/p\u003e\u003ch2\u003eAllele Count and Frequency Analysis\u003c/h2\u003e\u003cp\u003eFor each variant, we calculated the frequency of alternative alleles in each of the seven WGS populations. Variants with minor allele frequency (MAF) \u0026lt; 1% in 1000 Genomes were prioritized to highlight rare or novel sites. For additional filtering, we set CADD_PHRED \u003csup\u003e32\u003c/sup\u003e \u0026gt; 10 and required an allele count of at least five in one population. 
Comparisons with gnomAD \u003csup\u003e20\u003c/sup\u003e AMR and Fisher’s exact tests were performed to identify population-specific deviations of clinical or evolutionary interest.\u003c/p\u003e\u003ch2\u003ePharmacogenomic Annotation and PyPGx Integration\u003c/h2\u003e\u003cp\u003eFifty-six pharmacogenes were studied using PyPGx \u003csup\u003e25\u003c/sup\u003e, which infers star-allele diplotypes and predicted metabolizer phenotypes. We matched these phenotypes to FDA-labeled gene–drug pairs (Table of Pharmacogenetic Associations) \u003csup\u003e27\u003c/sup\u003e, employing both direct category overlap (e.g., “poor metabolizer”) and a fallback fuzzy match (85% similarity threshold). This approach minimized spurious mappings and enabled personalized gene–drug interaction assessments.\u003c/p\u003e\u003ch2\u003eGenotype Frequency Analysis\u003c/h2\u003e\u003cp\u003eWe computed genotype frequencies within and across populations for the curated pharmacogenes to capture inter-individual variability. Three metrics were derived for each genotype:\u003c/p\u003e\u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eAbsolute count,\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eFractional frequency,\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePercentage representation.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e\u003cp\u003eComparisons among populations highlight potential founder effects or selective pressures relevant to drug metabolism.\u003c/p\u003e\u003ch2\u003ePrincipal Component Analysis (PCA)\u003c/h2\u003e\u003cp\u003eWe performed PCA with PLINK (v1.9) using ~0.94\u0026nbsp;million intersecting variants to visualize genetic relationships among the 736 unrelated individuals (109 WGS + 627 array samples). 
Eigenvalues/eigenvectors were computed from the genetic relationship matrix. Additional reference samples (Simons Genome Diversity Project \u003csup\u003e29\u003c/sup\u003e) were merged for global contextualization (Supplementary Methods).\u003c/p\u003e\u003ch2\u003eData Availability and Code Access\u003c/h2\u003e\u003cp\u003eAll WGS and SNP array data generated by the Peruvian Genome Project are deposited at the European Genome-phenome Archive (EGA) under accession numbers EGAD00010001958, EGAD00010001990 (WGS) and EGAD00010001991, EGAD00010001992 (array). Requests for controlled access must be submitted to our Data Access Committee, which reviews proposals for consistency with informed consent and community agreements. The data processing pipelines, including scripts and configuration files, are available on GitHub at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/manuelcorpas/15-PERU\u003c/span\u003e\u003cspan address=\"https://github.com/manuelcorpas/15-PERU\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003ch2\u003eCommunity-Led Return of Results\u003c/h2\u003e\u003cp\u003ePer a community-led framework, each group could request aggregate variant frequencies and consult with local healthcare professionals about individual results. Data sovereignty remains with Indigenous communities; secondary analyses require approval from both the institutional review board and community leaders. By respecting self-determination in data usage, we aim to foster equitable collaboration and avoid exploitative research practices.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003e Lemke, A. A. \u003cem\u003eet al.\u003c/em\u003e Addressing underrepresentation in genomics research through community engagement. 
\u003cem\u003eAm J Hum Genet\u003c/em\u003e \u003cb\u003e109\u003c/b\u003e, 1563\u0026ndash;1571 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Samarasinghe, S. R. \u003cem\u003eet al.\u003c/em\u003e Mapping the Pharmacogenetic Landscape in a Ugandan Population: Implications for Personalized Medicine in an Underrepresented Population. \u003cem\u003eClin. Pharmacol. Ther.\u003c/em\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Corpas, M. \u003cem\u003eet al.\u003c/em\u003e Addressing Ancestry and Sex Bias in Pharmacogenomics. \u003cem\u003eAnnu. Rev. Pharmacol. Toxicol.\u003c/em\u003e \u003cb\u003e64\u003c/b\u003e, (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Guio, H. \u003cem\u003eet al.\u003c/em\u003e The Peruvian Genome Project: expanding the global pool of genome diversity from South America. \u003cem\u003emedRxiv\u003c/em\u003e 2024\u0026ndash;05 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Harris, D. N. \u003cem\u003eet al.\u003c/em\u003e Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. \u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cb\u003e115\u003c/b\u003e, E6526\u0026ndash;E6535 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Fatumo, S. \u003cem\u003eet al.\u003c/em\u003e A roadmap to increase diversity in genomic studies. \u003cem\u003eNat Med\u003c/em\u003e \u003cb\u003e28\u003c/b\u003e, 243\u0026ndash;250 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Geneticists Drive Inclusive Research. \u003cem\u003eChan Zuckerberg Initiative\u003c/em\u003e https://chanzuckerberg.com/blog/indigenous-latin-american-representation-genomics/.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Corpas, M. \u003cem\u003eet al.\u003c/em\u003e Bridging genomics\u0026rsquo; greatest challenge: The diversity gap. 
\u003cem\u003eCell Genomics\u003c/em\u003e \u003cb\u003e5\u003c/b\u003e, (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e De Oliveira, T. C., Secolin, R. \u0026amp; Lopes-Cendes, I. A review of ancestrality and admixture in Latin America and the caribbean focusing on native American and African descendant populations. \u003cem\u003eFront. Genet.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Andean peoples | Pre-Columbian Cultures, Indigenous Tribes \u0026amp; History | Britannica. https://www.britannica.com/topic/Andean-peoples (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Peru - Spanish Conquest, Inca Empire, Andes | Britannica. https://www.britannica.com/place/Peru/Discovery-and-exploration-by-Europeans (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Lindo, J. \u003cem\u003eet al.\u003c/em\u003e A time transect of exomes from a Native American population before and after European contact. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e, 13175 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Barbieri, C. \u003cem\u003eet al.\u003c/em\u003e The Current Genomic Landscape of Western South America: Andes, Amazonia, and Pacific Coast. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e, 2698\u0026ndash;2713 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Borda, V. \u003cem\u003eet al.\u003c/em\u003e The genetic structure and adaptation of Andean highlanders and Amazonians are influenced by the interplay between geography and culture. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cb\u003e117\u003c/b\u003e, 32557\u0026ndash;32565 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Julian, C. G. \u0026amp; Moore, L. G. Human Genetic Adaptation to High Altitude: Evidence from the Andes. 
\u003cem\u003eGenes\u003c/em\u003e \u003cb\u003e10\u003c/b\u003e, 150 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Caro-Consuegra, R. \u003cem\u003eet al.\u003c/em\u003e Uncovering Signals of Positive Selection in Peruvian Populations from Three Ecological Regions. \u003cem\u003eMol. Biol. Evol.\u003c/em\u003e \u003cb\u003e39\u003c/b\u003e, msac158 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Sherry, S. T. \u003cem\u003eet al.\u003c/em\u003e dbSNP: the NCBI database of genetic variation. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cb\u003e29\u003c/b\u003e, 308\u0026ndash;311 (2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Kent, M. The importance of being Uros: Indigenous identity politics in the genomic age. \u003cem\u003eSoc. Stud. Sci.\u003c/em\u003e \u003cb\u003e43\u003c/b\u003e, 534\u0026ndash;556 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Prohaska, A. \u003cem\u003eet al.\u003c/em\u003e Human Disease Variation in the Light of Population Genomics. \u003cem\u003eCell\u003c/em\u003e \u003cb\u003e177\u003c/b\u003e, 115\u0026ndash;131 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Karczewski, K. J. \u003cem\u003eet al.\u003c/em\u003e The mutational constraint spectrum quantified from variation in 141,456 humans. \u003cem\u003eNature\u003c/em\u003e \u003cb\u003e581\u003c/b\u003e, 434\u0026ndash;443 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Landrum, M. J. \u003cem\u003eet al.\u003c/em\u003e ClinVar: improving access to variant interpretations and supporting evidence. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cb\u003e46\u003c/b\u003e, D1062\u0026ndash;D1067 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Lee, M. J. \u003cem\u003eet al.\u003c/em\u003e UBE2N is essential for maintenance of skin homeostasis and suppression of inflammation. 
\u003cem\u003ebioRxiv\u003c/em\u003e 2023.12.01.569631 (2023) doi:10.1101/2023.12.01.569631.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Arnaiz-Villena, A. \u003cem\u003eet al.\u003c/em\u003e HLA genes in Uros from Titikaka Lake, Peru: origin and relationship with other Amerindians and worldwide populations. \u003cem\u003eInt. J. Immunogenet.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e, 159\u0026ndash;167 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Correa-Medero, L. O. \u003cem\u003eet al.\u003c/em\u003e ER-associated degradation adapter Sel1L is required for CD8\u0026thinsp;+\u0026thinsp;T cell function and memory formation following acute viral infection. \u003cem\u003eCell Rep.\u003c/em\u003e \u003cb\u003e43\u003c/b\u003e, 114156 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Lee, S. \u0026lsquo;Steven\u0026rsquo;. sbslee/pypgx. (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Lee, S., Shin, J.-Y., Kwon, N.-J., Kim, C. \u0026amp; Seo, J.-S. ClinPharmSeq: A targeted sequencing panel for clinical pharmacogenetics implementation. \u003cem\u003ePLOS ONE\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, e0272129 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Health, C. for D. and R. Table of Pharmacogenetic Associations. \u003cem\u003eFDA\u003c/em\u003e (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Nguyen, A. B., Cavallari, L. H., Rossi, J. S., Stouffer, G. A. \u0026amp; Lee, C. R. Evaluation of race and ethnicity disparities in outcome studies of CYP2C19 genotype-guided antiplatelet therapy. \u003cem\u003eFront. Cardiovasc. Med.\u003c/em\u003e \u003cb\u003e9\u003c/b\u003e, 991646 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e The Simons Genome Diversity Project: 300 genomes from 142 diverse populations | Nature. 
https://www.nature.com/articles/nature18964.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Accinelli, R. A. \u0026amp; Leon-Abarca, J. A. At High Altitude COVID-19 Is Less Frequent: The Experience of Peru. \u003cem\u003eArch. Bronconeumol.\u003c/em\u003e \u003cb\u003e56\u003c/b\u003e, 760\u0026ndash;761 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Gao, W. \u003cem\u003eet al.\u003c/em\u003e The Deubiquitinase USP29 Promotes SARS-CoV-2 Virulence by Preventing Proteasome Degradation of ORF9b. \u003cem\u003emBio\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, e0130022 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. \u0026amp; Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cb\u003e47\u003c/b\u003e, D886\u0026ndash;D894 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e McLaren, W. \u003cem\u003eet al.\u003c/em\u003e The Ensembl Variant Effect Predictor. \u003cem\u003eGenome Biol.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 122 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Indigenous peoples of Peru. \u003cem\u003eWikipedia\u003c/em\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Picard Tools - By Broad Institute. https://broadinstitute.github.io/picard/.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e DePristo, M. A. \u003cem\u003eet al.\u003c/em\u003e A framework for variation discovery and genotyping using next-generation DNA sequencing data. \u003cem\u003eNat. 
Genet.\u003c/em\u003e \u003cb\u003e43\u003c/b\u003e, 491\u0026ndash;498 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e Purcell, S. \u003cem\u003eet al.\u003c/em\u003e PLINK: a tool set for whole-genome association and population-based linkage analyses. \u003cem\u003eAm. J. Hum. Genet.\u003c/em\u003e \u003cb\u003e81\u003c/b\u003e, 559\u0026ndash;575 (2007).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Genome Medicine, Ancestry, Andes, Indigenous Genomics, Amazon, Latin America, Genomic Diversity, Genetic Epidemiology, Equity, Inclusion, Healthcare Disparities; Pharmacogenomics, Precision Medicine","lastPublishedDoi":"10.21203/rs.3.rs-6362225/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6362225/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eLatin American populations remain vastly underrepresented in genomic references, limiting their benefit from precision medicine. Here, we analyzed data from 1,149 individuals in 30 Peruvian populations\u0026mdash;including 17 Indigenous and 13 mestizo groups\u0026mdash;using deep whole-genome sequencing (n\u0026thinsp;=\u0026thinsp;150) and high‐density SNP array genotyping (n\u0026thinsp;=\u0026thinsp;873). Over 1.6\u0026nbsp;million variants (~\u0026thinsp;13%) were novel relative to major databases, underscoring uncharted genetic diversity. We identified 1,210 high‐impact variants (e.g., stop‐gained, splice‐site) with allele frequencies below 1% globally, 94% of which were unannotated in ClinVar. Pharmacogenomic profiling of 56 drug‐response genes revealed distinct allele frequency patterns, including clinically actionable variants in \u003cem\u003eCYP2D6\u003c/em\u003e, \u003cem\u003eCYP2C19\u003c/em\u003e, \u003cem\u003eUGT1A1\u003c/em\u003e, and \u003cem\u003eG6PD\u003c/em\u003e, emphasizing the need for population‐specific guidelines. 
These findings highlight how expanding genomic data from understudied groups can improve risk prediction, enable targeted screening, and ultimately foster more equitable genome medicine throughout Latin America.\u003c/p\u003e","manuscriptTitle":"Genome Interpretation of Peruvian Inca Empire Descendants Reveals Actionable Insights","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-08-05 05:06:34","doi":"10.21203/rs.3.rs-6362225/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-communications","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"NCOMMS","sideBox":"Learn more about [Nature Communications](http://www.nature.com/ncomms/)","snPcode":"","submissionUrl":"https://mts-ncomms.nature.com/","title":"Nature Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Communications","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"2bbe2c2b-8372-4b48-ba7b-15249f49a9a9","owner":[],"postedDate":"August 5th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":47913266,"name":"Biological sciences/Computational biology and bioinformatics/Genome informatics"},{"id":47913267,"name":"Biological sciences/Genetics/Clinical genetics"},{"id":47913268,"name":"Biological sciences/Genetics/Genomics/Pharmacogenomics"},{"id":47913269,"name":"Biological sciences/Genetics/Genomics/Medical genomics"}],"tags":[],"updatedAt":"2026-03-23T15:05:35+00:00","versionOfRecord":[],"versionCreatedAt":"2025-08-05 05:06:34","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6362225","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6362225","identity":"rs-6362225","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
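The variant-prioritization thresholds stated under "Allele Count and Frequency Analysis" in the Online Methods (MAF below 1% in 1000 Genomes, CADD_PHRED above 10, and an allele count of at least five in one population) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Variant` record, its field names, and the `is_prioritized` helper are hypothetical. Treating a variant absent from 1000 Genomes as rare, and reading "in one population" as "in at least one of the WGS populations", are both assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Variant:
    """Hypothetical record for one biallelic SNV (field names are illustrative)."""
    variant_id: str
    maf_1000g: Optional[float]   # MAF in 1000 Genomes; None if the site is absent there
    cadd_phred: float            # CADD PHRED-scaled score
    alt_counts: Dict[str, int]   # alternate-allele count per WGS population


def is_prioritized(v: Variant,
                   max_maf: float = 0.01,
                   min_cadd: float = 10.0,
                   min_allele_count: int = 5) -> bool:
    """Apply the three thresholds quoted in the Methods text."""
    # Assumption: a site missing from 1000 Genomes counts as rare/novel.
    rare = v.maf_1000g is None or v.maf_1000g < max_maf
    # The text states CADD_PHRED > 10, so the comparison is strict.
    deleterious = v.cadd_phred > min_cadd
    # Assumption: the count threshold must be met in at least one population.
    recurrent = any(ac >= min_allele_count for ac in v.alt_counts.values())
    return rare and deleterious and recurrent


if __name__ == "__main__":
    example = Variant("chr1:100:A>T", 0.002, 15.3, {"Uros": 6, "Matsés": 0})
    print(example.variant_id, is_prioritized(example))
```

Keeping the three conditions as separate named booleans makes it easy to report which threshold excluded a given variant, which may help when reconciling counts against the 1,210 high-impact variants reported in the abstract.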