Long Reads de novo sequencing of the Anas diazi genome reveals changes in gene orthology in waterfowl | Research Square

Research Article

Patricia Padilla-Aguilar, María Guadalupe Bravo-Vinaja, David Colón-Quezada, and 2 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8736544/v1 This work is licensed under a CC BY 4.0 License. Status: Under Revision. Version 1 posted 17

Abstract

The Mexican duck (Anas diazi) is the only duck species endemic to Mexico and is currently listed as threatened under NOM-059-SEMARNAT-2010. Its taxonomic status has long been controversial, historically considered a subspecies of Anas platyrhynchos due to their strong morphological similarity, particularly in females. In 2020, the American Ornithological Society formally recognized A. diazi as a distinct species; however, its genomic architecture remains largely unexplored.
Here we present the first whole-genome analysis of A. diazi, based on PacBio HiFi long-read sequencing and a de novo assembly strategy. This represents the first genomic resource available for this endemic and threatened species. Genome size estimation based on k-mer frequency analysis and GenomeScope2 modeling revealed a haploid genome size of approximately 1.02 Gb, with a high model fit (> 96%), low repeat content (~7.4%), and moderate heterozygosity (~1.1%), values consistent with other waterfowl genomes. Comparative alignment of reads with the A. platyrhynchos reference genome showed an alignment rate of approximately 87%, suggesting substantial lineage-specific genomic divergence. Gene prediction and functional annotation were performed using avian reference datasets (Anatidae and Gallus gallus), generating nearly 4,000 highly reliable annotated proteins, the integrity of which was supported by BUSCO analysis. Using OrthoFinder and CAFE5, we investigated the evolution of gene families in A. diazi, A. platyrhynchos, Aix galericulata, Anser cygnoides, and G. gallus, identifying lineage-specific patterns of gene family expansion and contraction potentially associated with domestication, ecological specialization, and evolutionary divergence within Anatidae. Taken together, our results provide the first genomic framework for Anas diazi and lay the groundwork for future evolutionary, ecological, and conservation genomic studies of this waterbird endemic to Mexico.

Introduction

The Anatidae family comprises approximately 145 species of waterfowl worldwide, 33 of which are found in Mexico [1,2]. Among these, six species are considered resident, including the Mexican duck (Anas diazi), the country's only endemic duck species [3].
Anas diazi is distributed year-round in the inland wetlands of the Mexican Plateau and is currently listed as a threatened species under Mexican conservation legislation (NOM-059-SEMARNAT-2010) [4]. Population estimates suggest approximately 55,500 individuals worldwide, with nearly 98% concentrated in the Central Mexican Plateau, highlighting its restricted distribution and conservation importance [5]. The Mexican duck belongs to the tribe Anatini and is classified as a dabbling duck, inhabiting shallow freshwater wetlands and exhibiting omnivorous feeding behavior [6]. Morphologically, A. diazi closely resembles the female mallard (Anas platyrhynchos), a similarity that has historically contributed to taxonomic uncertainty [7]. For decades, A. diazi was considered a subspecies of A. platyrhynchos or part of a complex with clinal variation in North America [8,9]. However, phylogenetic, morphological, and behavioral evidence accumulated over the past few decades led the American Ornithological Society to recognize Anas diazi as a distinct species in 2020 [10]. Despite its ecological and evolutionary importance, A. diazi remains one of the least studied waterbird species in the Holarctic region, particularly from a genomic perspective [11]. Advances in long-read sequencing technologies, alone or in combination with short reads, have enabled de novo genome assemblies for numerous bird species, including raptors, passerines, galliforms, and some aquatic birds [12–14]. These resources have facilitated comparative genomic analyses of genome architecture, gene family evolution, and lineage-specific adaptations [15,16]. Within the genus Anas, genomic data are now available for an increasing number of species, although assemblies vary substantially in depth, coverage, annotation quality, and availability [17,18].
Nevertheless, genomic resources remain limited, and a reference genome for the Mexican duck is still unavailable, despite its importance as the only endemic duck species in Mexico and its restricted distribution [11,19]. Avian genomes are characterized by relatively small and compact sizes compared to other vertebrates, typically ranging from 0.9 to 1.3 Gb, with low repetitive content and a streamlined genomic architecture [20,21]. Within this range, Anatidae genomes tend to cluster near the upper end, although substantial variation exists among species [22]. Differences in genome size among birds are primarily driven by variation in repetitive elements, segmental duplications, and lineage-specific expansions, rather than large-scale changes in gene number [23,24]. Previous studies suggest that genome size variation in birds may be influenced by life-history traits, metabolic constraints associated with flight, and population history [25,26]. However, genome size estimates remain unavailable for several endemic and non-model waterfowl species, including Anas diazi , limiting comparative assessments of genomic architecture and evolutionary dynamics within the family [22,27]. Comparative genomic studies in birds have revealed that variation in gene family size plays a significant role in lineage-specific adaptation, ecological specialization, and domestication [28,29]. Analyses across avian genomes consistently show that gene family expansions and contractions are often associated with immune response, sensory perception, metabolism, and developmental processes [30,31]. In waterfowl and other avian lineages, changes in gene family composition have been linked to ecological transitions, migratory behavior, dietary specialization, and adaptation to aquatic environments [32,33]. 
Furthermore, domesticated or semi-domesticated lineages, such as the mallard (Anas platyrhynchos), frequently exhibit gene family expansions related to metabolism, reproduction, and immune regulation, reflecting both artificial selection and the relaxation of selective constraints [34,35]. Despite these advances, the evolution of gene families remains poorly characterized in many endemic and non-model waterfowl species, limiting our understanding of how genomic innovation and gene loss have contributed to diversification within Anatidae [36]. Here we present the first whole-genome analysis of Anas diazi, based on PacBio HiFi long-read sequencing and de novo assembly [37]. We estimated genome size using k-mer frequency analysis and GenomeScope2 modeling and compared genomic characteristics with those of A. platyrhynchos and other representative waterfowl species [38,39]. Furthermore, we investigated the evolution of gene families in A. diazi, A. platyrhynchos, Aix galericulata, Anser cygnoides, and Gallus gallus using orthology inference and probabilistic modeling of gene family expansion and contraction [29,30,40]. This study provides a fundamental genomic resource for A. diazi and contributes to a broader understanding of genome evolution, specialization, and diversification in Anatidae [36,41].

Results

Genome sequencing, assembly and annotation

PacBio HiFi sequencing of Anas diazi generated a total of 17,611,587,254 base pairs, corresponding to 1,374,023 long reads [42]. The average read length was 12,817.5 bp, with maximum read lengths reaching 55,850 bp, while the shortest reads were 75 bp, consistent with the performance characteristics of circular consensus sequencing [37,42]. After quality filtering and trimming, approximately 99% of the original sequences were retained, resulting in 1,360,282 high-quality reads used for downstream analyses [43] (see Supplementary_file_1).
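The read statistics reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and the depth calculation assumes the ~1.02 Gb k-mer-based haploid genome size reported elsewhere in this paper and uses the raw (unfiltered) base count.

```python
# Figures taken from the text above.
TOTAL_BASES = 17_611_587_254   # total HiFi base pairs
RAW_READS = 1_374_023          # reads before filtering
FILTERED_READS = 1_360_282     # reads retained after filtering

# Assumption: haploid genome size from the k-mer estimate (~1.02 Gb).
GENOME_SIZE_BP = 1_020_000_000


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def retention_fraction(kept: int, original: int) -> float:
    """Fraction of reads that survived filtering."""
    return kept / original


def sequencing_depth(total_bases: int, genome_size_bp: int) -> float:
    """Approximate fold-coverage of the haploid genome."""
    return total_bases / genome_size_bp


mean_len = mean_read_length(TOTAL_BASES, RAW_READS)
retained = retention_fraction(FILTERED_READS, RAW_READS)
depth = sequencing_depth(TOTAL_BASES, GENOME_SIZE_BP)

print(f"mean read length: {mean_len:.1f} bp")
print(f"reads retained:   {retained:.1%}")
print(f"raw depth:        {depth:.1f}x")
```

Under these assumptions the reported mean read length and retention rate are reproduced, and the raw data correspond to roughly 17-fold coverage of the estimated haploid genome.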
Prior to de novo assembly, filtered reads were aligned against the Anas platyrhynchos reference genome (GCF_015476345.1), yielding an alignment rate of 86.78%, indicating substantial genomic similarity while retaining a significant fraction of A. diazi-specific sequences [44]. De novo assembly produced 4,570 contigs with a total assembled length of 1,058,566,110 bp (~1.06 Gb), consistent with genome sizes reported for other Anatidae species (Fig. 2) [18,22]. The longest contig measured 3,234,602 bp, with an N50 of 485,801 bp. Assembly continuity metrics indicated an L50 of 601 contigs and an L90 of 2,466 contigs, values comparable to other long-read–based avian genome assemblies lacking chromosomal scaffolding [12,45] (see Table 1 and Fig. 1). Gene prediction using AUGUSTUS identified 33,712 protein-coding transcripts distributed across the assembled contigs [46] (see Supplementary_file_2). However, this number likely reflects transcript fragmentation and redundancy associated with the draft nature of the assembly. To minimize annotation artifacts and ensure biological reliability, all downstream comparative and evolutionary analyses were restricted to a high-confidence subset of approximately 3,700–4,000 genes supported by BUSCO completeness and functional annotation. Assembly completeness assessed with BUSCO recovered 3,791 complete single-copy orthologs and only 9 duplicated BUSCOs, indicating low redundancy [47]. Additionally, 17 fragmented BUSCOs and 4,521 missing BUSCOs were detected, for a total of 8,338 BUSCO groups assessed. Although overall BUSCO completeness is lower than that of chromosome-level assemblies, this pattern is consistent with fragmented avian draft genomes lacking chromosomal scaffolding and reflects the challenges associated with assembling microchromosomes, repetitive regions, and divergent loci rather than major deficiencies in sequencing depth or assembly quality (see Fig. 2 and Supplementary_file_3).
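For readers unfamiliar with the contiguity and completeness metrics above, the sketch below shows how N50/L50 (and N90/L90) are conventionally derived from contig lengths and how the BUSCO categories reported in the text add up. It is illustrative only: the contig lengths are toy values, not the A. diazi assembly, and the helper name `nx_lx` is hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx): the contig length at which the running sum of
    contigs, sorted longest first, reaches `fraction` of the total
    length, and the number of contigs needed to get there."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(nx_lx(toy_contigs, 0.5))   # N50, L50
print(nx_lx(toy_contigs, 0.9))   # N90, L90

# BUSCO category counts as reported in the text above.
busco = {
    "complete_single_copy": 3_791,
    "complete_duplicated": 9,
    "fragmented": 17,
    "missing": 4_521,
}
busco_total = sum(busco.values())
complete = busco["complete_single_copy"] + busco["complete_duplicated"]
complete_pct = 100 * complete / busco_total
missing_pct = 100 * busco["missing"] / busco_total

print(f"BUSCO groups assessed: {busco_total}")
print(f"complete: {complete_pct:.1f}%  missing: {missing_pct:.1f}%")
```

With the reported counts, the four categories sum to the 8,338 groups stated in the text, and the complete fraction (single-copy plus duplicated) works out to a little under half of the groups assessed.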
Functional annotation with InterProScan and BUSCO successfully assigned protein domains (see Supplementary_file_4) and Gene Ontology (GO) terms to predicted transcripts, supporting the biological completeness and functional relevance of the assembly [48,49] (see Figure_supplementary_1 and Supplementary_file_5).

Genome size estimation and genomic features

Genome size estimation based on PacBio HiFi reads and k-mer frequency analysis revealed a haploid genome size of approximately 1.02 Gb for Anas diazi [38,39] (see Table 2). GenomeScope2 modeling showed a high model fit (> 96%), indicating a robust and reliable estimate [39] (see Fig. 3 and Table 3). The genome exhibited a low repetitive content (~7.4%) and moderate heterozygosity (~1.1%), values consistent with other Anseriformes genomes and previously reported patterns of compact avian genome architecture [20,22,23,50]. Importantly, these results indicate that gene family expansions detected in A. diazi occurred without a concomitant increase in overall genome size, suggesting that genomic innovation in this species is driven by localized gene duplication events rather than genome-wide expansion, a pattern previously reported across multiple avian lineages [24,26,36,51].

Gene family evolution across Anseriformes

To investigate gene family evolution, orthogroups inferred with OrthoFinder were analyzed using CAFE5 across five species: Anas diazi, Anas platyrhynchos, Aix galericulata, Anser cygnoides, and Gallus gallus [29,40] (Supplementary_file_6). The total number and direction of significant gene family changes varied markedly among species, revealing distinct evolutionary trajectories consistent with lineage-specific rates of gene gain and loss [28,36,52] (see Table 4). Anas diazi exhibited a strong bias toward gene family contraction, whereas A. platyrhynchos, A. cygnoides, and G. gallus showed predominantly expansive profiles [34,35,40].
Aix galericulata displayed a contraction-dominated pattern similar to A. diazi, although with fewer extreme events. These results define a clear evolutionary gradient, with A. diazi and A. galericulata tending toward gene loss, while A. platyrhynchos, A. cygnoides, and especially G. gallus exhibit extensive gene family expansion, a pattern consistent with differences in life history, domestication, and long-term effective population size [26,36,53] (Supplementary_file_6). This contrast is visually summarized in Fig. 6, which shows both the absolute number of gene family expansions and contractions per species (Fig. 6A) and the relative directionality of gene family size changes across lineages (Fig. 6B). Together, these representations highlight the contraction-dominated profile of Anas diazi and Aix galericulata relative to the expansion-biased trajectories of Anas platyrhynchos, Anser cygnoides, and Gallus gallus.

Gene family expansion and contraction in Anas diazi

In Anas diazi, CAFE5 detected 51 gene families with significant evolutionary changes, comprising 6 expanded and 45 contracted orthogroups (Table 5, Fig. 4) [40,52]. This asymmetrical pattern contrasts sharply with A. platyrhynchos and Anser cygnoides, which displayed higher numbers of expansions than contractions, consistent with previously reported lineage-specific differences in gene turnover rates [28,36]. Expanded families in A. diazi exhibited large effect sizes, including OG0000006 (+103 copies), OG0000052 (+62), OG0000150 (+44), OG0000348 (+30), OG0000363 (+29), and OG0000734 (+14), all with highly significant P-values [40] (Table 5). While these large effect sizes suggest pronounced lineage-specific expansion, we acknowledge that assembly fragmentation and the proliferation of transposable element–derived sequences may contribute to inflated copy number estimates, and therefore these values should be interpreted cautiously.
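The asymmetry described above can be summarized numerically from the values given in the text. The following is a small sketch using only those reported figures; the variable names are hypothetical, and it does not parse CAFE5 output files.

```python
# Copy-number gains for the six significantly expanded orthogroups,
# as listed in the text above.
expanded_gains = {
    "OG0000006": 103,
    "OG0000052": 62,
    "OG0000150": 44,
    "OG0000348": 30,
    "OG0000363": 29,
    "OG0000734": 14,
}
n_contracted = 45  # significantly contracted orthogroups (from the text)

n_expanded = len(expanded_gains)
n_significant = n_expanded + n_contracted

# Share of significant changes that are contractions.
contraction_fraction = n_contracted / n_significant

# Total copies gained across expanded families, and the largest one.
total_gained = sum(expanded_gains.values())
largest_family = max(expanded_gains, key=expanded_gains.get)
largest_share = expanded_gains[largest_family] / total_gained

print(f"significant families: {n_significant}")
print(f"contractions: {contraction_fraction:.1%} of significant changes")
print(f"copies gained in expanded families: {total_gained}")
print(f"largest expansion: {largest_family} ({largest_share:.1%} of gains)")
```

Framed this way, a small number of families account for all of the copy-number gain, which is why the caveat about assembly fragmentation and transposable element–derived sequences matters for interpreting these values.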
In contrast, contracted families showed a broad distribution of moderate to strong reductions, typically ranging from −8 to −15 gene copies (Table 5, Fig. 4), indicating sustained gene loss across multiple functional categories, a pattern consistent with long-term purifying selection and reduced effective population size in non-domesticated lineages [36,53,54].

Comparative patterns of gene family evolution

Multispecies comparison revealed that A. diazi possesses the most contraction-biased gene family profile among the analyzed Anseriformes [36,40,52]. Anas platyrhynchos exhibited extensive expansions across numerous orthogroups, consistent with its domestication history and broad ecological plasticity [34,35,55]. Anser cygnoides also showed expansion-dominated dynamics, including several families with gains exceeding +30 copies, a pattern previously associated with domestication and intensive artificial selection [56]. Gallus gallus presented the most extreme expansion profile, with multiple gene families exhibiting deep duplications, reflecting strong artificial selection pressures [55,57] (Supplementary_file_6). Several orthogroups displayed opposite evolutionary trends among species, with expansions in A. platyrhynchos or A. cygnoides and contractions in A. diazi and A. galericulata, suggesting divergent selective regimes and lineage-specific functional optimization driven by differences in ecology, demographic history, and human-mediated selection [36,53,54,58].

Functional annotation of expanded and contracted gene families

Functional annotation of gene families was performed using InterProScan and eggNOG-mapper based on protein sequences of Anas diazi [48,59]. Because no reference genome was previously available for A. diazi, all annotations were inferred by homology and are used strictly as functional descriptors, a common approach for non-model avian genomes [12,36].
Expanded gene families were strongly enriched in functions related to RNA-mediated transposition and retroelement activity, including domains such as RNase H, reverse transcriptase (RVT_1), and rve/integrase [23,24,60]. This functional coherence indicates that the major expansion events in A. diazi are driven primarily by mobile genetic elements or genes derived from them, a pattern previously reported in compact avian genomes where transposable element activity occurs in localized bursts rather than genome-wide proliferation [23,50,61]. In contrast, contracted gene families were functionally heterogeneous (see Fig. 5). Comparative functional patterns between Anas diazi and the other species are summarized in Fig. 7. Many orthogroups were annotated as uncharacterized or hypothetical proteins, consistent with lineage-specific divergence or rapid sequence evolution [36,53]. Among annotated functions, recurrent categories included cytoskeletal and structural proteins, membrane-associated and transport-related components, mitochondrial and metabolic proteins, and low-frequency regulatory and signaling proteins [15,25,62]. Notably, no enrichment of transposable element–related domains was detected among contracted families, indicating that gene loss in A. diazi reflects diffuse functional reduction rather than genome-wide purging of repetitive elements [24,26,50].

Functional landscape of the Anas diazi genome

Global functional annotation of high-confidence predicted genes revealed a diverse repertoire involved in core cellular, metabolic, and regulatory processes [48,59]. Gene Ontology, KEGG, and Reactome enrichment analyses consistently identified pathways associated with cellular metabolism, signal transduction, nucleotide metabolism, DNA repair, and neuroendocrine signaling [49,63–65].
In particular, G protein–coupled receptor–mediated pathways were recurrently enriched, reflecting the conserved importance of sensory, behavioral, and physiological regulation in avian genomes [66–68] (see Figure_supplementary_1). Together, these results demonstrate that the Anas diazi genome is functionally complete and biologically coherent, while exhibiting a distinctive evolutionary signature characterized by a compact genome, dominant gene family contraction, and localized expansion of transposable element–related families [23,24,36,50,61].

Discussion

Genome architecture and assembly quality of Anas diazi

In this study, we present the first whole-genome assembly and comparative genomic analysis of the Mexican duck (Anas diazi), providing a foundational genomic resource for this endemic and threatened waterfowl species. The de novo assembly generated from PacBio HiFi long reads yielded a total assembly length of approximately 1.06 Gb, which is well within the expected range for avian genomes and closely matches genome size estimates obtained through k-mer–based modeling (~1.02 Gb) [20,22,38,39]. This concordance between assembly length and independent genome size estimation supports the overall reliability of the assembly and is consistent with best practices for validating draft genomes in non-model species [12,50,69]. Although BUSCO completeness was moderate, this reflects the draft nature of the assembly and the well-known challenges associated with avian microchromosomes and repetitive regions. Importantly, all comparative genomic analyses were performed using a filtered, high-confidence gene set supported by BUSCO and functional annotation, ensuring that downstream evolutionary inferences were not driven by fragmented or low-confidence gene models.
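The concordance between assembly length and the k-mer-based estimate discussed above can be made concrete. The sketch below first illustrates the basic peak-based idea behind k-mer genome size estimation (total k-mer count divided by the main peak depth) on a toy histogram, then compares the two figures reported in the paper. This is a deliberate simplification and is not the GenomeScope2 model, which fits a mixture distribution that also accounts for heterozygosity and repeats; the function name, the toy histogram, and the `min_depth` error cutoff are all assumptions made for illustration.

```python
def peak_based_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer histogram.

    `histogram` maps k-mer depth -> number of distinct k-mers seen at
    that depth. Depths below `min_depth` are treated as sequencing
    errors and ignored (hypothetical cutoff). The estimate is the total
    number of k-mer occurrences divided by the depth of the main peak.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers above the error cutoff")
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth


# Toy histogram with a single peak at depth 20 (illustrative values).
toy_hist = {1: 5000, 2: 800, 18: 200, 19: 400, 20: 600, 21: 400, 22: 200}
toy_estimate = peak_based_genome_size(toy_hist)
print(f"toy genome size estimate: {toy_estimate:.0f} bp")

# Figures reported in the paper.
ASSEMBLY_LENGTH_BP = 1_058_566_110
KMER_ESTIMATE_BP = 1_020_000_000  # ~1.02 Gb, rounded as in the text

relative_excess = ASSEMBLY_LENGTH_BP / KMER_ESTIMATE_BP - 1
print(f"assembly exceeds k-mer estimate by {relative_excess:.1%}")
```

With the rounded ~1.02 Gb estimate, the assembly is a few percent longer than the k-mer-based genome size. A small excess of this kind is commonly attributed to residual haplotype duplication in unphased draft assemblies, although the text does not test this.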
Although the assembly remains fragmented relative to chromosome-level bird genomes, contiguity metrics (N50 ≈ 486 kb) and BUSCO completeness values indicate that the assembly captures a substantial share of conserved avian genes [45,47]. As observed in other non-model bird species assembled without Hi-C or linkage maps, fragmentation likely reflects the abundance of repetitive regions, the presence of numerous microchromosomes, and residual heterozygosity rather than major deficiencies in sequencing depth or assembly strategy [12,45,62,70]. Importantly, the high alignment rate of reads to the Anas platyrhynchos reference genome (~87%) further supports the accuracy of the assembly and its suitability for comparative genomic analyses within Anatidae [18,44].

Genome size evolution and repeat landscape in Anatidae

Avian genomes are characteristically compact compared to those of other vertebrates, typically ranging between 0.9 and 1.3 Gb [20,21]. Within this range, Anatidae genomes tend to occupy the upper end, although substantial interspecific variation has been reported [22,26]. The genome size of A. diazi estimated here (~1.02 Gb) places it among the smaller genomes within the family, closer to the lower bound of the Anatidae range, and comparable to other wild, non-domesticated Anseriformes [22,50]. GenomeScope2 modeling revealed a relatively low proportion of repetitive content (~7.4%) and moderate heterozygosity (~1.1%), values that are consistent with other wild Anseriformes and contrast with patterns observed in some domesticated or intensively selected bird lineages [26,50,55]. Notably, despite the detection of several large gene family expansions, overall genome size remains compact, indicating that these expansions occurred without large-scale genome inflation. This pattern suggests that genome size evolution in A. diazi is shaped primarily by localized gene duplication and loss rather than by widespread accumulation of repetitive elements or segmental duplications [23,24,50,61]. These results support previous hypotheses proposing that variation in avian genome size is driven more by differences in repetitive DNA dynamics than by changes in gene number, and that strong selective constraints associated with flight performance, metabolic efficiency, and effective population size may limit genome expansion in wild bird lineages [21,25,26,70,71].

Functional composition of the Anas diazi gene repertoire

Functional annotation revealed a diverse and biologically coherent gene repertoire involved in core cellular, metabolic, and regulatory processes. Enrichment analyses across Gene Ontology, KEGG, and Reactome databases consistently highlighted pathways related to cellular metabolism, nucleotide processing, DNA repair, and signal transduction, mirroring functional profiles reported for other avian genomes [49,63–65,72]. A prominent and recurrent signal across all annotation frameworks was the enrichment of G protein–coupled receptor (GPCR)–mediated signaling pathways. GPCR-related genes identified in A. diazi are involved in neuroendocrine regulation, sensory perception, feeding behavior, locomotion, and stress responses [66–68,73]. These functional categories are of particular relevance for waterfowl, which rely heavily on complex behavioral and physiological regulation to cope with fluctuating wetland environments, seasonal resource availability, and social interactions [74,75]. Together, these results indicate that despite the fragmented nature of the assembly, the predicted gene set of A. diazi is functionally complete and comparable to those of other avian species, validating its use for evolutionary and ecological genomic analyses within Anatidae and across birds more broadly [36,41,70,72].
Asymmetric gene family evolution in Anas diazi

Comparative analysis of gene family evolution using CAFE5 revealed a strikingly asymmetric pattern in A. diazi, characterized by a strong predominance of gene family contractions over expansions. In total, 45 gene families were significantly contracted, whereas only six families showed significant expansion along the A. diazi lineage [40,52]. This contraction-dominated profile contrasts sharply with patterns observed in other analyzed Anseriformes, particularly Anas platyrhynchos and Anser cygnoides, which exhibited expansion-biased dynamics (see Fig. 6) [34,35,56]. The contraction profile of A. diazi suggests a lineage-specific process of gene repertoire reduction or streamlining. Similar patterns have been reported in other wild or ecologically specialized bird species and are often interpreted as the result of long-term stabilizing selection, reduced effective population size, or ecological specialization [36,53,76]. Given the restricted geographic distribution and relatively small population size of A. diazi, gene family contraction may reflect historical demographic constraints combined with local adaptation to the inland wetlands of the Mexican Plateau, rather than recent anthropogenic pressures or domestication-related selection [25,26,77].

Functional contrasts between expanded and contracted gene families

Functional annotation revealed a clear asymmetry between expanded and contracted gene families in A. diazi. Expanded families were overwhelmingly dominated by domains associated with retrotransposition and mobile genetic elements, including reverse transcriptase (RVT_1), RNase H, and integrase-related domains [23,24,60,61].
This pattern indicates that the most pronounced expansions in A. diazi are driven primarily by transposable element–related activity or by genes derived from such elements, consistent with previous observations in avian genomes where TE activity occurs in episodic, lineage-specific bursts rather than through genome-wide expansion [23,50,61,78]. Because these expansions are largely TE-derived, they are unlikely to reflect functional innovation in core biological pathways but instead represent localized genomic activity consistent with episodic transposable element dynamics in compact avian genomes. In contrast, contracted gene families were functionally heterogeneous and lacked enrichment for any single biological pathway. Contracted orthogroups included genes associated with cellular structure, membrane components, transport processes, mitochondrial function, and general metabolic activity [15,25,36,62]. Importantly, no signal of transposable element–related domain enrichment was detected among contracted families, suggesting that gene loss is not simply a byproduct of genome-wide TE purging, but rather reflects diffuse, localized gene loss across diverse functional categories [24,26,54]. This functional asymmetry suggests that distinct evolutionary mechanisms underlie gene family gains and losses in A. diazi, with expansions driven primarily by mobile genetic elements and contractions reflecting gradual reduction across multiple cellular processes, likely shaped by long-term demographic history and stabilizing selection rather than episodic directional selection [36,53,77,79].

Comparative evolutionary trajectories within Anseriformes

When placed in a broader phylogenetic context, A. diazi clusters with Aix galericulata in exhibiting contraction-dominated gene family evolution, whereas A. platyrhynchos and Anser cygnoides show expansion-biased trajectories [36,40,52].
These contrasting patterns likely reflect differences in life history, ecological breadth, demographic history, and domestication intensity across Anseriformes [25,26,53,77]. The mallard (A. platyrhynchos), a species with broad geographic distribution and a documented history of domestication, introgression, and human-mediated selection, displayed numerous expansions in immune, metabolic, and regulatory gene families, consistent with previous genomic studies [34,35,55]. Similarly, A. cygnoides, a domesticated goose species, showed moderate but consistent gene family expansions, reflecting artificial selection and relaxed selective constraints in managed populations [56,57]. In contrast, the contraction-biased profiles of A. diazi and A. galericulata may reflect more specialized ecological niches, reduced effective population sizes, and limited exposure to artificial or anthropogenic selection pressures, resulting in long-term gene repertoire streamlining rather than expansion [36,53,77,79,80].

Evolutionary and conservation implications

The genomic patterns described here suggest that the evolutionary history of Anas diazi has been shaped by a combination of genome size stability, pervasive gene family contraction, and limited but pronounced expansion of mobile element–related sequences [25,36,53,77]. This constellation of features is consistent with long-term adaptation to a relatively stable but geographically restricted ecological niche, combined with demographic constraints associated with endemism, reduced effective population size, and limited opportunities for range expansion [26,80,81]. Similar contraction-dominated genomic trajectories have been reported in other wild bird species with specialized ecologies, supporting the hypothesis that genome streamlining may be favored under persistent stabilizing selection [36,53,82].
From a conservation perspective, the availability of a reference genome and the identification of lineage-specific genomic features provide essential resources for future studies addressing population structure, local adaptation, introgressive hybridization with A. platyrhynchos , and the genomic basis of ecological specialization [55,83–85]. In particular, the ability to distinguish species-specific gene family contractions and expansions may facilitate the detection of adaptive versus neutral variation in conservation genomics frameworks. More broadly, this study contributes to a growing body of evidence indicating that wild, non-domesticated waterfowl follow evolutionary genomic trajectories that differ fundamentally from those of their domesticated relatives, underscoring the importance of including endemic and understudied taxa in comparative and conservation-oriented genomics [34,35,56,57,79,86]. Conclusions This study presents the first whole-genome assembly and comparative genomic analysis of the Mexican duck ( Anas diazi ), providing a foundational genomic resource for an endemic and understudied waterfowl species. Using PacBio HiFi long-read sequencing, we generated a de novo assembly that captures the majority of conserved avian genes and yields a genome size estimate consistent with other members of Anatidae. Despite its compact genome and low repetitive content, A. diazi exhibits a strikingly asymmetric pattern of gene family evolution, characterized by widespread contraction and limited but pronounced expansion of a small number of families. Notably, expanded gene families are overwhelmingly associated with retrotransposition-related domains, indicating that genomic innovation in this species is driven primarily by localized activity of mobile-element–derived sequences rather than genome-wide expansion. Comparative analyses across Anseriformes place A. 
diazi alongside Aix galericulata as contraction-dominated lineages, in contrast to the expansion-biased profiles observed in domesticated or ecologically generalist species such as Anas platyrhynchos, Anser cygnoides, and Gallus gallus. These contrasting evolutionary trajectories suggest that gene family contraction may be associated with ecological specialization, demographic history, and the absence of artificial selection. Beyond its evolutionary insights, the genomic resource presented here provides an essential foundation for future studies on population genomics, local adaptation, and hybridization in A. diazi. More broadly, this work underscores the importance of including endemic and non-model species in comparative genomics to fully capture the diversity of evolutionary processes shaping avian genomes. Materials and Methods Collection of biological material Biological samples were obtained at the UMA Ejido Capulhuac, located in Polygon Two of the Ciénegas del Lerma Flora and Fauna Protection Area (19°12′45″ N, 99°27′30″ W), State of Mexico. Blood samples were collected from a female Mexican duck (Anas diazi; Fig. 1) by venipuncture of the brachial (ulnar) vein using a 23 G hypodermic needle and a 3 mL syringe, following standard avian sampling protocols [87,88]. Blood was immediately transferred to EDTA-coated microtainer tubes to prevent coagulation and preserve nucleic acid integrity. Sampling was conducted during the 2024 hunting season through legally authorized hunting activities, in accordance with Mexican wildlife regulations and ethical guidelines for the use of vertebrates in research [89]. Collected samples were preserved in liquid nitrogen and transported to the Molecular Genetics Laboratory at the Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM), where they were stored until processing.
DNA extraction and quality assessment Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen), following the manufacturer’s protocol for whole blood samples, a method widely used for high-quality avian genomic DNA isolation [90]. DNA integrity and molecular weight were assessed by electrophoresis on 0.8% agarose gels using high–molecular weight DNA markers (25 kb and 50 kb), ensuring suitability for long-read sequencing [91]. DNA concentration was measured fluorometrically using the Qubit Broad Range (BR) dsDNA kit and a Qubit 3 fluorometer (Invitrogen), while DNA purity was evaluated using NanoDrop Lite spectrophotometer readings (Thermo Scientific), assessing 260/280 and 260/230 absorbance ratios [92]. Prior to quantification and visualization, DNA samples were treated with RNase A to remove residual RNA. High-quality DNA was dehydrated using a Savant® DNA SpeedVac® system (Thermo Scientific) prior to shipment for sequencing. Genome sequencing and quality control High-molecular-weight genomic DNA was sent to Innomics Inc. (Sunnyvale, CA, USA) for PacBio HiFi library construction and sequencing. Libraries were prepared according to Pacific Biosciences specifications and sequenced on the PacBio Revio platform using extended read chemistry, generating HiFi reads with expected lengths of approximately 12–18 kb and high per-base accuracy [93,94]. Raw sequencing reads were evaluated using FastQC to assess base quality, read length distribution, GC content, and overall sequencing performance [95]. These metrics guided subsequent filtering and assembly strategies. Read filtering and genome assembly Initial read filtering was performed using Trimmomatic to remove low-quality and excessively short reads [96]. In parallel, raw reads were independently filtered using QIAGEN CLC Genomics Workbench version 25 to confirm read quality and consistency across software pipelines. Two assembly strategies were explored. 
First, an exploratory assembly was generated using SPAdes configured for long-read data [97]. Second, and ultimately used for downstream analyses, a de novo assembly was performed using the HiFi long-read genome assembly module implemented in QIAGEN CLC Genomics Workbench version 25, optimized for PacBio HiFi data. Assembly quality and contiguity metrics were evaluated using QUAST [98]. To assess overall genomic similarity and read mapping efficiency, filtered reads were aligned to the Anas platyrhynchos reference genome (GCF_015476345.1) using Bowtie2 [99], and independently using QIAGEN CLC Genomics Workbench. Genome size estimation Genome size estimation was performed using k-mer frequency analysis based on PacBio HiFi reads. K-mer histograms were generated and modeled using GenomeScope2 to estimate haploid genome size, heterozygosity, and repetitive content [100]. Model fit values were used to assess the reliability and robustness of genome size estimates. Genome annotation and functional characterization Structural annotation of the Anas diazi genome was performed using AUGUSTUS, with gene models trained using the Gallus gallus reference annotation as a guide [101]. Independent annotation was also tested within QIAGEN CLC Genomics Workbench version 25 for comparison and validation. Functional annotation of predicted protein-coding genes was conducted using InterProScan, enabling the identification of conserved protein domains and the assignment of Gene Ontology (GO) terms [102]. Annotation completeness and redundancy were evaluated using BUSCO with the appropriate avian lineage dataset [103]. Gene Ontology enrichment analyses and interaction-based visualization of annotated genes were performed using STRING v12.0 [104]. All functional annotations were inferred by homology, as Anas diazi lacks a curated reference genome. 
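The k-mer-based genome size estimate described under Genome size estimation can be illustrated with a simplified calculation. The sketch below is not the GenomeScope2 model, which fits a mixture model to the full k-mer spectrum; it is a naive approximation that divides the total number of k-mer occurrences by the depth of the main coverage peak. It assumes a two-column histogram (depth, number of distinct k-mers) such as those written by common k-mer counters, and it assumes the tallest peak above the error range corresponds to homozygous sequence. The function names, the `min_depth` cutoff, and the toy input are hypothetical and are not taken from the study.

```python
from typing import Dict


def parse_histogram(text: str) -> Dict[int, int]:
    """Parse a two-column k-mer histogram: depth, then number of distinct k-mers."""
    hist: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        hist[int(fields[0])] = int(fields[1])
    return hist


def estimate_haploid_genome_size(hist: Dict[int, int], min_depth: int = 5) -> float:
    """Naive estimate: total k-mer occurrences divided by the main-peak depth.

    min_depth is an illustrative cutoff (an assumption) that discards
    low-depth k-mers, which mostly arise from sequencing errors.
    """
    usable = {depth: count for depth, count in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The depth holding the most distinct k-mers is treated as the homozygous peak.
    peak_depth = max(usable, key=usable.get)
    # Each distinct k-mer seen at a given depth contributes that many occurrences.
    total_occurrences = sum(depth * count for depth, count in usable.items())
    return total_occurrences / peak_depth


if __name__ == "__main__":
    toy = "1 1000\n2 200\n9 10\n10 100\n11 10\n"  # made-up values for illustration
    print(estimate_haploid_genome_size(parse_histogram(toy)))
```

Because this shortcut ignores heterozygous and repeat peaks, it is useful only as a sanity check on the order of magnitude; the values reported in this study come from the GenomeScope2 model fit.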
Orthology inference and gene family evolution analysis Orthologous gene families were inferred using OrthoFinder, based on predicted protein sequences from Anas diazi, Anas platyrhynchos, Aix galericulata, Anser cygnoides , and Gallus gallus [105]. Single-copy orthologs were used to generate a species phylogeny, which was subsequently rendered ultrametric for downstream evolutionary modeling. Gene family expansion and contraction analyses were performed using CAFE5 under a probabilistic birth–death model [106]. Significant changes in gene family size were identified using branch-specific likelihood estimates and corrected p-values. Expanded and contracted orthogroups identified in Anas diazi were further subjected to functional annotation using InterProScan and eggNOG-mapper to characterize the biological processes underlying gene family evolution [107]. Declarations Declaration of competing interest The authors declare no competing interests. Ethics statement No experimental procedures were performed on live animals in this study. Biological samples (blood and liver tissue) were obtained from a single female Mexican duck ( Anas diazi ) that was legally harvested during the authorized hunting season within a registered Wildlife Management Unit (Unidad de Manejo para la Conservación de la Vida Silvestre, UMA Ejido Capulhuac), located in Polygon Two of the Ciénegas del Lerma Flora and Fauna Protection Area, State of Mexico. Scientific collection authorization was granted by the Dirección General de Vida Silvestre, Secretaría de Medio Ambiente y Recursos Naturales (SEMARNAT), Government of Mexico, under permit number SPARN/DGVS/09070/24 (dated August 1, 2024), issued to the first author. Because no live animal experimentation, capture, anesthesia, or euthanasia was performed by the research team, this study did not require institutional animal care or use committee (IACUC) or IRB approval. 
Funding Declaration Patricia Padilla-Aguilar received a postdoctoral fellowship from the Council of Science, Technology and Innovation of Hidalgo, Mexico (CITNOVA), for the completion of this work. Author Contribution Padilla-Aguilar Patricia, Bravo-Vinaja María Guadalupe, Colón-Quezada David, Contreras-Jiménez Gastón, and Solano-De la Cruz Marco Tulio conceived and designed the experiments. Padilla-Aguilar Patricia, Bravo-Vinaja María Guadalupe, Colón-Quezada David, and Solano-De la Cruz Marco Tulio collected biological samples. Padilla-Aguilar Patricia, Bravo-Vinaja María Guadalupe, Colón-Quezada David, Contreras-Jiménez Gastón, and Solano-De la Cruz Marco Tulio performed the experiments and analyzed the data. Padilla-Aguilar Patricia, Bravo-Vinaja María Guadalupe, Colón-Quezada David, Contreras-Jiménez Gastón, and Solano-De la Cruz Marco Tulio wrote the article. All the authors read and approved the manuscript. Acknowledgement Special thanks to the Consejo de Ciencia, Tecnología e Innovación de Hidalgo (CITNOVA) for the grant awarded to carry out the postdoctoral stay; without this financial resource, this research would not have been possible. We would also like to thank the hunters who kindly donated the duck used for this study. We would like to thank the Dirección General de Vida Silvestre y Secretaría del Medio Ambiente y Recursos Naturales (SEMARNAT) for the permission granted for this study (SPARN/DGVS/09070/24). Finally, we would like to thank Mr. Tomás Ramírez Barón, Mr. Roberto Reza and Engineer Jaqueline Delgado Nava for all the support provided for the collection of biological samples. Data Availability The data that support the findings of this study are available from the corresponding author upon reasonable request. The raw data for this study can be found in the NCBI SRA repository under BioProject accession PRJNA1212198 and BioSample accession SAMN46294654. References Carboneras C, Kirwan GM. Family Anatidae (ducks, geese and swans).
In: del Hoyo J, Elliott A, Sargatal J, Christie DA, de Juana E, editors. Handbook of the Birds of the World Alive. Barcelona: Lynx Edicions; 2018. Navarro-Sigüenza AG, Rebón-Gallardo MF, Gordillo-Martínez A, Townsend Peterson A, Berlanga-García H, Sánchez-González LA. Biodiversidad de aves en México. Rev Mex Biodivers. 2014;85(Suppl):S476–S495. Howell SNG, Webb S. A Guide to the Birds of Mexico and Northern Central America. Oxford: Oxford University Press; 1995. Secretaría de Medio Ambiente y Recursos Naturales (SEMARNAT). Norma Oficial Mexicana NOM-059-SEMARNAT-2010, Protección ambiental–Especies nativas de México de flora y fauna silvestres–Categorías de riesgo y especificaciones para su inclusión, exclusión o cambio–Lista de especies en riesgo. Diario Oficial de la Federación; 2010. BirdLife International. Species factsheet: Anas diazi. BirdLife International; 2023. Johnsgard PA. Ducks, Geese, and Swans of the World. Lincoln: University of Nebraska Press; 1978. Hubbard JP. The status of the Mexican Duck (Anas platyrhynchos diazi). Auk. 1977;94:554–566. Livezey BC. Phylogenetic relationships of dabbling ducks (tribe Anatini). Auk. 1991;108:471–507. Lavretsky P, McCracken KG. To hybridize or not to hybridize? A case study of North American dabbling ducks. Biol J Linn Soc. 2013;108:813–829. Chesser RT, Burns KJ, Cicero C, Dunn JL, Kratter AW, Lovette IJ, et al. Checklist of North American Birds of the American Ornithological Society. Auk. 2020;137:1–23. Lavretsky P, Peters JL, Winker K, McCracken KG. Phylogenomics of modern waterfowl (Anseriformes) using target enrichment. Mol Phylogenet Evol. 2020;142:106646. Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592:737–746. Zhang G, Li B, Li C, Gilbert MTP, Jarvis ED, Wang J. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311–1320. 
Feng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, et al. Dense sampling of bird diversity increases power of comparative genomics. Nature. 2020;587:252–257. Warren WC, Hillier LW, Tomlinson C, Minx P, Kremitzki M, Graves T, et al. A New Chicken Genome Assembly Provides Insight into Avian Genome Structure. G3 (Bethesda). 2017;7:109–117. Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, et al. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014;15:565. Sun YB, Zhou WP, Liu HQ, Irwin DM, Zhang YP. Diversity and evolution of avian genomes. Mol Ecol. 2019;28:4241–4257. Zhou C, Wang N, Wang L, Han L, Zhang Y, Sun Z, et al. A chromosome-level genome assembly of the mallard (Anas platyrhynchos). Gigascience. 2021;10:giaa162. McCracken KG, Lavretsky P, Peters JL. Population genomic insights into the evolutionary history of North American dabbling ducks. Mol Ecol. 2016;25:3623–3640. Gregory TR. Genome size evolution in animals. In: Gregory TR, editor. The Evolution of the Genome. San Diego: Elsevier; 2005. p. 3–87. Zhang Q, Edwards SV. The evolution of intron size in amniotes: a role for powered flight? Genome Biol Evol. 2012;4:1033–1043. Wright NA, Gregory TR, Witt CC. Metabolic ‘engines’ of flight drive genome size reduction in birds. Proc R Soc B. 2014;281:20132780. Kapusta A, Suh A. Evolution of bird genomes—a transposon’s-eye view. Ann N Y Acad Sci. 2017;1389:164–185. Sotero-Caio CG, Platt RN II, Suh A, Ray DA. Evolution and diversity of transposable elements in vertebrate genomes. Genome Biol Evol. 2017;9:161–177. Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV. Origin of avian genome size and structure in non-avian dinosaurs. Nature. 2007;446:180–184. Wright NA, Gregory TR. Determinants of genome size variation in birds. Genome Biol Evol. 2017;9:245–257. Dufresnes C, Béziers P, Litvinchuk SN, Crochet PA. Genome size variation in birds: unresolved patterns and neglected taxa. J Avian Biol. 2021;52:e02768. 
Hahn MW, Demuth JP, Han SG. Accelerated rate of gene gain and loss in primates. Genetics. 2007;177:1941–1949. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. Han MV, Thomas GWC, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013;30:1987–1997. Qian W, Zhang J. Genomic evidence for adaptation by gene duplication. Genome Res. 2014;24:1356–1362. McCracken KG, Barger CP, Bulgarella M, Johnson KP, Kuhner MK, Moore AV, et al. Parallel evolution in the major haemoglobin genes of eight species of Andean waterfowl. Mol Ecol. 2009;18:3992–4005. Lavretsky P, Peters JL, McCracken KG. Population genomics of divergence and admixture between closely related species of North American dabbling ducks. Mol Ecol. 2019;28:265–281. Huang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, et al. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet. 2013;45:776–783. Zhou Z, Li M, Cheng H, Fan W, Yuan Z, Gao Q, et al. An intercross population study reveals genes associated with body size and immune traits in ducks. BMC Genomics. 2018;19:612. Thomas GWC, Hahn MW. Gene-family evolution in mammals and birds. Annu Rev Ecol Evol Syst. 2014;45:191–216. Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155–1162. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–2204. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. Mendes FK, Vanderpool D, Fulton B, Hahn MW. 
CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2020;36:5516–5518. Zhang G. Birds as a model system for comparative genomics. Nat Rev Genet. 2015;16:390–403. Hon T, Mars K, Young G, Tsai YC, Karalius JW, Landolin JM, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020;7:399. Pacific Biosciences. SMRT® Link User Guide: Circular Consensus Sequencing (CCS) Analysis. PacBio; 2022. Zhou C, Wang N, Wang L, Han L, Zhang Y, Sun Z, et al. A chromosome-level genome assembly of the mallard (Anas platyrhynchos). Gigascience. 2021;10:giaa162. Korlach J, Gedman G, Kingan SB, Chin CS, Howard JT, Audet JN, et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate- and short-read sequencing. Gigascience. 2017;6:1–16. Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl 2):ii215–ii225. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. Kapusta A, Suh A, Feschotte C. Dynamics of genome size evolution in birds and mammals. Proc Natl Acad Sci U S A. 2017;114:E1460–E1469. Zhang G, Jarvis ED, Gilbert MTP. A genomic perspective on the origin and evolution of birds. Genome Biol. 2014;15:502. Hahn MW, Han MV, Han SG. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:e197. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. Nei M, Rooney AP. 
Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005;39:121–152. Qanbari S, Rubin CJ, Maqbool K, Weigend S, Weigend A, Geibel J, et al. Genetics of adaptation in modern chicken. PLoS Genet. 2019;15:e1007989. Wang MS, Thakur M, Peng MS, Jiang Y, Frantz LAF, Li M, et al. 863 genomes reveal the origin and domestication of chicken. Cell. 2020;180:1080–1096.e6. Rubin CJ, Zody MC, Eriksson J, Meadows JRS, Sherwood E, Webster MT, et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 2010;464:587–591. Helsen P, Van Den Broeck M, Van Houdt J, Volckaert FAM. Gene family evolution and adaptation in vertebrates. Mol Biol Evol. 2020;37:301–315. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38:5825–5829. Feschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007;41:331–368. Suh A, Kapusta A, Churakov G, et al. Early mesozoic coexistence of amniotes and transposable elements. Genome Res. 2014;24:1514–1524. O’Connor RE, Farré M, Joseph S, et al. Patterns of structural variation in avian genomes. Genome Res. 2019;29:1981–1994. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–D462. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498–D503. Lagerström MC, Schiöth HB. Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat Rev Drug Discov. 2008;7:339–357. Nordström KJV, Lagerström MC, Wallér LM, Fredriksson R, Schiöth HB. The secretin GPCRs descended from the family of adhesion GPCRs. 
Mol Biol Evol. 2009;26:71–84. Dong Y, Jones G, Zhang S. Dynamic evolution of GPCR genes in vertebrates. BMC Evol Biol. 2016;16:206. Richards EJ, Rosas U, Banta J, Bhambhra N, Purugganan MD. Genome-wide patterns of Arabidopsis gene duplication and their evolutionary implications. PLoS Genet. 2012;8:e1002973. Ellegren H. Evolutionary stasis: the stable chromosomes of birds. Trends Ecol Evol. 2010;25:283–291. Wright NA, Gregory TR, Witt CC. Metabolic ‘engines’ of flight drive genome size reduction in birds. Proc R Soc B. 2014;281:20132780. Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346:1320–1331. Fredriksson R, Lagerström MC, Lundin LG, Schiöth HB. The G-protein-coupled receptors in the human genome form five main families. Mol Pharmacol. 2003;63:1256–1272. McWilliams SR, Guglielmo C, Pierce B, Klaassen M. Flying, fasting, and feeding in birds during migration: a nutritional and physiological ecology perspective. J Avian Biol. 2004;35:377–393. Williams TD. Physiological Adaptations for Breeding in Birds. Princeton: Princeton University Press; 2012. Alfoldi J, Lindblad-Toh K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 2013;23:1063–1069. Frankham R. Genetics and extinction. Biol Conserv. 2005;126:131–140. Bourque G, Burns KH, Gehring M, et al. Ten things you should know about transposable elements. Genome Biol. 2018;19:199. Lynch M. The origins of genome architecture. Sunderland: Sinauer Associates; 2007. Ellegren H, Galtier N. Determinants of genetic diversity. Nat Rev Genet. 2016;17:422–433. Frankham R. Genetics and extinction. Biol Conserv. 2005;126:131–140. Zhang G, Li B, Li C, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311–1320. Allendorf FW, Luikart G, Aitken SN. Conservation and the Genetics of Populations. 2nd ed. Wiley-Blackwell; 2013.
Funk WC, McKay JK, Hohenlohe PA, Allendorf FW. Harnessing genomics for delineating conservation units. Trends Ecol Evol. 2012;27:489–496. Shafer ABA, Wolf JBW, Alves PC, et al. Genomics and the challenging translation into conservation practice. Trends Ecol Evol. 2015;30:78–87. Dussex N, von Seth J, Knapp M, et al. Genomes and the conservation of endangered species. Annu Rev Anim Biosci. 2021;9:519–545. Fair J, Paul E, Jones J, eds. Guidelines to the Use of Wild Birds in Research. 3rd ed. Ornithological Council; 2010. Sheldon LD, Chin EH, Gill SA, Schmaltz G, Newman AEM, Soma KK. Effects of blood collection on wild birds. J Avian Biol. 2008;39:720–726. Secretaría de Medio Ambiente y Recursos Naturales (SEMARNAT). Ley General de Vida Silvestre. México; 2023. Sambrook J, Russell DW. Molecular Cloning: A Laboratory Manual. 3rd ed. Cold Spring Harbor Laboratory Press; 2001. Koren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing. Curr Opin Microbiol. 2015;23:110–120. Simbolo M, Gottardi M, Corbo V, et al. DNA qualification workflow for next generation sequencing of histopathological samples. PLoS One. 2013;8:e62692. Wenger AM, Peluso P, Rowell WJ, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly. Nat Biotechnol. 2019;37:1155–1162. Hon T, Mars K, Young G, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020;7:399. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. Bankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–477. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–1075. Langmead B, Salzberg SL.
Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–644. Jones P, Binns D, Chang HY, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–1240. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. Szklarczyk D, Kirsch R, Koutrouli M, et al. The STRING database in 2023. Nucleic Acids Res. 2023;51:D638–D646. Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. Mendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2020;36:5516–5518. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2. Mol Biol Evol. 2021;38:5825–5829. Tables Tables 1 to 4 are available in the Supplementary Files section. Table 5 is not available with this version. Additional Declarations No competing interests reported. Supplementary Files Table1.xlsx Table 1. QUAST (Quality Assessment Tool for Genome Assemblies) analysis of the Anas diazi HiFi de novo assembly. The table reports the size of the contigs, the number of contigs, and the N and L values obtained from this assembly. Table2.xlsx Table 2. Determination of genome size using GenomeScope version 2.0 software, based on PacBio HiFi reads and k-mer frequency. Table3.xlsx Table 3. Statistical summary of the model for estimating genome size based on k-mer frequency. Table4.xlsx Table 4.
Summary of contraction and expansion events in OGs (gene families determined with OrthoFinder), analyzed with CAFE5 (see Supplementary_file_6 for the complete analysis and its statistical support values). * Aix_galericulata does not contain OGs with contractions. FigureSupplementary1.png Figure_Supplementary_1. Analysis of Gene Ontology enrichment, Biological Function, performed in STRING software, from transcripts annotated with Augustus and evaluated with BUSCO. Supplementaryfile6.tsv Supplementaryfile3.txt Supplementaryfile5.tsv Supplementaryfile4.tabular Supplementaryfile1.pdf Supplementaryfile2.fasta Status: Under Revision Version 1 posted Editorial decision: Revision requested 06 Mar, 2026 Reviewers agreed at journal 04 Mar, 2026 Reviewers agreed at journal 04 Mar, 2026 Reviewers agreed at journal 02 Mar, 2026 Reviewers agreed at journal 27 Feb, 2026 Reviews received at journal 26 Feb, 2026 Reviews received at journal 25 Feb, 2026 Reviewers agreed at journal 25 Feb, 2026 Reviews received at journal 22 Feb, 2026 Reviewers agreed at journal 18 Feb, 2026 Reviewers agreed at journal 15 Feb, 2026 Reviewers agreed at journal 14 Feb, 2026 Reviewers invited by journal 12 Feb, 2026 Editor assigned by journal 12 Feb, 2026 Editor invited by journal 11 Feb, 2026 Submission checks completed at journal 07 Feb, 2026 First submitted to journal 07 Feb, 2026
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-8736544","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":591705550,"identity":"52f87161-eabb-4937-814a-8e929aafeb70","order_by":0,"name":"Patricia Padilla-Aguilar","email":"","orcid":"","institution":"Colegio de Postgraduados Campus Montecillo","correspondingAuthor":false,"prefix":"","firstName":"Patricia","middleName":"","lastName":"Padilla-Aguilar","suffix":""},{"id":591705551,"identity":"8bfb4e2d-77ef-413d-b1bd-c767b275e010","order_by":1,"name":"María Guadalupe Bravo-Vinaja","email":"","orcid":"","institution":"Colegio de Postgraduados Campus Montecillo","correspondingAuthor":false,"prefix":"","firstName":"María","middleName":"Guadalupe","lastName":"Bravo-Vinaja","suffix":""},{"id":591705552,"identity":"5ccaf54f-70d5-4a11-ac82-f9f5eec64c6d","order_by":2,"name":"David Colón-Quezada","email":"","orcid":"","institution":"Comisión Nacional de Áreas Naturales Protegidas","correspondingAuthor":false,"prefix":"","firstName":"David","middleName":"","lastName":"Colón-Quezada","suffix":""},{"id":591705553,"identity":"b2e611a4-1e9d-481a-b9ee-baf41d419f13","order_by":3,"name":"Gastón Contreras-Jiménez","email":"","orcid":"","institution":"Universidad Nacional Autónoma de México","correspondingAuthor":false,"prefix":"","firstName":"Gastón","middleName":"","lastName":"Contreras-Jiménez","suffix":""},{"id":591705554,"identity":"612d50c0-5298-4d15-b70c-28bf4c3cc127","order_by":4,"name":"Marco Tulio Solano-De la 
Cruz","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABQElEQVRIie2RMUvDQBTHrwQuyyuuF2qaTyAkBOJS8IO4JEu6NJNQKmiIBC5TzWq/RV2CmxcO2iXa1THZLSiC0KHgpXFJtVI3kfyGBw/ej/+9dwg1NPxNWqwsgVx1XVzWHCH4yakUqWrMjWL/RnGCT2UnRzFnbHXXU2NJKt7Oqd+/lsMit0eXhwcBWPlqgE51VlOsJ9dOx5lrTkJsKg8UexRmpm5ncyAMjo1xgrxJsKWAztqUO1OOrNYVBY+SASYOnQFiYJG2UOL6w6xFpqdrodxz+V0opI+150rRhKKshUK3FDbQ+SZFgjJFtzGBUrkQ6WB1vksRu3CVuuYNhzMleLQNCm65CwOD42FHTciXXRY8LZZUXCyKbl+Doa9pES/yl5F/0p2HibJMet60frGdcISqnyL7zQv8vScbGhoa/j0fDzJ2JSaDYooAAAAASUVORK5CYII=","orcid":"","institution":"Universidad Nacional Autónoma de México","correspondingAuthor":true,"prefix":"","firstName":"Marco","middleName":"Tulio Solano-De la","lastName":"Cruz","suffix":""}],"badges":[],"createdAt":"2026-01-30 03:23:43","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8736544/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8736544/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":102963389,"identity":"c57eaacb-6e43-4730-a69f-85ec82cedd83","added_by":"auto","created_at":"2026-02-19 04:16:38","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":27613,"visible":true,"origin":"","legend":"\u003cp\u003eGenome assembly evaluation graph, performed with QUAST software. 
We observed the length of the Anas platyrhynchos genome (reference, 1.2 Gb), and the distribution of the obtained contigs and their length, in the set of Anas diazi (pato_1_fasta).\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/b6d5c097180df70657c7f829.png"},{"id":102964440,"identity":"fe4f7f2d-537c-4d64-a4bb-a1b70545d976","added_by":"auto","created_at":"2026-02-19 04:22:18","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":90617,"visible":true,"origin":"","legend":"\u003cp\u003eResult of the evaluation of the annotation of the transcripts obtained with Augustus software, using the BUSCO software for transcripts completeness analysis.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/7313f9eda4a84a9e1bdf6ce9.png"},{"id":102894874,"identity":"2eff914a-1bfe-4697-a58c-891a76d64671","added_by":"auto","created_at":"2026-02-18 06:14:54","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":84947,"visible":true,"origin":"","legend":"\u003cp\u003eGenome size determination using GenomeScope version 2.0 software, based on PacBio HiFi reads and k-mer frequency. Transformed logarithmic plot, including the model, coverage, and k-mer frequencies.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/78c9296d39e1b6ce85df66cb.png"},{"id":102894881,"identity":"9a037a96-6f41-464e-b3d9-e6aa274652d3","added_by":"auto","created_at":"2026-02-18 06:14:55","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":39121,"visible":true,"origin":"","legend":"\u003cp\u003eA) Dynamics of gene family expansions and contractions in Anas diazi. 
B) Magnitude of expansions and contractions within each gene family in Anas diazi.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/7a7d44dbd8ba83821f4b67f7.png"},{"id":102963641,"identity":"39e3dafd-6fb8-4dbd-926f-d16a8cec82a0","added_by":"auto","created_at":"2026-02-19 04:19:42","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":24274,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional annotation of contracted gene families in Anas diazi, determined by CAFE5 from OrthoFinder analysis.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/4e0bac603faaab912c139365.png"},{"id":102894878,"identity":"c0913f82-05b5-44d6-aa68-10ed5a400b31","added_by":"auto","created_at":"2026-02-18 06:14:55","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":175467,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparative patterns of gene family size evolution across waterfowl and outgroup species.\u003c/strong\u003e (A) Bar plot showing the number of significantly expanded and contracted gene families per species, as inferred by CAFE5. Anas diazi and Aix galericulata exhibit a contraction-dominated profile, whereas Anas platyrhynchos, Anser cygnoides, and Gallus gallus display predominantly expansion-biased trajectories. (B) Heatmap summarizing the relative directionality of gene family size changes across species, where red indicates enrichment of expanded families and blue indicates enrichment of contracted families. 
Together, these panels highlight contrasting evolutionary trajectories among closely related Anatidae lineages and the outgroup species.\u003c/p\u003e","description":"","filename":"Figure6A6B.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/84733fd62bd147052597c688.png"},{"id":102894870,"identity":"73ab60ae-a60f-4f42-815b-31bd6accb2d6","added_by":"auto","created_at":"2026-02-18 06:14:54","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":28235,"visible":true,"origin":"","legend":"\u003cp\u003eFunctional annotation of contracted gene families in Anas diazi, inferred by CAFE5 from OrthoFinder orthogroups, and contrasted with the corresponding expanded gene families in other analyzed species within the same functional categories.\u003c/p\u003e","description":"","filename":"Figure7.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/5ea16de053fe63cc318c1335.png"},{"id":102965479,"identity":"265b5704-5f08-4f1d-852f-f8a3749ba709","added_by":"auto","created_at":"2026-02-19 04:31:38","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1861187,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/b1433d1a-a99c-4caa-8017-834e89cbe4d4.pdf"},{"id":102894867,"identity":"40976679-7b36-4f64-bd2a-3b0f5b7b7899","added_by":"auto","created_at":"2026-02-18 06:14:54","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":10740,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 1.\u003c/strong\u003e QUAST (Quality Assessment Tool for Genome Assemblies) analysis of the Anas diazi HiFi de novo assembly. 
The table reports contig sizes, the number of contigs, and the N and L values (such as N50 and L50) obtained from this assembly.\u003c/p\u003e","description":"","filename":"Table1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/8b5e7be699bb019685168338.xlsx"},{"id":102894873,"identity":"7a627cfe-d849-4760-9a96-f8376c131f85","added_by":"auto","created_at":"2026-02-18 06:14:54","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":10638,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 2.\u003c/strong\u003e Determination of genome size using GenomeScope version 2.0 software, based on PacBio HiFi reads and k-mer frequency.\u003c/p\u003e","description":"","filename":"Table2.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/b1e07a6308a06d0165bd893e.xlsx"},{"id":102894877,"identity":"1f2e7eb3-3af2-42f2-a3ca-99d63df00a64","added_by":"auto","created_at":"2026-02-18 06:14:55","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":10592,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 3. \u003c/strong\u003eStatistical summary of the model for estimating genome size based on k-mer frequency.\u003c/p\u003e","description":"","filename":"Table3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/7708174c24f378399324919d.xlsx"},{"id":102894876,"identity":"58d1ecc5-f06c-440e-866d-ca7e62429561","added_by":"auto","created_at":"2026-02-18 06:14:54","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":10282,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e Summary of contraction and expansion events in OGs (gene families determined with OrthoFinder), analyzed with CAFE5 (see Supplementary_file_6 for the complete analysis and its statistical support values). 
* Aix_galericulata does not contain OGs with contractions.\u003c/p\u003e","description":"","filename":"Table4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/bae117f119b68ba36281187b.xlsx"},{"id":102963411,"identity":"72411b28-4784-4a7e-95e7-8fcbbddf50bf","added_by":"auto","created_at":"2026-02-19 04:17:45","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":196957,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure_Supplementary_1.\u003c/strong\u003e Analysis of Gene Ontology enrichment, Biological Function, performed in STRING software, from transcripts annotated with Augustus and evaluated with BUSCO.\u003c/p\u003e","description":"","filename":"FigureSupplementary1.png","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/407ea7f959e6df8e78a54fe5.png"},{"id":102963896,"identity":"68cf64f4-fc9a-4e19-be45-d6baf8be98b2","added_by":"auto","created_at":"2026-02-19 04:20:49","extension":"tsv","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":21085,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile6.tsv","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/6d4fc64c922bb6e3df8ad96c.tsv"},{"id":102963387,"identity":"201f08be-6a55-45d7-9a3d-33de9c147573","added_by":"auto","created_at":"2026-02-19 04:16:38","extension":"txt","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":1441,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile3.txt","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/58f023eba449dc5592e6b5a1.txt"},{"id":102894879,"identity":"c5b4f7c5-ed0f-4388-9e88-f608215d45ff","added_by":"auto","created_at":"2026-02-18 
06:14:55","extension":"tsv","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":256869,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile5.tsv","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/2e189a70977ab7b7c16e8288.tsv"},{"id":102894882,"identity":"8c7bf425-7f66-47e8-95e0-4c89d6045215","added_by":"auto","created_at":"2026-02-18 06:14:55","extension":"tabular","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":602101,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile4.tabular","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/9fb8e1cd7a1ae8e8e2af854e.tabular"},{"id":102894884,"identity":"f1c335d4-de8d-4e0c-933c-8d5c602d90ea","added_by":"auto","created_at":"2026-02-18 06:14:55","extension":"pdf","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":3997673,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/e5fd935c1e92f444d60d806f.pdf"},{"id":102894885,"identity":"116182ef-d9ee-4d93-8312-5080692f1539","added_by":"auto","created_at":"2026-02-18 06:14:56","extension":"fasta","order_by":11,"title":"","display":"","copyAsset":false,"role":"supplement","size":42642178,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile2.fasta","url":"https://assets-eu.researchsquare.com/files/rs-8736544/v1/74aa3878601e3cceb148c87e.fasta"}],"financialInterests":"No competing interests reported.","formattedTitle":"Long Reads de novo sequencing of the Anas diazi genome reveals changes in gene orthology in waterfowl","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe Anatidae family comprises approximately 145 species of waterfowl worldwide, 33 of which are found in Mexico [1,2]. 
Among these, six species are considered resident, including the Mexican duck (\u003cem\u003eAnas diazi\u003c/em\u003e), the country's only endemic duck species [3]. \u003cem\u003eAnas diazi\u003c/em\u003e is distributed year-round in the inland wetlands of the Mexican Plateau and is currently listed as a threatened species under Mexican conservation legislation (NOM-059-SEMARNAT-2010) [4]. Population estimates suggest approximately 55,500 individuals worldwide, with nearly 98% concentrated in the Central Mexican Plateau, highlighting its restricted distribution and conservation importance [5].\u003c/p\u003e \u003cp\u003eThe Mexican duck belongs to the tribe Anatini and is classified as a dabbling duck, inhabiting shallow freshwater wetlands and exhibiting omnivorous feeding behavior [6]. Morphologically, \u003cem\u003eA. diazi\u003c/em\u003e closely resembles the female mallard (\u003cem\u003eAnas platyrhynchos\u003c/em\u003e), a similarity that has historically contributed to taxonomic uncertainty [7]. For decades, \u003cem\u003eA. diazi\u003c/em\u003e was considered a subspecies of \u003cem\u003eA. platyrhynchos\u003c/em\u003e or part of a complex with clinal variation in North America [8,9]. However, phylogenetic, morphological, and behavioral evidence accumulated over the past few decades led the American Ornithological Society to recognize \u003cem\u003eAnas diazi\u003c/em\u003e as a distinct species in 2020 [10].\u003c/p\u003e \u003cp\u003eDespite its ecological and evolutionary importance, \u003cem\u003eA. diazi\u003c/em\u003e remains one of the least studied waterbird species in the Holarctic region, particularly from a genomic perspective [11]. Advances in long-read sequencing technologies, alone or in combination with short reads, have enabled \u003cem\u003ede novo\u003c/em\u003e genome assemblies for numerous bird species, including raptors, passerines, galliformes, and some aquatic birds [12\u0026ndash;14]. 
These resources have facilitated comparative genomic analyses of genome architecture, gene family evolution, and lineage-specific adaptations [15,16]. Within the genus \u003cem\u003eAnas\u003c/em\u003e, genomic data are now available for an increasing number of species, although assemblies vary substantially in depth, coverage, annotation quality, and availability [17,18]. Nevertheless, genomic resources remain limited, and a reference genome for the Mexican duck is still unavailable, despite its importance as the only endemic duck species in Mexico and its restricted distribution [11,19].\u003c/p\u003e \u003cp\u003eAvian genomes are characterized by relatively small and compact sizes compared to other vertebrates, typically ranging from 0.9 to 1.3 Gb, with low repetitive content and a streamlined genomic architecture [20,21]. Within this range, Anatidae genomes tend to cluster near the upper end, although substantial variation exists among species [22]. Differences in genome size among birds are primarily driven by variation in repetitive elements, segmental duplications, and lineage-specific expansions, rather than large-scale changes in gene number [23,24]. Previous studies suggest that genome size variation in birds may be influenced by life-history traits, metabolic constraints associated with flight, and population history [25,26]. However, genome size estimates remain unavailable for several endemic and non-model waterfowl species, including \u003cem\u003eAnas diazi\u003c/em\u003e, limiting comparative assessments of genomic architecture and evolutionary dynamics within the family [22,27].\u003c/p\u003e \u003cp\u003eComparative genomic studies in birds have revealed that variation in gene family size plays a significant role in lineage-specific adaptation, ecological specialization, and domestication [28,29]. 
Analyses across avian genomes consistently show that gene family expansions and contractions are often associated with immune response, sensory perception, metabolism, and developmental processes [30,31]. In waterfowl and other avian lineages, changes in gene family composition have been linked to ecological transitions, migratory behavior, dietary specialization, and adaptation to aquatic environments [32,33]. Furthermore, domesticated or semi-domesticated lineages, such as the mallard (\u003cem\u003eAnas platyrhynchos\u003c/em\u003e), frequently exhibit gene family expansions related to metabolism, reproduction, and immune regulation, reflecting both artificial selection and the relaxation of selective constraints [34,35]. Despite these advances, the evolution of gene families remains poorly characterized in many endemic and non-model waterfowl species, limiting our understanding of how genomic innovation and gene loss have contributed to diversification within Anatidae [36].\u003c/p\u003e \u003cp\u003eHere we present the first whole-genome analysis of \u003cem\u003eAnas diazi\u003c/em\u003e, based on PacBio HiFi long-read sequencing and de novo assembly [37]. We estimated genome size using k-mer frequency analysis and GenomeScope2 modeling and compared genomic characteristics with those of \u003cem\u003eA. platyrhynchos\u003c/em\u003e and other representative waterfowl species [38,39]. Furthermore, we investigated the evolution of gene families in \u003cem\u003eA. diazi\u003c/em\u003e, \u003cem\u003eA. platyrhynchos, Aix galericulata, Anser cygnoides\u003c/em\u003e, and \u003cem\u003eGallus gallus\u003c/em\u003e using orthology inference and probabilistic modeling of gene family expansion and contraction [29,30,40]. This study provides a fundamental genomic resource for \u003cem\u003eA. 
diazi\u003c/em\u003e and contributes to a broader understanding of genome evolution, specialization, and diversification in Anatidae [36,41].\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eGenome sequencing, assembly and annotation\u003c/h2\u003e \u003cp\u003ePacBio HiFi sequencing of \u003cem\u003eAnas diazi\u003c/em\u003e generated a total of 17,611,587,254 base pairs, corresponding to 1,374,023 long reads [42]. The average read length was 12,817.5 bp, with maximum read lengths reaching 55,850 bp, while the shortest reads were 75 bp, consistent with the performance characteristics of circular consensus sequencing [37,42]. After quality filtering and trimming, approximately 99% of the original sequences were retained, resulting in 1,360,282 high-quality reads used for downstream analyses [43] \u003cb\u003e(See Supplementary_file_1\u003c/b\u003e).\u003c/p\u003e \u003cp\u003ePrior to de novo assembly, filtered reads were aligned against the \u003cem\u003eAnas platyrhynchos\u003c/em\u003e reference genome (GCF_015476345.1), yielding an alignment rate of 86.78%, indicating substantial genomic similarity while retaining a significant fraction of \u003cem\u003eA. diazi\u003c/em\u003e-specific sequences [44]. \u003cem\u003eDe novo\u003c/em\u003e assembly produced 4,570 contigs with a total assembled length of 1,058,566,110 bp (~\u0026thinsp;1.06 Gb), consistent with genome sizes reported for other Anatidae species (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) [18,22]. The longest contig measured 3,234,602 bp, with an N50 of 485,801 bp. 
Assembly continuity metrics indicated an L50 of 601 contigs and an L90 of 2,466 contigs, values comparable to other long-read\u0026ndash;based avian genome assemblies lacking chromosomal scaffolding [12,45], \u003cb\u003esee Table\u0026nbsp;1 and\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eGene prediction using AUGUSTUS identified 33,712 protein-coding transcripts distributed across the assembled contigs [46] (\u003cb\u003esee Supplementary_file_2\u003c/b\u003e). However, this number likely reflects transcript fragmentation and redundancy associated with the draft nature of the assembly. To minimize annotation artifacts and ensure biological reliability, all downstream comparative and evolutionary analyses were restricted to a high-confidence subset of approximately 3,700\u0026ndash;4,000 genes supported by BUSCO completeness and functional annotation. Assembly completeness assessed with BUSCO recovered 3,791 complete single-copy orthologs and only 9 duplicated BUSCOs, indicating low redundancy [47]. Additionally, 17 fragmented and 4,521 missing BUSCOs were detected, out of a total of 8,338 BUSCO groups searched. Although overall BUSCO completeness is lower than that of chromosome-level assemblies, this pattern is consistent with fragmented avian draft genomes lacking chromosomal scaffolding and reflects the challenges associated with assembling microchromosomes, repetitive regions, and divergent loci rather than major deficiencies in sequencing depth or assembly quality. \u003cb\u003eSee\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e \u003cb\u003eand Supplementary_file_3\u003c/b\u003e. 
Functional annotation with InterProScan and BUSCO successfully assigned protein domains (\u003cb\u003eSee Supplementary_file_4\u003c/b\u003e) and Gene Ontology (GO) terms of predicted transcripts, supporting the biological completeness and functional relevance of the assembly [48,49] \u003cb\u003esee Figure_supplementary_1 and Supplementary_file_5\u003c/b\u003e.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGenome size estimation and genomic features\u003c/h3\u003e\n\u003cp\u003eGenome size estimation based on PacBio HiFi reads and k-mer frequency analysis revealed a haploid genome size of approximately 1.02 Gb for \u003cem\u003eAnas diazi\u003c/em\u003e [38,39] (\u003cb\u003eSee Table\u0026nbsp;2\u003c/b\u003e). GenomeScope2 modeling showed a high model fit (\u0026gt;\u0026thinsp;96%), indicating a robust and reliable estimate [39] (\u003cb\u003eSee\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e \u003cb\u003eand Table\u0026nbsp;3\u003c/b\u003e). The genome exhibited a low repetitive content (~\u0026thinsp;7.4%) and moderate heterozygosity (~\u0026thinsp;1.1%), values consistent with other Anseriformes genomes and previously reported patterns of compact avian genome architecture [20,22,23,50].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eImportantly, these results indicate that gene family expansions detected in \u003cem\u003eA. 
diazi\u003c/em\u003e occurred without a concomitant increase in overall genome size, suggesting that genomic innovation in this species is driven by localized gene duplication events rather than genome-wide expansion, a pattern previously reported across multiple avian lineages [24,26,36,51].\u003c/p\u003e\n\u003ch3\u003eGene family evolution across Anseriformes\u003c/h3\u003e\n\u003cp\u003eTo investigate gene family evolution, orthogroups inferred with OrthoFinder were analyzed using CAFE5 across five species: \u003cem\u003eAnas diazi, Anas platyrhynchos, Aix galericulata, Anser cygnoides\u003c/em\u003e, and \u003cem\u003eGallus gallus\u003c/em\u003e [29,40] \u003cb\u003eSupplementary_file_6\u003c/b\u003e. The total number and direction of significant gene family changes varied markedly among species, revealing distinct evolutionary trajectories consistent with lineage-specific rates of gene gain and loss [28,36,52], \u003cb\u003eSee Table\u0026nbsp;4\u003c/b\u003e.\u003c/p\u003e \u003cp\u003e \u003cem\u003eAnas diazi\u003c/em\u003e exhibited a strong bias toward gene family contraction, whereas \u003cem\u003eA. platyrhynchos, A. cygnoides\u003c/em\u003e, and \u003cem\u003eG. gallus\u003c/em\u003e showed predominantly expansive profiles [34,35,40]. \u003cem\u003eAix galericulata\u003c/em\u003e displayed a contraction-dominated pattern similar to \u003cem\u003eA. diazi\u003c/em\u003e, although with fewer extreme events. These results define a clear evolutionary gradient, with \u003cem\u003eA. diazi\u003c/em\u003e and \u003cem\u003eA. galericulata\u003c/em\u003e tending toward gene loss, while \u003cem\u003eA. platyrhynchos\u003c/em\u003e, \u003cem\u003eA. cygnoides\u003c/em\u003e, and especially \u003cem\u003eG. gallus\u003c/em\u003e exhibit extensive gene family expansion, a pattern consistent with differences in life history, domestication, and long-term effective population size [26,36,53] \u003cb\u003eSupplementary_file_6\u003c/b\u003e. 
This contrast is visually summarized in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e6\u003c/span\u003e, which shows both the absolute number of gene family expansions and contractions per species (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e6\u003c/span\u003eA) and the relative directionality of gene family size changes across lineages (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e6\u003c/span\u003eB). Together, these representations highlight the contraction-dominated profile of \u003cem\u003eAnas diazi\u003c/em\u003e and \u003cem\u003eAix galericulata\u003c/em\u003e relative to the expansion-biased trajectories of \u003cem\u003eAnas platyrhynchos, Anser cygnoides\u003c/em\u003e, and \u003cem\u003eGallus gallus\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eGene family expansion and contraction in\u003c/b\u003e \u003cb\u003eAnas diazi\u003c/b\u003e\u003c/p\u003e \u003cp\u003eIn \u003cem\u003eAnas diazi\u003c/em\u003e, CAFE5 detected 51 gene families with significant evolutionary changes, comprising 6 expanded and 45 contracted orthogroups (\u003cb\u003eTable\u0026nbsp;5\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e) [40,52]. This asymmetrical pattern contrasts sharply with other analyzed Anseriformes, which displayed higher numbers of expansions than contractions, consistent with previously reported lineage-specific differences in gene turnover rates [28,36].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eExpanded families in A. diazi exhibited large effect sizes, including OG0000006 (+\u0026thinsp;103 copies), OG0000052 (+\u0026thinsp;62), OG0000150 (+\u0026thinsp;44), OG0000348 (+\u0026thinsp;30), OG0000363 (+\u0026thinsp;29), and OG0000734 (+\u0026thinsp;14), all with highly significant P-values [40] \u003cb\u003eTable\u0026nbsp;5\u003c/b\u003e. 
While these large effect sizes suggest pronounced lineage-specific expansion, we acknowledge that assembly fragmentation and the proliferation of transposable element\u0026ndash;derived sequences may contribute to inflated copy number estimates, and therefore these values should be interpreted cautiously. In contrast, contracted families showed a broad distribution of moderate to strong reductions, typically ranging from \u0026minus;\u0026thinsp;8 to \u0026minus;\u0026thinsp;15 gene copies (\u003cb\u003eTable\u0026nbsp;5\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e, indicating sustained gene loss across multiple functional categories, a pattern consistent with long-term purifying selection and reduced effective population size in non-domesticated lineages [36,53,54].\u003c/p\u003e\n\u003ch3\u003eComparative patterns of gene family evolution\u003c/h3\u003e\n\u003cp\u003eMultispecies comparison revealed that \u003cem\u003eA. diazi\u003c/em\u003e possesses the most contraction-biased gene family profile among the analyzed Anseriformes [36,40,52]. \u003cem\u003eAnas platyrhynchos\u003c/em\u003e exhibited extensive expansions across numerous orthogroups, consistent with its domestication history and broad ecological plasticity [34,35,55]. \u003cem\u003eAnser cygnoides\u003c/em\u003e also showed expansion-dominated dynamics, including several families with gains exceeding\u0026thinsp;+\u0026thinsp;30 copies, a pattern previously associated with domestication and intensive artificial selection [56]. \u003cem\u003eGallus gallus\u003c/em\u003e presented the most extreme expansion profile, with multiple gene families exhibiting deep duplications, reflecting strong artificial selection pressures [55,57] \u003cb\u003eSupplementary_file_6\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eSeveral orthogroups displayed opposite evolutionary trends among species, with expansions in \u003cem\u003eA. 
platyrhynchos\u003c/em\u003e or \u003cem\u003eA. cygnoides\u003c/em\u003e and contractions in \u003cem\u003eA. diazi\u003c/em\u003e and \u003cem\u003eA. galericulata\u003c/em\u003e, suggesting divergent selective regimes and lineage-specific functional optimization driven by differences in ecology, demographic history, and human-mediated selection [36,53,54,58].\u003c/p\u003e\n\u003ch3\u003eFunctional annotation of expanded and contracted gene families\u003c/h3\u003e\n\u003cp\u003eFunctional annotation of gene families was performed using InterProScan and eggNOG-mapper based on protein sequences of \u003cem\u003eAnas diazi\u003c/em\u003e [48,59]. Due to the absence of a reference genome, all annotations were inferred by homology and are used strictly as functional descriptors, a common approach for non-model avian genomes [12,36].\u003c/p\u003e \u003cp\u003eExpanded gene families were strongly enriched in functions related to RNA-mediated transposition and retroelement activity, including domains such as RNase H, reverse transcriptase (RVT_1), and rve/integrase [23,24,60]. This functional coherence indicates that the major expansion events in \u003cem\u003eA. diazi\u003c/em\u003e are driven primarily by mobile genetic elements or genes derived from them, a pattern previously reported in compact avian genomes where transposable element activity occurs in localized bursts rather than genome-wide proliferation [23,50,61].\u003c/p\u003e \u003cp\u003eIn contrast, contracted gene families were functionally heterogeneous. \u003cb\u003eSee\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003e. Comparative functional patterns between Anas diazi and the other species are summarized in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e. Many orthogroups were annotated as uncharacterized or hypothetical proteins, consistent with lineage-specific divergence or rapid sequence evolution [36,53]. 
Among annotated functions, recurrent categories included cytoskeletal and structural proteins, membrane-associated and transport-related components, mitochondrial and metabolic proteins, and low-frequency regulatory and signaling proteins [15,25,62]. Notably, no enrichment of transposable element\u0026ndash;related domains was detected among contracted families, indicating that gene loss in \u003cem\u003eA. diazi\u003c/em\u003e reflects diffuse functional reduction rather than genome-wide purging of repetitive elements [24,26,50].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eFunctional landscape of the\u003c/b\u003e \u003cb\u003eAnas diazi\u003c/b\u003e \u003cb\u003egenome\u003c/b\u003e\u003c/p\u003e \u003cp\u003eGlobal functional annotation of high-confidence predicted genes revealed a diverse repertoire involved in core cellular, metabolic, and regulatory processes [48,59]. Gene Ontology, KEGG, and Reactome enrichment analyses consistently identified pathways associated with cellular metabolism, signal transduction, nucleotide metabolism, DNA repair, and neuroendocrine signaling [49,63\u0026ndash;65]. In particular, G protein\u0026ndash;coupled receptor\u0026ndash;mediated pathways were recurrently enriched, reflecting the conserved importance of sensory, behavioral, and physiological regulation in avian genomes [66\u0026ndash;68]. 
\u003cb\u003eSee Figure Supplementary 1\u003c/b\u003e.\u003c/p\u003e \u003cp\u003eTogether, these results demonstrate that the \u003cem\u003eAnas diazi\u003c/em\u003e genome is functionally complete and biologically coherent, while exhibiting a distinctive evolutionary signature characterized by a compact genome, dominant gene family contraction, and localized expansion of transposable element\u0026ndash;related families [23,24,36,50,61].\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003e \u003cb\u003eGenome architecture and assembly quality of\u003c/b\u003e \u003cb\u003eAnas diazi\u003c/b\u003e\u003c/p\u003e \u003cp\u003eIn this study, we present the first whole-genome assembly and comparative genomic analysis of the Mexican duck (\u003cem\u003eAnas diazi\u003c/em\u003e), providing a foundational genomic resource for this endemic and threatened waterfowl species. The de novo assembly generated from PacBio HiFi long reads yielded a genome size of approximately 1.06 Gb, which is well within the expected range for avian genomes and closely matches genome size estimates obtained through k-mer\u0026ndash;based modeling (~\u0026thinsp;1.02 Gb) [20,22,38,39]. This concordance between assembly length and independent genome size estimation supports the overall reliability of the assembly and is consistent with best practices for validating draft genomes in non-model species [12,50,69]. Although BUSCO completeness was moderate, this reflects the draft nature of the assembly and the well-known challenges associated with avian microchromosomes and repetitive regions. 
Importantly, all comparative genomic analyses were performed using a filtered, high-confidence gene set supported by BUSCO and functional annotation, ensuring that downstream evolutionary inferences were not driven by fragmented or low-confidence gene models.\u003c/p\u003e \u003cp\u003eAlthough the assembly remains fragmented relative to chromosome-level bird genomes, contiguity metrics (N50\u0026thinsp;\u0026asymp;\u0026thinsp;486 kb) and BUSCO completeness values indicate that the assembly captures the majority of conserved avian genes [45,47]. As observed in other non-model bird species assembled without Hi-C or linkage maps, fragmentation likely reflects the abundance of repetitive regions, the presence of numerous microchromosomes, and residual heterozygosity rather than major deficiencies in sequencing depth or assembly strategy [12,45,62,70]. Importantly, the high alignment rate of reads to the \u003cem\u003eAnas platyrhynchos\u003c/em\u003e reference genome (~\u0026thinsp;87%) further supports the accuracy of the assembly and its suitability for comparative genomic analyses within Anatidae [18,44].\u003c/p\u003e\n\u003ch3\u003eGenome size evolution and repeat landscape in Anatidae\u003c/h3\u003e\n\u003cp\u003eAvian genomes are characteristically compact compared to those of other vertebrates, typically ranging between 0.9 and 1.3 Gb [20,21]. Within this range, Anatidae genomes tend to occupy the upper end, although substantial interspecific variation has been reported [22,26]. The genome size of \u003cem\u003eA. 
diazi\u003c/em\u003e estimated here (~\u0026thinsp;1.02 Gb) places it among the smaller genomes within the family, closer to the lower bound of the Anatidae range, and comparable to other wild, non-domesticated Anseriformes [22,50].\u003c/p\u003e \u003cp\u003eGenomeScope2 modeling revealed a relatively low proportion of repetitive content (~\u0026thinsp;7.4%) and moderate heterozygosity (~\u0026thinsp;1.1%), values that are consistent with other wild Anseriformes and contrast with patterns observed in some domesticated or intensively selected bird lineages [26,50,55]. Notably, despite the detection of several large gene family expansions, overall genome size remains compact, indicating that these expansions occurred without large-scale genome inflation. This pattern suggests that genome size evolution in A. diazi is shaped primarily by localized gene duplication and loss rather than by widespread accumulation of repetitive elements or segmental duplications [23,24,50,61].\u003c/p\u003e \u003cp\u003eThese results support previous hypotheses proposing that variation in avian genome size is driven more by differences in repetitive DNA dynamics than by changes in gene number, and that strong selective constraints associated with flight performance, metabolic efficiency, and effective population size may limit genome expansion in wild bird lineages [21,25,26,70,71].\u003c/p\u003e \u003cp\u003e \u003cb\u003eFunctional composition of the\u003c/b\u003e \u003cb\u003eAnas diazi\u003c/b\u003e \u003cb\u003egene repertoire\u003c/b\u003e\u003c/p\u003e \u003cp\u003eFunctional annotation revealed a diverse and biologically coherent gene repertoire involved in core cellular, metabolic, and regulatory processes. 
Enrichment analyses across Gene Ontology, KEGG, and Reactome databases consistently highlighted pathways related to cellular metabolism, nucleotide processing, DNA repair, and signal transduction, mirroring functional profiles reported for other avian genomes [49,63\u0026ndash;65,72].\u003c/p\u003e \u003cp\u003eA prominent and recurrent signal across all annotation frameworks was the enrichment of G protein\u0026ndash;coupled receptor (GPCR)\u0026ndash;mediated signaling pathways. GPCR-related genes identified in \u003cem\u003eA. diazi\u003c/em\u003e are involved in neuroendocrine regulation, sensory perception, feeding behavior, locomotion, and stress responses [66\u0026ndash;68,73]. These functional categories are of particular relevance for waterfowl, which rely heavily on complex behavioral and physiological regulation to cope with fluctuating wetland environments, seasonal resource availability, and social interactions [74,75].\u003c/p\u003e \u003cp\u003eTogether, these results indicate that despite the fragmented nature of the assembly, the predicted gene set of A. diazi is functionally complete and comparable to those of other avian species, validating its use for evolutionary and ecological genomic analyses within Anatidae and across birds more broadly [36,41,70,72].\u003c/p\u003e \u003cp\u003e \u003cb\u003eAsymmetric gene family evolution in\u003c/b\u003e \u003cb\u003eAnas diazi\u003c/b\u003e\u003c/p\u003e \u003cp\u003eComparative analysis of gene family evolution using CAFE5 revealed a strikingly asymmetric pattern in \u003cem\u003eA. diazi\u003c/em\u003e, characterized by a strong predominance of gene family contractions over expansions. In total, 45 gene families were significantly contracted, whereas only six families showed significant expansion along the \u003cem\u003eA. diazi\u003c/em\u003e lineage [40,52]. 
This contraction-dominated profile contrasts sharply with patterns observed in other analyzed Anseriformes, particularly \u003cem\u003eAnas platyrhynchos\u003c/em\u003e and \u003cem\u003eAnser cygnoides\u003c/em\u003e, which exhibited expansion-biased dynamics. \u003cb\u003eSee\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e6\u003c/span\u003e [34,35,56].\u003c/p\u003e \u003cp\u003eThe contraction profile of \u003cem\u003eA. diazi\u003c/em\u003e suggests a lineage-specific process of gene repertoire reduction. Similar patterns have been reported in other wild or ecologically specialized bird species and are often interpreted as the result of long-term stabilizing selection, reduced effective population size, or ecological specialization [36,53,76]. Given the restricted geographic distribution, endemism, and relatively small population size of \u003cem\u003eA. diazi\u003c/em\u003e, gene family contraction may reflect long-term demographic constraints combined with local adaptation to the inland wetlands of the Mexican Plateau, rather than recent anthropogenic pressures or domestication-related selection [25,26,77].\u003c/p\u003e\n\u003ch3\u003eFunctional contrasts between expanded and contracted gene families\u003c/h3\u003e\n\u003cp\u003eFunctional annotation revealed a clear asymmetry between expanded and contracted gene families in \u003cem\u003eA. diazi\u003c/em\u003e. Expanded families were overwhelmingly dominated by domains associated with retrotransposition and mobile genetic elements, including reverse transcriptase (RVT_1), RNase H, and integrase-related domains [23,24,60,61]. This pattern indicates that the most pronounced expansions in \u003cem\u003eA. 
diazi\u003c/em\u003e are driven primarily by transposable element\u0026ndash;related activity or by genes derived from such elements, consistent with previous observations in avian genomes where TE activity occurs in episodic, lineage-specific bursts rather than through genome-wide expansion [23,50,61,78]. Because these expansions are largely TE-derived, they are unlikely to reflect functional innovation in core biological pathways but instead represent localized genomic activity consistent with episodic transposable element dynamics in compact avian genomes.\u003c/p\u003e \u003cp\u003eIn contrast, contracted gene families were functionally heterogeneous and lacked enrichment for any single biological pathway. Contracted orthogroups included genes associated with cellular structure, membrane components, transport processes, mitochondrial function, and general metabolic activity [15,25,36,62]. Importantly, no signal of transposable element\u0026ndash;related domain enrichment was detected among contracted families, suggesting that gene loss is not simply a byproduct of genome-wide TE purging, but rather reflects diffuse, localized gene loss across diverse functional categories [24,26,54].\u003c/p\u003e \u003cp\u003eThis functional asymmetry suggests that distinct evolutionary mechanisms underlie gene family gains and losses in A. diazi, with expansions driven primarily by mobile genetic elements and contractions reflecting gradual reduction across multiple cellular processes, likely shaped by long-term demographic history and stabilizing selection rather than episodic directional selection [36,53,77,79].\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eComparative evolutionary trajectories within Anseriformes\u003c/h2\u003e \u003cp\u003eWhen placed in a broader phylogenetic context, \u003cem\u003eA. 
diazi\u003c/em\u003e clusters with \u003cem\u003eAix galericulata\u003c/em\u003e in exhibiting contraction-dominated gene family evolution, whereas \u003cem\u003eA. platyrhynchos\u003c/em\u003e and \u003cem\u003eAnser cygnoides\u003c/em\u003e show expansion-biased trajectories [36,40,52]. These contrasting patterns likely reflect differences in life history, ecological breadth, demographic history, and domestication intensity across Anseriformes [25,26,53,77].\u003c/p\u003e \u003cp\u003eThe mallard (\u003cem\u003eA. platyrhynchos\u003c/em\u003e), a species with broad geographic distribution and a documented history of domestication, introgression, and human-mediated selection, displayed numerous expansions in immune, metabolic, and regulatory gene families, consistent with previous genomic studies [34,35,55]. Similarly, \u003cem\u003eA. cygnoides\u003c/em\u003e, a domesticated goose species, showed moderate but consistent gene family expansions, reflecting artificial selection and relaxed selective constraints in managed populations [56,57]. In contrast, the contraction-biased profiles of \u003cem\u003eA. diazi\u003c/em\u003e and \u003cem\u003eA. galericulata\u003c/em\u003e may reflect more specialized ecological niches, reduced effective population sizes, and limited exposure to artificial or anthropogenic selection pressures, resulting in long-term gene repertoire streamlining rather than expansion [36,53,77,79,80].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eEvolutionary and conservation implications\u003c/h2\u003e \u003cp\u003eThe genomic patterns described here suggest that the evolutionary history of \u003cem\u003eAnas diazi\u003c/em\u003e has been shaped by a combination of genome size stability, pervasive gene family depuration, and limited but pronounced expansion of mobile element\u0026ndash;related sequences [25,36,53,77]. 
This constellation of features is consistent with long-term adaptation to a relatively stable but geographically restricted ecological niche, combined with demographic constraints associated with endemism, reduced effective population size, and limited opportunities for range expansion [26,80,81]. Similar contraction-dominated genomic trajectories have been reported in other wild bird species with specialized ecologies, supporting the hypothesis that genome streamlining may be favored under persistent stabilizing selection [36,53,82].\u003c/p\u003e \u003cp\u003eFrom a conservation perspective, the availability of a reference genome and the identification of lineage-specific genomic features provide essential resources for future studies addressing population structure, local adaptation, introgressive hybridization with \u003cem\u003eA. platyrhynchos\u003c/em\u003e, and the genomic basis of ecological specialization [55,83\u0026ndash;85]. In particular, the ability to distinguish species-specific gene family contractions and expansions may facilitate the detection of adaptive versus neutral variation in conservation genomics frameworks. More broadly, this study contributes to a growing body of evidence indicating that wild, non-domesticated waterfowl follow evolutionary genomic trajectories that differ fundamentally from those of their domesticated relatives, underscoring the importance of including endemic and understudied taxa in comparative and conservation-oriented genomics [34,35,56,57,79,86].\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis study presents the first whole-genome assembly and comparative genomic analysis of the Mexican duck (\u003cem\u003eAnas diazi\u003c/em\u003e), providing a foundational genomic resource for an endemic and understudied waterfowl species. 
Using PacBio HiFi long-read sequencing, we generated a de novo assembly that captures the majority of conserved avian genes and yields a genome size estimate consistent with other members of Anatidae.\u003c/p\u003e \u003cp\u003eDespite its compact genome and low repetitive content, \u003cem\u003eA. diazi\u003c/em\u003e exhibits a strikingly asymmetric pattern of gene family evolution, characterized by widespread contraction and limited but pronounced expansion of a small number of families. Notably, expanded gene families are overwhelmingly associated with retrotransposition-related domains, indicating that gene family expansion in this species is driven primarily by localized activity of mobile-element\u0026ndash;derived sequences rather than genome-wide expansion.\u003c/p\u003e \u003cp\u003eComparative analyses across Anseriformes place \u003cem\u003eA. diazi\u003c/em\u003e alongside \u003cem\u003eAix galericulata\u003c/em\u003e as contraction-dominated lineages, in contrast to the expansion-biased profiles observed in domesticated or ecologically generalist species such as \u003cem\u003eAnas platyrhynchos\u003c/em\u003e, \u003cem\u003eAnser cygnoides\u003c/em\u003e, and \u003cem\u003eGallus gallus\u003c/em\u003e. These contrasting evolutionary trajectories suggest that gene family contraction may be associated with ecological specialization, demographic history, and the absence of artificial selection.\u003c/p\u003e \u003cp\u003eBeyond its evolutionary insights, the genomic resource presented here provides an essential foundation for future studies on population genomics, local adaptation, and hybridization in \u003cem\u003eA. diazi\u003c/em\u003e. 
More broadly, this work underscores the importance of including endemic and non-model species in comparative genomics to fully capture the diversity of evolutionary processes shaping avian genomes.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eCollection of biological material\u003c/h2\u003e \u003cp\u003eBiological samples were obtained at the UMA Ejido Capulhuac, located in Polygon Two of the Ci\u0026eacute;negas del Lerma Flora and Fauna Protection Area (19\u0026deg;12\u0026prime;45\u0026Prime; N, 99\u0026deg;27\u0026prime;30\u0026Prime; W), State of Mexico. Blood samples were collected from a female Mexican duck (\u003cem\u003eAnas diazi\u003c/em\u003e; Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) by venipuncture of the brachial (ulnar) vein using a 23 G hypodermic needle and a 3 mL syringe, following standard avian sampling protocols [87,88]. Blood was immediately transferred to EDTA-coated microtainer tubes to prevent coagulation and preserve nucleic acid integrity.\u003c/p\u003e \u003cp\u003e Sampling was conducted during the 2024 hunting season through legally authorized hunting activities, in accordance with Mexican wildlife regulations and ethical guidelines for the use of vertebrates in research [89]. Collected samples were preserved in liquid nitrogen and transported to the Molecular Genetics Laboratory at the Instituto de Ecolog\u0026iacute;a, Universidad Nacional Aut\u0026oacute;noma de M\u0026eacute;xico (UNAM), where they were stored until processing.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eDNA extraction and quality assessment\u003c/h2\u003e \u003cp\u003eGenomic DNA was extracted using the DNeasy Blood \u0026amp; Tissue Kit (Qiagen), following the manufacturer\u0026rsquo;s protocol for whole blood samples, a method widely used for high-quality avian genomic DNA isolation [90]. 
DNA integrity and molecular weight were assessed by electrophoresis on 0.8% agarose gels using high\u0026ndash;molecular weight DNA markers (25 kb and 50 kb), ensuring suitability for long-read sequencing [91].\u003c/p\u003e \u003cp\u003eDNA concentration was measured fluorometrically using the Qubit Broad Range (BR) dsDNA kit and a Qubit 3 fluorometer (Invitrogen), while DNA purity was evaluated using NanoDrop Lite spectrophotometer readings (Thermo Scientific), assessing 260/280 and 260/230 absorbance ratios [92]. Prior to quantification and visualization, DNA samples were treated with RNase A to remove residual RNA. High-quality DNA was dehydrated using a Savant\u0026reg; DNA SpeedVac\u0026reg; system (Thermo Scientific) prior to shipment for sequencing.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eGenome sequencing and quality control\u003c/h2\u003e \u003cp\u003eHigh-molecular-weight genomic DNA was sent to Innomics Inc. (Sunnyvale, CA, USA) for PacBio HiFi library construction and sequencing. Libraries were prepared according to Pacific Biosciences specifications and sequenced on the PacBio Revio platform using extended read chemistry, generating HiFi reads with expected lengths of approximately 12\u0026ndash;18 kb and high per-base accuracy [93,94].\u003c/p\u003e \u003cp\u003eRaw sequencing reads were evaluated using FastQC to assess base quality, read length distribution, GC content, and overall sequencing performance [95]. These metrics guided subsequent filtering and assembly strategies.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eRead filtering and genome assembly\u003c/h2\u003e \u003cp\u003eInitial read filtering was performed using Trimmomatic to remove low-quality and excessively short reads [96]. 
In parallel, raw reads were independently filtered using QIAGEN CLC Genomics Workbench version 25 to confirm read quality and consistency across software pipelines.\u003c/p\u003e \u003cp\u003eTwo assembly strategies were explored. First, an exploratory assembly was generated using SPAdes configured for long-read data [97]. Second, and ultimately used for downstream analyses, a de novo assembly was performed using the HiFi long-read genome assembly module implemented in QIAGEN CLC Genomics Workbench version 25, optimized for PacBio HiFi data.\u003c/p\u003e \u003cp\u003eAssembly quality and contiguity metrics were evaluated using QUAST [98]. To assess overall genomic similarity and read mapping efficiency, filtered reads were aligned to the Anas platyrhynchos reference genome (GCF_015476345.1) using Bowtie2 [99], and independently using QIAGEN CLC Genomics Workbench.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eGenome size estimation\u003c/h2\u003e \u003cp\u003eGenome size estimation was performed using k-mer frequency analysis based on PacBio HiFi reads. K-mer histograms were generated and modeled using GenomeScope2 to estimate haploid genome size, heterozygosity, and repetitive content [100]. Model fit values were used to assess the reliability and robustness of genome size estimates.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eGenome annotation and functional characterization\u003c/h2\u003e \u003cp\u003eStructural annotation of the \u003cem\u003eAnas diazi\u003c/em\u003e genome was performed using AUGUSTUS, with gene models trained using the \u003cem\u003eGallus gallus\u003c/em\u003e reference annotation as a guide [101]. 
Independent annotation was also tested within QIAGEN CLC Genomics Workbench version 25 for comparison and validation.\u003c/p\u003e \u003cp\u003eFunctional annotation of predicted protein-coding genes was conducted using InterProScan, enabling the identification of conserved protein domains and the assignment of Gene Ontology (GO) terms [102]. Annotation completeness and redundancy were evaluated using BUSCO with the appropriate avian lineage dataset [103].\u003c/p\u003e \u003cp\u003eGene Ontology enrichment analyses and interaction-based visualization of annotated genes were performed using STRING v12.0 [104]. All functional annotations were inferred by homology, as \u003cem\u003eAnas diazi\u003c/em\u003e lacks a curated reference genome.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eOrthology inference and gene family evolution analysis\u003c/h2\u003e \u003cp\u003eOrthologous gene families were inferred using OrthoFinder, based on predicted protein sequences from \u003cem\u003eAnas diazi, Anas platyrhynchos, Aix galericulata, Anser cygnoides\u003c/em\u003e, and \u003cem\u003eGallus gallus\u003c/em\u003e [105]. Single-copy orthologs were used to generate a species phylogeny, which was subsequently rendered ultrametric for downstream evolutionary modeling.\u003c/p\u003e \u003cp\u003eGene family expansion and contraction analyses were performed using CAFE5 under a probabilistic birth\u0026ndash;death model [106]. Significant changes in gene family size were identified using branch-specific likelihood estimates and corrected p-values. 
Expanded and contracted orthogroups identified in \u003cem\u003eAnas diazi\u003c/em\u003e were further subjected to functional annotation using InterProScan and eggNOG-mapper to characterize the biological processes underlying gene family evolution [107].\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eDeclaration of competing interest\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eEthics statement\u003c/h2\u003e \u003cp\u003eNo experimental procedures were performed on live animals in this study. Biological samples (blood and liver tissue) were obtained from a single female Mexican duck (\u003cem\u003eAnas diazi\u003c/em\u003e) that was legally harvested during the authorized hunting season within a registered Wildlife Management Unit (Unidad de Manejo para la Conservaci\u0026oacute;n de la Vida Silvestre, UMA Ejido Capulhuac), located in Polygon Two of the Ci\u0026eacute;negas del Lerma Flora and Fauna Protection Area, State of Mexico.\u003c/p\u003e \u003cp\u003eScientific collection authorization was granted by the Direcci\u0026oacute;n General de Vida Silvestre, Secretar\u0026iacute;a de Medio Ambiente y Recursos Naturales (SEMARNAT), Government of Mexico, under permit number SPARN/DGVS/09070/24 (dated August 1, 2024), issued to the first author. 
Because no live animal experimentation, capture, anesthesia, or euthanasia was performed by the research team, this study did not require Institutional Animal Care and Use Committee (IACUC) or IRB approval.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding Declaration\u003c/h2\u003e \u003cp\u003ePatricia Padilla-Aguilar received a postdoctoral fellowship from the Council of Science, Technology and Innovation of Hidalgo, Mexico (CITNOVA), for the completion of this work.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003ePadilla-Aguilar Patricia, Bravo-Vinaja Mar\u0026iacute;a Guadalupe, Col\u0026oacute;n-Quezada David, Contreras-Jim\u0026eacute;nez Gast\u0026oacute;n, and Solano-De la Cruz Marco Tulio conceived and designed the experiments. Padilla-Aguilar Patricia, Bravo-Vinaja Mar\u0026iacute;a Guadalupe, Col\u0026oacute;n-Quezada David, and Solano-De la Cruz Marco Tulio collected biological samples. Padilla-Aguilar Patricia, Bravo-Vinaja Mar\u0026iacute;a Guadalupe, Col\u0026oacute;n-Quezada David, Contreras-Jim\u0026eacute;nez Gast\u0026oacute;n, and Solano-De la Cruz Marco Tulio performed the experiments and analyzed the data. Padilla-Aguilar Patricia, Bravo-Vinaja Mar\u0026iacute;a Guadalupe, Col\u0026oacute;n-Quezada David, Contreras-Jim\u0026eacute;nez Gast\u0026oacute;n, and Solano-De la Cruz Marco Tulio wrote the article. All authors read and approved the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eSpecial thanks to the Consejo de Ciencia, Tecnolog\u0026iacute;a e Innovaci\u0026oacute;n de Hidalgo (CITNOVA) for the grant awarded to carry out the postdoctoral stay; without this financial resource, this research would not have been possible. We would also like to thank the hunters who kindly donated the duck used for this study. 
We would like to thank the Direcci\u0026oacute;n General de Vida Silvestre of the Secretar\u0026iacute;a de Medio Ambiente y Recursos Naturales (SEMARNAT) for the permit granted for this study (SPARN/DGVS/09070/24). Finally, we would like to thank Mr. Tom\u0026aacute;s Ram\u0026iacute;rez Bar\u0026oacute;n, Mr. Roberto Reza and Engineer Jaqueline Delgado Nava for all the support provided for the collection of biological samples.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe data that support the findings of this study are available from the corresponding author upon reasonable request. The raw data for this study can be found in the NCBI SRA repository under BioProject accession PRJNA1212198 and BioSample accession SAMN46294654.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eCarboneras C, Kirwan GM. Family Anatidae (ducks, geese and swans). In: del Hoyo J, Elliott A, Sargatal J, Christie DA, de Juana E, editors. Handbook of the Birds of the World Alive. Barcelona: Lynx Edicions; 2018.\u003c/li\u003e\n\u003cli\u003eNavarro-Sig\u0026uuml;enza AG, Reb\u0026oacute;n-Gallardo MF, Gordillo-Mart\u0026iacute;nez A, Townsend Peterson A, Berlanga-Garc\u0026iacute;a H, S\u0026aacute;nchez-Gonz\u0026aacute;lez LA. Biodiversidad de aves en M\u0026eacute;xico. Rev Mex Biodivers. 2014;85(Suppl):S476\u0026ndash;S495.\u003c/li\u003e\n\u003cli\u003eHowell SNG, Webb S. A Guide to the Birds of Mexico and Northern Central America. Oxford: Oxford University Press; 1995.\u003c/li\u003e\n\u003cli\u003eSecretar\u0026iacute;a de Medio Ambiente y Recursos Naturales (SEMARNAT). Norma Oficial Mexicana NOM-059-SEMARNAT-2010, Protecci\u0026oacute;n ambiental\u0026ndash;Especies nativas de M\u0026eacute;xico de flora y fauna silvestres\u0026ndash;Categor\u0026iacute;as de riesgo y especificaciones para su inclusi\u0026oacute;n, exclusi\u0026oacute;n o cambio\u0026ndash;Lista de especies en riesgo. 
Diario Oficial de la Federaci\u0026oacute;n; 2010.\u003c/li\u003e\n\u003cli\u003eBirdLife International. Species factsheet: Anas diazi. BirdLife International; 2023.\u003c/li\u003e\n\u003cli\u003eJohnsgard PA. Ducks, Geese, and Swans of the World. Lincoln: University of Nebraska Press; 1978.\u003c/li\u003e\n\u003cli\u003eHubbard JP. The status of the Mexican Duck (Anas platyrhynchos diazi). Auk. 1977;94:554\u0026ndash;566.\u003c/li\u003e\n\u003cli\u003eLivezey BC. Phylogenetic relationships of dabbling ducks (tribe Anatini). Auk. 1991;108:471\u0026ndash;507.\u003c/li\u003e\n\u003cli\u003eLavretsky P, McCracken KG. To hybridize or not to hybridize? A case study of North American dabbling ducks. Biol J Linn Soc. 2013;108:813\u0026ndash;829.\u003c/li\u003e\n\u003cli\u003eChesser RT, Burns KJ, Cicero C, Dunn JL, Kratter AW, Lovette IJ, et al. Checklist of North American Birds of the American Ornithological Society. Auk. 2020;137:1\u0026ndash;23.\u003c/li\u003e\n\u003cli\u003eLavretsky P, Peters JL, Winker K, McCracken KG. Phylogenomics of modern waterfowl (Anseriformes) using target enrichment. Mol Phylogenet Evol. 2020;142:106646.\u003c/li\u003e\n\u003cli\u003eRhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592:737\u0026ndash;746.\u003c/li\u003e\n\u003cli\u003eZhang G, Li B, Li C, Gilbert MTP, Jarvis ED, Wang J. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311\u0026ndash;1320.\u003c/li\u003e\n\u003cli\u003eFeng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, et al. Dense sampling of bird diversity increases power of comparative genomics. Nature. 2020;587:252\u0026ndash;257.\u003c/li\u003e\n\u003cli\u003eWarren WC, Hillier LW, Tomlinson C, Minx P, Kremitzki M, Graves T, et al. A New Chicken Genome Assembly Provides Insight into Avian Genome Structure. G3 (Bethesda). 
2017;7:109\u0026ndash;117.\u003c/li\u003e\n\u003cli\u003eLovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, et al. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014;15:565.\u003c/li\u003e\n\u003cli\u003eSun YB, Zhou WP, Liu HQ, Irwin DM, Zhang YP. Diversity and evolution of avian genomes. Mol Ecol. 2019;28:4241\u0026ndash;4257.\u003c/li\u003e\n\u003cli\u003eZhou C, Wang N, Wang L, Han L, Zhang Y, Sun Z, et al. A chromosome-level genome assembly of the mallard (Anas platyrhynchos). Gigascience. 2021;10:giaa162.\u003c/li\u003e\n\u003cli\u003eMcCracken KG, Lavretsky P, Peters JL. Population genomic insights into the evolutionary history of North American dabbling ducks. Mol Ecol. 2016;25:3623\u0026ndash;3640.\u003c/li\u003e\n\u003cli\u003eGregory TR. Genome size evolution in animals. In: Gregory TR, editor. The Evolution of the Genome. San Diego: Elsevier; 2005. p. 3\u0026ndash;87.\u003c/li\u003e\n\u003cli\u003eZhang Q, Edwards SV. The evolution of intron size in amniotes: a role for powered flight? Genome Biol Evol. 2012;4:1033\u0026ndash;1043.\u003c/li\u003e\n\u003cli\u003eWright NA, Gregory TR, Witt CC. Metabolic \u0026lsquo;engines\u0026rsquo; of flight drive genome size reduction in birds. Proc R Soc B. 2014;281:20132780.\u003c/li\u003e\n\u003cli\u003eKapusta A, Suh A. Evolution of bird genomes\u0026mdash;a transposon\u0026rsquo;s-eye view. Ann N Y Acad Sci. 2017;1389:164\u0026ndash;185.\u003c/li\u003e\n\u003cli\u003eSotero-Caio CG, Platt RN II, Suh A, Ray DA. Evolution and diversity of transposable elements in vertebrate genomes. Genome Biol Evol. 2017;9:161\u0026ndash;177.\u003c/li\u003e\n\u003cli\u003eOrgan CL, Shedlock AM, Meade A, Pagel M, Edwards SV. Origin of avian genome size and structure in non-avian dinosaurs. Nature. 2007;446:180\u0026ndash;184.\u003c/li\u003e\n\u003cli\u003eWright NA, Gregory TR. Determinants of genome size variation in birds. Genome Biol Evol. 
2017;9:245\u0026ndash;257.\u003c/li\u003e\n\u003cli\u003eDufresnes C, B\u0026eacute;ziers P, Litvinchuk SN, Crochet PA. Genome size variation in birds: unresolved patterns and neglected taxa. J Avian Biol. 2021;52:e02768.\u003c/li\u003e\n\u003cli\u003eHahn MW, Demuth JP, Han SG. Accelerated rate of gene gain and loss in primates. Genetics. 2007;177:1941\u0026ndash;1949.\u003c/li\u003e\n\u003cli\u003eEmms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238.\u003c/li\u003e\n\u003cli\u003eHan MV, Thomas GWC, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013;30:1987\u0026ndash;1997.\u003c/li\u003e\n\u003cli\u003eQian W, Zhang J. Genomic evidence for adaptation by gene duplication. Genome Res. 2014;24:1356\u0026ndash;1362.\u003c/li\u003e\n\u003cli\u003eMcCracken KG, Barger CP, Bulgarella M, Johnson KP, Kuhner MK, Moore AV, et al. Parallel evolution in the major haemoglobin genes of eight species of Andean waterfowl. Mol Ecol. 2009;18:3992\u0026ndash;4005.\u003c/li\u003e\n\u003cli\u003eLavretsky P, Peters JL, McCracken KG. Population genomics of divergence and admixture between closely related species of North American dabbling ducks. Mol Ecol. 2019;28:265\u0026ndash;281.\u003c/li\u003e\n\u003cli\u003eHuang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, et al. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet. 2013;45:776\u0026ndash;783.\u003c/li\u003e\n\u003cli\u003eZhou Z, Li M, Cheng H, Fan W, Yuan Z, Gao Q, et al. An intercross population study reveals genes associated with body size and immune traits in ducks. BMC Genomics. 2018;19:612.\u003c/li\u003e\n\u003cli\u003eThomas GWC, Hahn MW. Gene-family evolution in mammals and birds. Annu Rev Ecol Evol Syst. 
2014;45:191\u0026ndash;216.\u003c/li\u003e\n\u003cli\u003eWenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155\u0026ndash;1162.\u003c/li\u003e\n\u003cli\u003eVurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202\u0026ndash;2204.\u003c/li\u003e\n\u003cli\u003eRanallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432.\u003c/li\u003e\n\u003cli\u003eMendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2020;36:5516\u0026ndash;5518.\u003c/li\u003e\n\u003cli\u003eZhang G. Birds as a model system for comparative genomics. Nat Rev Genet. 2015;16:390\u0026ndash;403.\u003c/li\u003e\n\u003cli\u003eHon T, Mars K, Young G, Tsai YC, Karalius JW, Landolin JM, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020;7:399.\u003c/li\u003e\n\u003cli\u003ePacific Biosciences. SMRT\u0026reg; Link User Guide: Circular Consensus Sequencing (CCS) Analysis. PacBio; 2022.\u003c/li\u003e\n\u003cli\u003eZhou C, Wang N, Wang L, Han L, Zhang Y, Sun Z, et al. A chromosome-level genome assembly of the mallard (Anas platyrhynchos). Gigascience. 2021;10:giaa162.\u003c/li\u003e\n\u003cli\u003eKorlach J, Gedman G, Kingan SB, Chin CS, Howard JT, Audet JN, et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate- and short-read sequencing. Gigascience. 2017;6:1\u0026ndash;16.\u003c/li\u003e\n\u003cli\u003eStanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 
2003;19(Suppl 2):ii215\u0026ndash;ii225.\u003c/li\u003e\n\u003cli\u003eSim\u0026atilde;o FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210\u0026ndash;3212.\u003c/li\u003e\n\u003cli\u003eJones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236\u0026ndash;1240.\u003c/li\u003e\n\u003cli\u003eAshburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25\u0026ndash;29.\u003c/li\u003e\n\u003cli\u003eKapusta A, Suh A, Feschotte C. Dynamics of genome size evolution in birds and mammals. Proc Natl Acad Sci U S A. 2017;114:E1460\u0026ndash;E1469.\u003c/li\u003e\n\u003cli\u003eZhang G, Jarvis ED, Gilbert MTP. A genomic perspective on the origin and evolution of birds. Genome Biol. 2014;15:502.\u003c/li\u003e\n\u003cli\u003eHahn MW, Han MV, Han SG. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:e197.\u003c/li\u003e\n\u003cli\u003eLynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151\u0026ndash;1155.\u003c/li\u003e\n\u003cli\u003eNei M, Rooney AP. Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005;39:121\u0026ndash;152.\u003c/li\u003e\n\u003cli\u003eQanbari S, Rubin CJ, Maqbool K, Weigend S, Weigend A, Geibel J, et al. Genetics of adaptation in modern chicken. PLoS Genet. 2019;15:e1007989.\u003c/li\u003e\n\u003cli\u003eWang MS, Thakur M, Peng MS, Jiang Y, Frantz LAF, Li M, et al. 863 genomes reveal the origin and domestication of chicken. Cell. 2020;180:1080\u0026ndash;1096.e6.\u003c/li\u003e\n\u003cli\u003eRubin CJ, Zody MC, Eriksson J, Meadows JRS, Sherwood E, Webster MT, et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature. 
2010;464:587\u0026ndash;591.\u003c/li\u003e\n\u003cli\u003eHelsen P, Van Den Broeck M, Van Houdt J, Volckaert FAM. Gene family evolution and adaptation in vertebrates. Mol Biol Evol. 2020;37:301\u0026ndash;315.\u003c/li\u003e\n\u003cli\u003eCantalapiedra CP, Hern\u0026aacute;ndez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 2021;38:5825\u0026ndash;5829.\u003c/li\u003e\n\u003cli\u003eFeschotte C, Pritham EJ. DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007;41:331\u0026ndash;368.\u003c/li\u003e\n\u003cli\u003eSuh A, Kapusta A, Churakov G, et al. Early mesozoic coexistence of amniotes and transposable elements. Genome Res. 2014;24:1514\u0026ndash;1524.\u003c/li\u003e\n\u003cli\u003eO\u0026rsquo;Connor RE, Farr\u0026eacute; M, Joseph S, et al. Patterns of structural variation in avian genomes. Genome Res. 2019;29:1981\u0026ndash;1994.\u003c/li\u003e\n\u003cli\u003eKanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27\u0026ndash;30.\u003c/li\u003e\n\u003cli\u003eKanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457\u0026ndash;D462.\u003c/li\u003e\n\u003cli\u003eJassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2020;48:D498\u0026ndash;D503.\u003c/li\u003e\n\u003cli\u003eLagerstr\u0026ouml;m MC, Schi\u0026ouml;th HB. Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat Rev Drug Discov. 2008;7:339\u0026ndash;357.\u003c/li\u003e\n\u003cli\u003eNordstr\u0026ouml;m KJV, Lagerstr\u0026ouml;m MC, Wall\u0026eacute;r LM, Fredriksson R, Schi\u0026ouml;th HB. The secretin GPCRs descended from the family of adhesion GPCRs. Mol Biol Evol. 
2009;26:71\u0026ndash;84.\u003c/li\u003e\n\u003cli\u003eDong Y, Jones G, Zhang S. Dynamic evolution of GPCR genes in vertebrates. BMC Evol Biol. 2016;16:206.\u003c/li\u003e\n\u003cli\u003eRichards EJ, Rosas U, Banta J, Bhambhra N, Purugganan MD. \u003cstrong\u003eGenome-wide patterns of Arabidopsis gene duplication and their evolutionary implications\u003c/strong\u003e. \u003cem\u003ePLoS Genet\u003c/em\u003e. 2012;8:e1002973.\u003c/li\u003e\n\u003cli\u003eEllegren H. \u003cstrong\u003eEvolutionary stasis: the stable chromosomes of birds\u003c/strong\u003e. \u003cem\u003eTrends Ecol Evol\u003c/em\u003e. 2010;25:283\u0026ndash;291.\u003c/li\u003e\n\u003cli\u003eWright NA, Gregory TR, Witt CC. Metabolic \u0026lsquo;engines\u0026rsquo; of flight drive genome size reduction in birds. Proc R Soc B. 2014;281:20132780.\u003c/li\u003e\n\u003cli\u003eJarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. \u003cstrong\u003eWhole-genome analyses resolve early branches in the tree of life of modern birds\u003c/strong\u003e. \u003cem\u003eScience\u003c/em\u003e. 2014;346:1320\u0026ndash;1331.\u003c/li\u003e\n\u003cli\u003eFredriksson R, Lagerstr\u0026ouml;m MC, Lundin LG, Schi\u0026ouml;th HB. \u003cstrong\u003eThe G-protein-coupled receptors in the human genome form five main families\u003c/strong\u003e. \u003cem\u003eMol Pharmacol\u003c/em\u003e. 2003;63:1256\u0026ndash;1272.\u003c/li\u003e\n\u003cli\u003eMcWilliams SR, Guglielmo C, Pierce B, Klaassen M. \u003cstrong\u003eFlying, fasting, and feeding in birds during migration: a nutritional and physiological ecology perspective\u003c/strong\u003e. \u003cem\u003eJ Avian Biol\u003c/em\u003e. 2004;35:377\u0026ndash;393.\u003c/li\u003e\n\u003cli\u003eWilliams TD. \u003cem\u003ePhysiological Adaptations for Breeding in Birds\u003c/em\u003e. Princeton: Princeton University Press; 2012.\u003c/li\u003e\n\u003cli\u003eAlfoldi J, Lindblad-Toh K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 
2013;23:1063\u0026ndash;1069.\u003c/li\u003e\n\u003cli\u003eFrankham R. Genetics and extinction. Biol Conserv. 2005;126:131\u0026ndash;140.\u003c/li\u003e\n\u003cli\u003eBourque G, Burns KH, Gehring M, et al. Ten things you should know about transposable elements. Genome Biol. 2018;19:199.\u003c/li\u003e\n\u003cli\u003eLynch M. The origins of genome architecture. Sunderland: Sinauer Associates; 2007.\u003c/li\u003e\n\u003cli\u003eEllegren H, Galtier N. \u003cstrong\u003eDeterminants of genetic diversity\u003c/strong\u003e. \u003cem\u003eNat Rev Genet\u003c/em\u003e. 2016;17:422\u0026ndash;433.\u003c/li\u003e\n\u003cli\u003eFrankham R. Genetics and extinction. Biol Conserv. 2005;126:131\u0026ndash;140.\u003c/li\u003e\n\u003cli\u003eZhang G, Li B, Li C, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311\u0026ndash;1320.\u003c/li\u003e\n\u003cli\u003eAllendorf FW, Luikart G, Aitken SN. Conservation and the Genetics of Populations. 2nd ed. Wiley-Blackwell; 2013.\u003c/li\u003e\n\u003cli\u003eFunk WC, McKay JK, Hohenlohe PA, Allendorf FW. Harnessing genomics for delineating conservation units. Trends Ecol Evol. 2012;27:489\u0026ndash;496.\u003c/li\u003e\n\u003cli\u003eShafer ABA, Wolf JBW, Alves PC, et al. Genomics and the challenging translation into conservation practice. Trends Ecol Evol. 2015;30:78\u0026ndash;87.\u003c/li\u003e\n\u003cli\u003eDussex N, von Seth J, Knapp M, et al. Genomes and the conservation of endangered species. Annu Rev Anim Biosci. 2021;9:519\u0026ndash;545. \u003c/li\u003e\n\u003cli\u003eFair J, Paul E, Jones J, eds. \u003cstrong\u003eGuidelines to the Use of Wild Birds in Research\u003c/strong\u003e. 3rd ed. Ornithological Council; 2010.\u003c/li\u003e\n\u003cli\u003eSheldon LD, Chin EH, Gill SA, Schmaltz G, Newman AEM, Soma KK. Effects of blood collection on wild birds. J Avian Biol. 
2008;39:720\u0026ndash;726\u003c/li\u003e\n\u003cli\u003eSecretar\u0026iacute;a de Medio Ambiente y Recursos Naturales (SEMARNAT). Ley General de Vida Silvestre. M\u0026eacute;xico; 2023.\u003c/li\u003e\n\u003cli\u003eSambrook J, Russell DW. Molecular Cloning: A Laboratory Manual. 3rd ed. Cold Spring Harbor Laboratory Press; 2001.\u003c/li\u003e\n\u003cli\u003eKoren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing. Curr Opin Microbiol. 2015;23:110\u0026ndash;120.\u003c/li\u003e\n\u003cli\u003eSimbolo M, Gottardi M, Corbo V, et al. DNA qualification workflow for next generation sequencing of histopathological samples. PLoS One. 2013;8:e62692.\u003c/li\u003e\n\u003cli\u003eWenger AM, Peluso P, Rowell WJ, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly. Nat Biotechnol. 2019;37:1155\u0026ndash;1162.\u003c/li\u003e\n\u003cli\u003eHon T, Mars K, Young G, et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data. 2020;7:399.\u003c/li\u003e\n\u003cli\u003eAndrews S. FastQC: a quality control tool for high throughput sequence data. 2010.\u003c/li\u003e\n\u003cli\u003eBolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114\u0026ndash;2120.\u003c/li\u003e\n\u003cli\u003eBankevich A, Nurk S, Antipov D, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455\u0026ndash;477.\u003c/li\u003e\n\u003cli\u003eGurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072\u0026ndash;1075.\u003c/li\u003e\n\u003cli\u003eLangmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357\u0026ndash;359.\u003c/li\u003e\n\u003cli\u003eRanallo-Benavidez TR, Jaron KS, Schatz MC. 
GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432.\u003c/li\u003e\n\u003cli\u003eStanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637\u0026ndash;644.\u003c/li\u003e\n\u003cli\u003eJones P, Binns D, Chang HY, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236\u0026ndash;1240.\u003c/li\u003e\n\u003cli\u003eSim\u0026atilde;o FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210\u0026ndash;3212.\u003c/li\u003e\n\u003cli\u003eSzklarczyk D, Kirsch R, Koutrouli M, et al. The STRING database in 2023. Nucleic Acids Res. 2023;51:D638\u0026ndash;D646.\u003c/li\u003e\n\u003cli\u003eEmms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238.\u003c/li\u003e\n\u003cli\u003eMendes FK, Vanderpool D, Fulton B, Hahn MW. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics. 2020;36:5516\u0026ndash;5518.\u003c/li\u003e\n\u003cli\u003eCantalapiedra CP, Hern\u0026aacute;ndez-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2. Mol Biol Evol. 
2021;38:5825\u0026ndash;5829.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 4 are available in the Supplementary Files section.\u003c/p\u003e\n\u003cp\u003eTable 5 is not available with this version.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-8736544/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8736544/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe Mexican duck \u003cem\u003e(Anas diazi\u003c/em\u003e) is the only duck species endemic to Mexico and is currently listed as threatened under NOM-059-SEMARNAT-2010. Its taxonomic status has long been controversial, historically considered a subspecies of \u003cem\u003eAnas platyrhynchos\u003c/em\u003e due to their strong morphological similarity, particularly in females. In 2020, the American Ornithological Society formally recognized \u003cem\u003eA. diazi\u003c/em\u003e as a distinct species; however, its genomic architecture remains largely unexplored. Here we present the first whole-genome analysis of \u003cem\u003eA. diazi\u003c/em\u003e, based on PacBio HiFi long-read sequencing and a de novo assembly strategy. This represents the first genomic resource available for this endemic and threatened species. Genome size estimation based on K-mer and GenomeScope2 modeling revealed a haploid genome size of approximately 1.02 Gb, with a high model fit (\u0026gt;\u0026thinsp;96%), low repeat content (~\u0026thinsp;7.4%), and moderate heterozygosity (~\u0026thinsp;1.1%), values consistent with other waterfowl genomes. Comparative alignment of reads with the \u003cem\u003eA. 
platyrhynchos\u003c/em\u003e reference genome showed an alignment rate of approximately 86%, suggesting substantial lineage-specific genomic divergence. Gene prediction and functional annotation were performed using avian reference datasets (Anatidae and \u003cem\u003eGallus gallus\u003c/em\u003e), generating nearly 4,000 highly reliable annotated proteins, the integrity of which was supported by BUSCO analysis. Using OrthoFinder and CAFE5, we investigated the evolution of gene families in \u003cem\u003eA. diazi, A. platyrhynchos, Aix galericulata, Anser cygnoides\u003c/em\u003e, and \u003cem\u003eG. gallus\u003c/em\u003e, identifying lineage-specific patterns of gene family expansion and contraction potentially associated with domestication, ecological specialization, and evolutionary divergence within Anatidae. Taken together, our results provide the first genomic framework for \u003cem\u003eAnas diazi\u003c/em\u003e and lay the groundwork for future evolutionary, ecological, and conservation genomic studies of this waterbird endemic to Mexico.\u003c/p\u003e","manuscriptTitle":"Long Reads de novo sequencing of the Anas diazi genome reveals changes in gene orthology in waterfowl","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-18 06:14:49","doi":"10.21203/rs.3.rs-8736544/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2026-03-06T09:47:56+00:00","index":"","fulltext":""},{"type":"reviewerAgreed","content":"26027010755788447714858740178306597017","date":"2026-03-04T18:53:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"86587478023654323214283306434563025268","date":"2026-03-04T15:38:50+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"76512175082528131668119169240455604984","date":"2026-03-02T08:35:45+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"106588979176326937481519754319921827790","date":"2026-02-27T13:08:17+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-27T02:29:46+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-25T16:53:15+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"183463186430543157498401495689112452194","date":"2026-02-25T13:07:41+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-22T21:40:06+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"326190879423843267077843108365217665862","date":"2026-02-18T15:53:46+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"241778748034609351738578336330648865281","date":"2026-02-15T18:21:01+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"189617339251131738656042990944427271594","date":"2026-02-14T18:29:02+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-02-12T05:26:42+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-12T05:24:03+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-02-11T19:01:11+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-07T21:47:25+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC 
Genomics","date":"2026-02-07T21:41:55+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"gics","sideBox":"Learn more about [BMC Genomics](http://bmcgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/gics","title":"BMC Genomics","twitterHandle":"#BMCGenomics","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"691c13aa-fdd9-4fcf-b3d6-63f85a1a7b86","owner":[],"postedDate":"February 18th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"in-revision","subjectAreas":[],"tags":[],"updatedAt":"2026-04-24T08:09:26+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-18 06:14:49","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8736544","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8736544","identity":"rs-8736544","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}