MicroFinder: Conserved gene-set mapping and assembly ordering for manual curation of bird microchromosomes

preprint OA: gold CC-BY-NC-4.0

Abstract

Obtaining chromosomally complete genome assemblies across the tree of life is a major goal of biodiversity genomics. However, some lineages remain recalcitrant to assembly despite recent advances in sequencing technologies and assembly tools. Birds present a substantial genome assembly challenge due to the presence of tiny, hard-to-assemble microchromosomes that are often highly fragmented or even missing in draft genome assemblies. As such, bird genomes require a large amount of expert manual curation effort via manipulation of genome-wide Hi-C contact maps, and many current chromosome-level bird genome assemblies do not resolve the known karyotype. Microchromosomes have distinct genetic and epigenetic features. They are GC-biased, gene-rich, highly methylated, and have distinct spatial organisation in the centre of the nucleus. Importantly, they are conserved across avian evolution. Here, using a reference set of expert-curated bird genomes, we have identified a set of conserved microchromosome genes and developed MicroFinder, a pipeline that uses this gene set to find small microchromosome fragments in draft genome assemblies to act as anchors for manual curation of microchromosomes. We demonstrate how MicroFinder can be used to improve the speed and accuracy of bird genome curation. Furthermore, we highlight the usefulness of MicroFinder by carrying out MicroFinder-enabled re-curation of 12 previously released chromosome-scale bird genome assemblies, increasing the sequence content of microchromosome models.

bioRxiv preprint doi: https://doi.org/10.1101/2025.05.09.653066; this version posted May 14, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Introduction

Recent advances in sequencing technology have dramatically improved the quantity, quality and taxonomic breadth of reference genome assemblies across the tree of life (Feron & Waterhouse, 2022; Lewin et al., 2001; Rhie et al., 2021; The Darwin Tree of Life Project Consortium, 2021).
Automated assembly of accurate long reads followed by scaffolding with high-throughput in vivo chromatin conformation capture sequence data (Hi-C) and manual curation (Howe et al., 2021) routinely results in genome assemblies that meet or exceed accepted gold standard metrics (Lawniczak et al., 2022). However, some lineages are recalcitrant to assembly and challenges remain to generate complete, chromosomally resolved genome assemblies for all taxa (H. Li & Durbin, 2024). Within vertebrates, birds present a substantial assembly challenge due to the presence of tiny, hard-to-assemble microchromosomes. Since early cytogenetic studies, it has been recognised that bird genomes typically contain six to eight pairs of large macrochromosomes and 31 to 33 pairs of small microchromosomes (Tegelström & Ryttman, 1981). In chicken, macrochromosome size based on a near-T2T assembly ranges from 250 Mb to 30 Mb, and microchromosomes range from 23 Mb to 2.5 Mb (Huang et al., 2023). Ten of the smallest microchromosomes (ranging in size from 6.8 to 2.5 Mb) are further categorised as “dot” chromosomes based on their minute size, morphology and extensive pericentromeric heterochromatin. Although microchromosomes were once considered unimportant DNA fragments (Newcomer, 1955, 1957), cytogenetics and genomics have revealed that they are highly conserved across avian evolution and contain many important and highly expressed housekeeping genes (Liu et al., 2021; van Brink, 1959; Waters et al., 2021).
Furthermore, microchromosomes have distinct genetic and epigenetic features setting them apart from macrochromosomes: they are GC-biased, gene-rich, highly methylated, and have distinct spatial organisation in the centre of the nucleus (Habermann et al., 2001; McQueen et al., 1998; O’Connor et al., 2019; Perry et al., 2021; Smith et al., 2000). Most recent bird genome assembly projects follow the Vertebrate Genomes Project (VGP) assembly pipeline, which uses accurate PacBio HiFi long reads for de novo assembly combined with Hi-C data for long-range scaffolding and phasing (Larivière et al., 2024). This pipeline produces assemblies with excellent contiguity and completeness statistics. However, these metrics do not fully capture the challenge of assembling the smallest bird chromosomes, as they represent a small fraction of the total sequence content. Strikingly, despite high-quality sequence data, bird genome assemblies often do not fully resolve the known karyotype (Figure 1a; Supplementary Table 1). Of 105 species with chromosome-scale genome assemblies in International Nucleotide Sequence Database Collaboration (INSDC) databases that also have karyotype data, 62 (59%) differ from the expected karyotype by 2 or more chromosomes, with the majority (57/62) having fewer chromosomes than expected. Primarily, this is due to failure to assemble and identify the full set of microchromosomes (Barros et al., 2023; Peona, Blom, et al., 2021), and even in karyotype-resolved assemblies, microchromosomes are often highly fragmented and can be incomplete (M. Li et al., 2022). Painstaking manual curation of bird genomes after de novo assembly and scaffolding is therefore an essential assembly step.
For example, the Hi-C contact map for the draft genome assembly of the pink-footed goose Anser brachyrhynchus (assembled by the Darwin Tree of Life (DToL) project [Lopez Colom & O’Brien, 2024]) reveals 28 clear chromosomal elements (Figure 1b), yet closely related karyotyped geese all have 40 or 41 chromosomes (Uno et al., 2019; Wójcik & Smalec, 2007). Therefore, at least 12 chromosomes are expected to be among the unplaced “shrapnel” content located at the bottom right of the Hi-C contact map, which predominantly contains repetitive sequence (Figure 1c). To resolve the assembly, genome curators sift through shrapnel scaffolds to identify and assemble microchromosome fragments (Figure 1d). Techniques include making use of the elevated Hi-C background signal between microchromosomes (due to their central position in the nucleus), genome alignments with reference species and mapping of protein coding genes. This process is slow and laborious, and there is a high likelihood of sequence content being missed from the assembled chromosomes.

Figure 1: Bird genome assemblies are often not karyotype-complete and require extensive manual curation. (A) Correspondence analysis of chromosome counts in chromosome-scale genome assemblies versus their respective haploid karyotype for 105 bird species. Colour gradient reflects the number of species in each category (bin of assembly (x) versus karyotype (y) count). The solid black line marks the match of the chromosome number in assemblies (y-axis) and predicted chromosome number using cytology (x-axis). The dashed diagonal lines indicate a ± 1 chromosome margin of error to account for expected variation from assemblies of males (the homogametic sex will usually have one fewer assembled chromosome). (B) Hi-C contact map for the draft genome assembly of Anser brachyrhynchus (assembled by the Darwin Tree of Life (DToL) project [Lopez Colom & O’Brien, 2024]). Coloured squares highlight 28 clear chromosomal elements identified during an initial assembly curation (painted “Scaffolds” in PretextView). (C) A zoomed-in view of the unplaced assembly content grouped at the bottom right of image (B). (D) Hi-C contact map of the curated A. brachyrhynchus genome assembly zoomed in on the smallest 11 chromosomes. Content to the right of the red arrow is unplaced content. Microchromosomes have elevated background Hi-C signal but appear as independent elements in the Hi-C map.
Here, to aid manual curation of bird genomes, we took advantage of conserved gene content to identify microchromosome fragments in draft genome assemblies. Using 11 high-quality, manually curated bird genomes generated as part of the VGP, 25 Genomes Project and DToL (The Darwin Tree of Life Project Consortium, 2021), as well as a near telomere-to-telomere (T2T) assembly of chicken (Huang et al., 2023), we identified a set of conserved microchromosome genes and have developed MicroFinder (https://github.com/sanger-tol/MicroFinder), a pipeline that uses this gene set to find candidate microchromosome contigs from draft assemblies to improve the speed and accuracy of manual curation. Using this approach, we revisited 12 previously released bird genome assemblies and improved the content and representation of their assembled microchromosomes.

Results and Discussion

Identification of conserved microchromosome genes

Given the gene-dense nature of microchromosomes and their conserved synteny across birds, we hypothesised that a dense marker set of protein coding genes would enable the identification of microchromosome fragments in draft genome assemblies. To generate a set of marker genes, we made use of expert-curated genome assemblies generated for the VGP, DToL and 25 Genomes projects. We selected 11 published genome assemblies with NCBI RefSeq or Ensembl rapid release gene-sets (Supplementary Table 2). We also included a recent, near-T2T assembly of chicken (Huang et al., 2023). Together, these 12 assemblies span nine bird orders and 11 families (Supplementary Table 2, Figure 2A). Of note, this collection includes three high-confidence genome assemblies (bCucCan1, bTaeGut1 and GGswu, herein referred to as the ToL reference set) that are commonly used by genome curators at the Wellcome Sanger Tree of Life (ToL) programme as references for whole genome alignments when curating new bird assemblies. Additionally, six of the selected assemblies have been confirmed to be karyotype-complete based on cytology (Supplementary Table 2, Figure 2A). Of the remaining assemblies, two species do not have published karyotypes and four likely have missing chromosomes based on expectations from cytology, further highlighting the challenges of generating karyotype-complete genome assemblies for birds even when high-quality data is available and substantial manual curation time has been invested.

Figure 2: Phylogeny of annotated chromosome-scale bird reference genomes used to generate the MicroFinder protein set and conserved macrosynteny of bird dot chromosomes. (A) Maximum likelihood phylogeny based on a concatenated alignment of 9,400 conserved single-copy orthogroups. Branch lengths are in amino acid substitutions per site. All nodes have ≥ 99% bootstrap support (1000 ultrafast bootstrap replicates). Species with genome assemblies confirmed to be karyotype-complete based on cytology are highlighted in green. Full details of all assemblies are given in Supplementary Table 2. PhyloPic (https://www.phylopic.org) silhouettes of each species are shown at the tree tips. Species marked with an “*” form the ToL reference set and are routinely used as references when assembling diverse bird genomes. (B-F) Dot chromosome synteny between genomes in the ToL reference set based on whole genome alignments. F summarises dot chromosome homology between GGswu, bTaeGut1 and bCucCan1 based on the alignments shown in B-E.

To identify conserved, low copy number genes to use as markers, we clustered proteomes from the 12 bird reference genomes into orthogroups with OrthoFinder (Emms & Kelly, 2015, 2019) and used KinFin (Laetsch & Blaxter, 2017) to select broadly conserved “fuzzy” orthogroups that have relaxed conservation and copy number constraints (<= 3 gene copies per species and present in at least 50% of species).
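The stated selection rule (no more than three gene copies in any species, presence in at least half of the species) can be sketched as a simple predicate. This is a simplified reading of the constraints as described in the text, not KinFin's own algorithm; the function name `is_fuzzy_orthogroup` and the input layout (a mapping from species to gene copy number) are assumptions made for illustration.

```python
def is_fuzzy_orthogroup(copies_per_species, n_species, max_copies=3, min_fraction=0.5):
    """Return True if an orthogroup passes the relaxed ("fuzzy") constraints.

    copies_per_species: hypothetical mapping of species name -> gene copy count.
    n_species: total number of species in the analysis (12 in the text).
    """
    # Species in which the orthogroup has at least one member.
    present = [n for n in copies_per_species.values() if n > 0]
    if len(present) < min_fraction * n_species:
        return False  # found in fewer than the required share of species
    # No species may exceed the copy number ceiling.
    return all(n <= max_copies for n in present)
```

With 12 species, an orthogroup found in six of them at one to three copies each would pass, whereas one with four copies in any species, or one found in only two species, would not.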
In total, 197,759 proteins were clustered into 16,589 orthogroups, of which 9,400 were conserved and single-copy in all species and 14,514 were identified by KinFin as “fuzzy” orthogroups (Supplementary Table 3 and Supplementary Data). We further filtered the KinFin orthogroup set to only include genes located on dot chromosomes in any of the three ToL reference species, using the near-T2T GGswu chicken assembly to classify dot chromosome homologs in bCucCan1 and bTaeGut1 (Figure 2B-F). We reasoned that specifically targeting dot chromosomes rather than all microchromosomes would be most beneficial for assembly curation, as larger microchromosomes are typically much less fragmented than dot chromosomes. This filtering identified 510 dot chromosome-associated orthogroups containing 4,510 proteins across all 12 reference species. To reduce redundancy, we clustered the dot chromosome-associated proteins with CD-HIT (Fu et al., 2012) to produce a final gene set containing 2,882 proteins, which we refer to as the MicroFinder protein set. Next, we investigated coverage of MicroFinder loci across, to our knowledge, the most complete bird genome assembled to date, the near-T2T GGswu assembly of chicken. The 10 GGswu dot chromosomes have between 15 and 67 GGswu MicroFinder loci per chromosome (307 in total), with an average density of 7.5 loci per Mb of sequence (Figure 3). In comparison, the OrthoDB v10 avian Benchmarking Universal Single-Copy Orthologs (BUSCO) gene set (n = 8,338 orthogroups) has only 3 genes located on dot chromosomes (Supplementary Figure 1), likely due to historical difficulties with dot chromosome assembly leading to severe underrepresentation of dot chromosome genes in OrthoDB. Previously, Huang et al. (2023) showed that chicken dot chromosomes are split into two distinct domains: gene-rich euchromatic regions and repetitive, gene-poor heterochromatic regions, with the euchromatic parts typically occupying a large region of the long arm of each chromosome.
In line with this, we find clustering of MicroFinder proteins in high-expression, low repeat density regions of dot chromosomes (Figure 3). As such, the high density of MicroFinder proteins in euchromatin will increase the likelihood of identifying genic regions of dot chromosomes in fragmented genome assemblies.

Figure 3: Distribution of MicroFinder proteins on chicken (GGswu assembly) dot chromosomes. Panels from top to bottom show the location of MicroFinder loci (coral), RNA-seq alignment counts from female chicken liver (SRR18788805) (green) in 10 Kb fixed windows, and transposable element density in 10 Kb fixed windows (blue). To aid visualisation of lower coverage genes, maximum RNA-seq read coverage was capped at 25x.

Gene mapping and assembly ordering to aid genome curation

To make use of the MicroFinder protein set, we developed a pipeline to map and count MicroFinder proteins in a draft genome assembly and reorder scaffolds by MicroFinder protein count. This strategy means that putative dot chromosome scaffolds appear at the beginning of the Hi-C contact map, separated from other small fragments, enabling curators to quickly identify dot chromosome content and start building up chromosome-scale scaffolds without having to sift through repetitive “shrapnel” contigs, as is the case for a standard, size-sorted map. The MicroFinder pipeline aligns the MicroFinder protein set to a draft assembly with miniprot (H. Li, 2023), selects the top-ranking hit for each protein, removes alignments with less than 70% identity, counts protein alignments per scaffold, and outputs a reordered assembly FASTA file and associated MicroFinder count data.
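The hit selection and counting steps described above can be illustrated with a minimal sketch. It assumes the miniprot alignments have already been parsed into simple records; the `Hit` record, its field names and the use of an alignment score to rank hits are hypothetical choices for illustration and are not taken from the MicroFinder source code.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical parsed alignment record (field names are assumptions)."""
    protein: str     # MicroFinder protein identifier
    scaffold: str    # scaffold the protein aligned to
    identity: float  # alignment identity as a fraction between 0 and 1
    score: float     # alignment score used here to rank hits


def count_hits_per_scaffold(hits, min_identity=0.70):
    """Keep the top hit per protein, drop low-identity hits, count per scaffold."""
    best = {}
    for hit in hits:
        current = best.get(hit.protein)
        if current is None or hit.score > current.score:
            best[hit.protein] = hit
    # Alignments below the identity threshold are discarded after top-hit selection.
    return Counter(h.scaffold for h in best.values() if h.identity >= min_identity)
```

Because every protein in the set is mapped, the resulting counts are numbers of protein hits per scaffold rather than numbers of distinct loci.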
Optionally, the pipeline can apply a maximum scaffold size cutoff for assembly sorting. During testing, we found that macrochromosome scaffolds can sometimes contain a low number of MicroFinder hits, most likely due to the presence of divergent paralogs or mis-mapping. We therefore recommend using a 5 Mb maximum scaffold size cutoff for assembly sorting. Following sorting, new Hi-C contact maps can be made for assembly curation in PretextView (https://github.com/sanger-tol/PretextView) using the CurationPretext pipeline (Pointon, 2025). MicroFinder has been packaged into Docker and Singularity containers for easy deployment (https://github.com/sanger-tol/MicroFinder) and we have developed a training workshop with example datasets to guide users (Mathers et al., 2024). To demonstrate how MicroFinder can be used as a curation aid, we applied it to the draft (pre-curation) DToL genome assembly of Anas acuta (O’Brien & Lopez Colom, 2024). MicroFinder identified 61 putative dot chromosome scaffolds shorter than 5 Mb and moved them to the start of the Hi-C contact map (Figure 4). These scaffolds were manually ordered and rearranged to form 10 chromosomal elements during curation. Notably, we did not observe false positive MicroFinder-ordered scaffolds with Hi-C signal placing them with macrochromosomes, indicating that MicroFinder proteins are reliable dot chromosome markers. This is likely due to conservation of microchromosome gene content and limited rearrangements between microchromosomes and macrochromosomes during avian evolution.
As such, MicroFinder enables rapid curation of dot chromosomes using gene-rich scaffolds as anchors to build up dot chromosomes, removing the need for curators to trawl through repetitive shrapnel contigs and reducing the risk of small gene-rich dot chromosome contigs being missed from dot chromosome models during the curation process.

Figure 4: MicroFinder-enabled manual curation of bird dot chromosomes. Main panel shows Hi-C contact map of the MicroFinder-ordered draft (pre-curation) genome assembly of Anas acuta (O’Brien & Lopez Colom, 2024). Central panel shows a zoomed-in view of the putative dot chromosome content that has been moved to the start of the assembly by MicroFinder for curation. Right-hand panel shows a zoomed-in view of the curated dot chromosomes.

Reassembly of DToL bird genomes using MicroFinder-aided curation

Next, we investigated whether MicroFinder could be used to improve previously released chromosome-scale bird genome assemblies. We ran MicroFinder on 12 DToL bird genome assemblies that had been assembled using PacBio HiFi and Hi-C and subjected to manual curation by the ToL curation team (Supplementary Table 4). For each assembly, we ran MicroFinder with a 5 Mb maximum scaffold length cutoff and generated a new Hi-C contact map for curation in PretextView using the original sequence data. MicroFinder identified between 22 and 74 (mean = 49) putative unplaced dot chromosome scaffolds per assembly (Figure 5a). We were able to unambiguously place MicroFinder scaffolds onto dot chromosome models in 11 out of 12 of the assemblies, placing between 2 and 16 scaffolds and increasing the total length of assembled chromosomes in 9 out of 12 assemblies, adding between 216 Kb and 4.3 Mb of additional content per assembly (average = 1.4 Mb) (Figure 5b). Two assemblies (bNetRuf1.1 and bAccGen1.1) had a decrease in assembled chromosome length due to identification of errors in the original assembly. In total, MicroFinder enabled the placement of an additional 12.5 Mb of dot chromosome content across 9 DToL genomes. Furthermore, in the case of bAnaAcu1.1, we were able to identify an additional dot chromosome model that had been missed in the original curation (Figure 5c). Unplaceable scaffolds either had ambiguous Hi-C signal or were too small to place, reflecting the fragmented nature of dot chromosome assemblies (Figure 5c).

Figure 5: MicroFinder-enabled re-curation of 12 previously released DToL bird genome assemblies. (A) Bar chart showing counts of shrapnel scaffolds (previously unplaced content) identified by MicroFinder for 12 genome assemblies. Bars are coloured by whether the scaffolds were placed onto chromosome models during manual curation. (B) As for (A) but showing total sequence content added to chromosome models during manual curation of the MicroFinder-sorted genome assemblies. (C) Hi-C contact map of the Anas acuta genome assembly (bAnaAcu1.1). The figure shows a zoomed-in view of the smallest seven chromosomes. Scaffolds in the original assembly are separated by grey lines. Coloured squares indicate “painted” chromosomes and are assigned super scaffold IDs (Scaffold_(n)) by PretextView (shown above each square). Red vertical arrows indicate scaffolds that have been incorporated into chromosome models following MicroFinder-enabled manual re-curation. Scaffold_35 is a chromosome model that was unidentifiable in the original curation. Full stats for all 12 re-curated genome assemblies are provided in Supplementary Table 4.

Conclusion

Here, we have identified a set of broadly conserved genes located on the smallest bird microchromosomes, known as dot chromosomes, and developed a pipeline (MicroFinder) to identify and order putative dot chromosome scaffolds in draft genome assemblies. By using “fuzzy” orthogroup selection, our gene set includes a large number of broadly conserved single-copy (or low copy number) genes and provides good coverage across all avian dot chromosomes (Figure 3). Using this strategy, MicroFinder can detect putative dot chromosome scaffolds in fragmented draft genome assemblies and is an effective curation aid for bird genome assembly, even enabling improvement to genome assemblies that have already undergone expert curation (Figure 5). Previously, an integrative method that uses a BAC panel to identify chromosome-specific regions was developed to resolve fragmented assemblies, including identification of microchromosomes (Damas et al., 2017); however, it requires expertise in molecular cytogenetics and is time-consuming and impractical for current large-scale sequencing projects. Instead, MicroFinder provides a quick and easy pipeline to effectively pull out putative dot chromosome fragments in silico. Furthermore, the MicroFinder approach may be applicable to other systems which have conserved but hard-to-assemble chromosomes, such as the dot chromosome (Muller element F) in Diptera. Recently, near-T2T assemblies have been released for chicken, bustard and mallard (Hu et al., 2024; Huang et al., 2023; Luo et al., 2023). These assemblies achieved higher microchromosome contiguity through the inclusion of Oxford Nanopore ultra-long reads. This approach represents a promising avenue to further improve bird genome assembly quality. However, due to scale and inertia, many projects still rely primarily on PacBio HiFi de novo assembly and will greatly benefit from our approach.
We recommend that MicroFinder is incorporated into bird genome assembly pipelines prior to manual curation to maximise the completeness of microchromosome assemblies.

Methods

Meta-analysis of bird karyotype and genome assembly chromosome counts

Genomes on a Tree (GoaT) (Challis et al., 2023) was used to retrieve bird chromosome counts based on cytology and from chromosome-level assemblies hosted in INSDC databases (Supplementary Table 3). Our query was made on the “taxon” index of the database, and we excluded taxa with missing data, retaining 105 species for downstream analysis. For chromosome counts based on genome assemblies, a single summary value was used as the representative chromosome count per species. For each assembly, the chromosome count corresponds to the number of chromosomes identified in the primary assembly (as opposed to the alternate assembly for a taxon). When multiple assemblies were available per taxon, the summary corresponds to the primary haplotype of the NCBI RefSeq assembly. Haploid cytology-based chromosome numbers were extracted by halving the diploid number from the Bird Chromosome Database (Degrandi et al., 2020) and Animal Chromosome Counts Database (Release 1.0.1) (Román-Palacios et al., 2021) during GoaT import. A single summary value per species was calculated as the mode across all reported values per species. The ranges of values within each dataset were manually checked to ensure the summary values for chromosome number and haploid numbers from cytology were biologically consistent. We found that most of the variation detected within cytological observations corresponded to ±1 chromosome from the summary mode, consistent with reporting of different total numbers of chromosomes in different sexes and/or small miscounts in older manuscripts (e.g. Makino, 1951).
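As a rough illustration of the summary statistics described above, the sketch below halves reported diploid numbers, takes the mode as the per-species summary, and flags an assembly as discordant when it differs from the karyotype by more than one chromosome. The function names and the input layout are assumptions for illustration; the actual values were computed during GoaT import rather than with this code.

```python
from statistics import mode


def haploid_summary(diploid_counts):
    """Halve each reported diploid number (2n) and return the most common value.

    On ties, statistics.mode returns the first value encountered (Python 3.8+).
    """
    return mode([two_n // 2 for two_n in diploid_counts])


def karyotype_gap(assembly_count, haploid_count):
    """Assembled chromosome count minus the expected haploid count."""
    return assembly_count - haploid_count


def is_discordant(assembly_count, haploid_count, tolerance=1):
    """True when the assembly differs from the karyotype by 2 or more chromosomes."""
    return abs(karyotype_gap(assembly_count, haploid_count)) > tolerance
```

For the goose example in the Introduction, 28 assembled elements against an expected 40 gives a gap of -12, well outside the ±1 margin.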
The outliers were also manually checked against the original source, and all 7 detected cases corresponded to problematic values in their respective databases; because these values were not used as summaries, they were not included in our meta-analysis and did not create bias in the data in Figure 1a. An interactive version of the scatterplot is available on the GoaT website for raw data exploration and download (https://tinyurl.com/4jnc3pbb).

Dot chromosome homology assignment between GGswu, bTaeGut1 and bCucCan1

Pairwise whole genome alignments were carried out between chicken (GGswu), zebra finch (bTaeGut1) and cuckoo (bCucCan1) (Supplementary Table 2) using nucmer v4.0.0rc1 (Marçais et al., 2018) and visualised with Dot (https://dot.sandbox.bio/). Using these alignments, we identified homologs to GGswu dot chromosomes previously classified by Huang et al. (2023).

Orthogroup clustering and identification of the MicroFinder protein set

To identify a set of conserved protein coding genes to use as dot chromosome markers, we built orthogroups across representative bird genome assemblies. We selected 11 published chromosome-scale bird genome assemblies that had NCBI RefSeq or Ensembl rapid release gene-sets and combined them with a recent, near-T2T assembly of chicken (Huang et al., 2023) (Supplementary Table 2). For each species, we selected the longest transcript per gene to be the representative transcript and clustered protein sequences with OrthoFinder v2.5.4 (Emms & Kelly, 2015, 2019) in multiple sequence alignment mode (“-M msa”). The resulting orthogroups were filtered with KinFin v1.1.1 (Laetsch & Blaxter, 2017) with the parameters “--max 3 -x 0.5” to identify orthogroups present in at least 50 percent of species with a maximum of three gene copies per species. To create the MicroFinder protein set, the KinFin orthogroups were filtered to retain only those with a gene copy on chicken (GGswu), zebra finch (bTaeGut1) or cuckoo (bCucCan1) dot chromosomes.
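The dot chromosome filter described above amounts to keeping any orthogroup with at least one member on a dot chromosome in one of the three reference species. A minimal sketch follows, assuming orthogroup membership and gene locations are available as plain dictionaries; the function name, data layout and example identifiers are hypothetical.

```python
def dot_chromosome_orthogroups(orthogroups, gene_chromosome, dot_chromosomes):
    """Retain orthogroups with a gene copy on a reference dot chromosome.

    orthogroups: orthogroup id -> list of (species, gene id) members.
    gene_chromosome: (species, gene id) -> chromosome name.
    dot_chromosomes: reference species -> set of dot chromosome names.
    """
    kept = {}
    for og_id, members in orthogroups.items():
        for species, gene_id in members:
            chrom = gene_chromosome.get((species, gene_id))
            # Species absent from dot_chromosomes can never qualify an orthogroup.
            if chrom in dot_chromosomes.get(species, ()):
                kept[og_id] = members
                break
    return kept
```

All members of a retained orthogroup are kept, including those from species outside the reference set, which matches the text's count of proteins "across all 12 reference species".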
Proteins from the filtered orthogroups were then clustered with CD-HIT v4.8.1 (Fu et al., 2012) using default settings to reduce redundancy.

Phylogenetic analysis

To place the 12 bird reference genomes used to generate the MicroFinder protein set in evolutionary context, we carried out phylogenetic analysis using protein sequence alignments generated by OrthoFinder for 9,400 strictly conserved single-copy orthogroups. IQTree v2.3.4 was used to identify the optimal partitioning scheme, carry out model selection, estimate the maximum likelihood phylogeny and carry out 1,000 ultrafast bootstrap replicates to assess tree support (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017; Minh et al., 2020a, 2020b, 2021). The IQTree phylogeny was rooted on the branch leading to Galloanserae (Galliformes plus Anseriformes) following Prum et al. (2015).

The MicroFinder pipeline

All steps of the MicroFinder pipeline are implemented in a bash script and the whole pipeline is available as a Docker or Singularity container (https://github.com/sanger-tol/MicroFinder). First, the MicroFinder protein set is aligned to the draft genome assembly with miniprot v0.14 (H. Li, 2023) with default settings. From the resulting alignments, we retain the top hit for each protein and discard alignments with less than 70% identity. MicroFinder protein hits are counted for each scaffold and the input assembly FASTA file is sorted by the alignment count. Optionally, a maximum scaffold length cutoff can be applied to the assembly sorting step.
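The sorting step can be sketched as follows. The details here are assumptions rather than a description of the MicroFinder bash script: scaffolds with at least one hit (and within the optional length cutoff) are moved to the front in descending order of hit count, ties keep their input order, and all remaining scaffolds follow in their original order. The function name `microfinder_order` is hypothetical.

```python
def microfinder_order(scaffold_lengths, hit_counts, max_length=None):
    """Return scaffold names with putative dot chromosome scaffolds first.

    scaffold_lengths: scaffold name -> length in bp, in input assembly order.
    hit_counts: scaffold name -> number of MicroFinder protein hits.
    max_length: optional cutoff in bp (the text recommends 5 Mb).
    """
    candidates = [
        name
        for name, length in scaffold_lengths.items()
        if hit_counts.get(name, 0) > 0
        and (max_length is None or length <= max_length)
    ]
    # Python's sort is stable, so equal counts keep their input order.
    candidates.sort(key=lambda name: hit_counts[name], reverse=True)
    chosen = set(candidates)
    remainder = [name for name in scaffold_lengths if name not in chosen]
    return candidates + remainder
```

The cutoff matters because a macrochromosome scaffold can carry a few hits from divergent paralogs or mis-mapping; without it, such a scaffold could be sorted ahead of genuine dot chromosome fragments.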
MicroFinder outputs a FASTA file of the draft assembly sorted by MicroFinder protein alignment counts, a table of alignment counts per input scaffold and a GFF file of the miniprot alignments. It should be noted that MicroFinder counts reflect the number of protein hits from the MicroFinder protein set rather than counts of individual loci. We opted to map all proteins to maximise sensitivity to detect candidate dot chromosome scaffolds across a wide range of bird species. The MicroFinder-sorted assembly file should be prepared for manual curation in PretextView (https://github.com/sanger-tol/PretextView) with the CurationPretext pipeline (Pointon, 2025), using the “--no-sort” parameter to retain the order of the MicroFinder assembly file in the Hi-C contact map.

MicroFinder protein distribution in chicken (GGswu) and associated features

We investigated the distribution of MicroFinder proteins across the near-T2T GGswu chicken assembly (Huang et al., 2023). MicroFinder protein coordinates were extracted from the GGswu annotation GFF file. To place MicroFinder proteins in context, we also estimated genome-wide repeat content and gene expression levels. RNA-seq from female chicken liver (SRR18788805) was aligned to the GGswu assembly with HISAT2 v2.2.1 (Kim et al., 2015) and we calculated read depth in 10 Kb fixed windows using Sambamba v0.8.2 (Tarasov et al., 2015). To estimate repeat density across the GGswu dot chromosomes, we ran RepeatMasker v4.1.8 (Smit et al., 2005; Tarailo-Graovac & Chen, 2009) using a manually curated avian repeat library (Peona et al., 2023; Peona, Palacios-Gimenez, et al., 2021; Pointon, 2025) and calculated repeat density in 10 Kb fixed windows with bedtools coverage v2.31.1 (Quinlan & Hall, 2010) using the RepeatMasker GFF file as input.
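To make the fixed-window density calculation concrete, the sketch below computes the fraction of each 10 Kb window covered by annotated intervals on a single chromosome. It is a pure-Python stand-in for illustration and does not reproduce the bedtools or Sambamba commands used in the analysis; it assumes 0-based, half-open intervals that lie within the chromosome, and the function name `window_density` is hypothetical.

```python
def window_density(intervals, chrom_length, window=10_000):
    """Fraction of each fixed window covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open.
    chrom_length: chromosome length in bp; the last window may be shorter.
    """
    # Merge overlapping intervals so shared bases are counted once.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = -(-chrom_length // window)  # ceiling division
    covered = [0] * n_windows
    for start, end in merged:
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            w_end = min(w_start + window, chrom_length)
            covered[w] += max(0, min(end, w_end) - max(start, w_start))

    sizes = [min((w + 1) * window, chrom_length) - w * window for w in range(n_windows)]
    return [c / s for c, s in zip(covered, sizes)]
```

Merging first means that nested or overlapping repeat annotations do not push a window's density above 1.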
To compare the distribution of MicroFinder proteins to BUSCO genes, we ran BUSCO v5.8.2 (Simão et al., 2015; Waterhouse et al., 2018) with the Aves OrthoDB gene set (n = 8338) on the GGswu assembly and extracted the coordinates of BUSCOs located on the dot chromosomes.

Reassembly of DToL bird genomes with MicroFinder-enabled curation

We selected 12 previously published DToL bird genome assemblies for re-curation with MicroFinder (Supplementary Table 4). For each assembly, we ran MicroFinder with a 5 Mb maximum scaffold length cutoff and generated a new Hi-C contact map for curation in PretextView using the CurationPretext pipeline v1.0.1 (Pointon, 2025) with the “--no-sort” parameter. CurationPretext was provided with the original Hi-C and PacBio long reads for each assembly to create a Hi-C contact map with read coverage, gap, telomere and simple repeat density tracks. Manual curation was carried out using PretextView v1.0.0 (https://github.com/sanger-tol/PretextView). Following manual curation of each assembly, an AGP file was exported from PretextView and an updated assembly was generated using pretext-to-asm (https://github.com/sanger-tol/agp-tpf-utils).

Data and Code availability

Supplementary data containing OrthoFinder results, the MicroFinder gene set and the 12 re-curated bird genome assemblies are available from Zenodo (https://doi.org/10.5281/zenodo.15364993). For each of the re-curated genome assemblies, we have provided a MicroFinder-ordered Hi-C contact map of the original assembly, PretextView savestate and AGP files to show changes made to the original assembly, and an updated FASTA file of the assembly.
The MicroFinder source code and containerised versions of the pipeline are available on GitHub (https://github.com/sanger-tol/MicroFinder). Mathers et al. (2024) provide a practical guide for using MicroFinder-ordered assemblies for curation with example datasets (https://doi.org/10.5281/zenodo.13913870).

Acknowledgments

We thank Prof. Alex Suh and Dr Valentina Peona for providing access to their curated avian repeat library. We thank Dr Kerstin Howe and Dr Kamil Joran for comments on an earlier version of the manuscript. This work was supported by Wellcome through core funding to the Wellcome Sanger Institute (220540) and the Darwin Tree of Life Discretionary Award (218328).

References

Barros, C. P., Derks, M. F. L., Mohr, J., Wood, B. J., Crooijmans, R. P. M. A., Megens, H. J., Bink, M. C. A. M., & Groenen, M. A. M. (2023). A new haplotype-resolved turkey genome to enable turkey genetics and genomics research. GigaScience, 12. https://doi.org/10.1093/gigascience/giad051
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8. https://doi.org/10.12688/wellcomeopenres.18658.1
Chernomor, O., Von Haeseler, A., & Minh, B. Q. (2016). Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices. Systematic Biology, 65(6), 997–1008. https://doi.org/10.1093/sysbio/syw037
Damas, J., O’Connor, R., Farré, M., Lenis, V. P. E., Martell, H. J., Mandawala, A., Fowler, K., Joseph, S., Swain, M. T., Griffin, D. K., & Larkin, D. M. (2017). Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set. Genome Research, 27(5), 875–884. https://doi.org/10.1101/gr.213660.116
Degrandi, T. M., Barcellos, S. A., Costa, A. L., Garnero, A. D. V., Hass, I., & Gunski, R. J. (2020). Introducing the Bird Chromosome Database: An Overview of Cytogenetic Studies in Birds. Cytogenetic and Genome Research, 160(4), 199–205. https://doi.org/10.1159/000507768
Emms, D. M., & Kelly, S. (2015). OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology, 16(1), 157. https://doi.org/10.1186/s13059-015-0721-2
Emms, D. M., & Kelly, S. (2019). OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biology, 20(1), 1–14. https://doi.org/10.1186/s13059-019-1832-y
Feron, R., & Waterhouse, R. M. (2022). Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes. GigaScience, 11. https://doi.org/10.1093/gigascience/giac006
Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152. https://doi.org/10.1093/bioinformatics/bts565
Habermann, F. A., Cremer, M., Walter, J., Kreth, G., Von Hase, J., Bauer, K., Wienberg, J., Cremer, C., Cremer, T., & Solovei, I. (2001). Arrangements of macro- and microchromosomes in chicken cells. Chromosome Research, 9, 569–584.
Howe, K., Chow, W., Collins, J., Pelan, S., Pointon, D. L., Sims, Y., Torrance, J., Tracey, A., & Wood, J. (2021). Significantly improving the quality of genome assemblies through curation. GigaScience, 10(1), 1–9. https://doi.org/10.1093/gigascience/giaa153
Hu, J., Song, L., Ning, M., Niu, X., Han, M., Gao, C., Feng, X., Cai, H., Li, T., Li, F., Li, H., Gong, D., Song, W., Liu, L., Pu, J., Liu, J., Smith, J., Sun, H., & Huang, Y. (2024). A new chromosome-scale duck genome shows a major histocompatibility complex with several expanded multigene families. BMC Biology, 22(1). https://doi.org/10.1186/s12915-024-01817-0
Huang, Z., Xu, Z., Bai, H., Huang, Y., Kang, N., Ding, X., Liu, J., Luo, H., Yang, C., Chen, W., Guo, Q., Xue, L., Zhang, X., Xu, L., Chen, M., Fu, H., Chen, Y., Yue, Z., Liu, T. F. S., … Xu, L. (2023). Evolutionary analysis of a complete chicken genome. Proceedings of the National Academy of Sciences of the United States of America, 120(8). https://doi.org/10.1073/pnas.2216641120
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., Von Haeseler, A., & Jermiin, L. S. (2017). ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods, 14(6), 587–589. https://doi.org/10.1038/nmeth.4285
Kim, D., Langmead, B., & Salzberg, S. L. (2015). HISAT: A fast spliced aligner with low memory requirements. Nature Methods, 12(4), 357–360. https://doi.org/10.1038/nmeth.3317
Laetsch, D. R., & Blaxter, M. L. (2017). KinFin: Software for taxon-aware analysis of clustered protein sequences. G3: Genes, Genomes, Genetics, 7(10), 3349–3357. https://doi.org/10.1534/g3.117.300233
Larivière, D., Abueg, L., Brajuka, N., Gallardo-Alba, C., Grüning, B., Ko, B. J., Ostrovsky, A., Palmada-Flores, M., Pickett, B. D., Rabbani, K., Antunes, A., Balacco, J. R., Chaisson, M. J. P., Cheng, H., Collins, J., Couture, M., Denisova, A., Fedrigo, O., Gallo, G. R., … Formenti, G. (2024). Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature Biotechnology, 42(3), 367–370. https://doi.org/10.1038/s41587-023-02100-3
Lawniczak, M. K. N., Durbin, R., Flicek, P., Lindblad-Toh, K., Wei, X., Archibald, J. M., Baker, W. J., Belov, K., Blaxter, M. L., Marques Bonet, T., Childers, A. K., Coddington, J. A., Crandall, K. A., Crawford, A. J., Davey, R. P., Palma, F. Di, Fang, Q., Haerty, W., Hall, N., … Richards, S. (2022). Standards recommendations for the Earth BioGenome Project. PNAS, 119(4), e2115639118. https://doi.org/10.1073/pnas.2115639118
Lewin, H. A., Robinson, G. E., Kress, W. J., Baker, W. J., Coddington, J., Crandall, K. A., Durbin, R., Edwards, S. V., Forest, E., Thomas, M., Gilbert, P., Goldstein, M. M., Grigoriev, I. V., Hackett, K. J., Haussler, D., Jarvis, E. D., Johnson, W. E., Patrinos, A., Richards, S., … Zhang, G. (2001). Earth BioGenome Project: Sequencing life for the future of life. Royal Botanic Gardens, 115(17), 4325–4333. https://doi.org/10.1073/pnas.1720115115/-/DCSupplemental
Li, H. (2023). Protein-to-genome alignment with miniprot. Bioinformatics, 39(1). https://doi.org/10.1093/bioinformatics/btad014
Li, H., & Durbin, R. (2024). Genome assembly in the telomere-to-telomere era. Nature Reviews Genetics, 25(9), 658–670. https://doi.org/10.1038/s41576-024-00718-w
Li, M., Sun, C., Xu, N., Bian, P., Tian, X., Wang, X., Wang, Y., Jia, X., Heller, R., Wang, M., Wang, F., Dai, X., Luo, R., Guo, Y., Wang, X., Yang, P., Hu, D., Liu, Z., Fu, W., … Yang, N. (2022). De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable Phenomenon for Thousands of Core Genes on Microchromosomes and Subtelomeric Regions. Molecular Biology and Evolution, 39(4). https://doi.org/10.1093/molbev/msac066
Liu, J., Wang, Z., Li, J., Xu, L., Liu, J., Feng, S., Guo, C., Chen, S., Ren, Z., Rao, J., Wei, K., Chen, Y., Jarvis, E. D., Zhang, G., & Zhou, Q. (2021). A new emu genome illuminates the evolution of genome configuration and nuclear architecture of avian chromosomes. Genome Research, 31(3), 497–511. https://doi.org/10.1101/GR.271569.120
Lopez Colom, R., & O’Brien, M. (2024). The genome sequence of the pink-footed goose, Anser brachyrhynchus Baillon, 1834. Wellcome Open Research, 9, 613. https://doi.org/10.12688/wellcomeopenres.23194.1
Luo, H., Jiang, X., Li, B., Wu, J., Shen, J., Xu, Z., Zhou, X., Hou, M., Huang, Z., Ou, X., & Xu, L. (2023). A high-quality genome assembly highlights the evolutionary history of the great bustard (Otis tarda, Otidiformes). Communications Biology, 6(1). https://doi.org/10.1038/s42003-023-05137-x
Makino, S. (1951). An atlas of the chromosome numbers in animals (2nd ed., 1st American ed.). Ames: The Iowa State College Press.
Marçais, G., Delcher, A. L., Phillippy, A. M., Coston, R., Salzberg, S. L., & Zimin, A. (2018). MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology, 14(1), 1–14. https://doi.org/10.1371/journal.pcbi.1005944
Mathers, T. C., Paulini, M., Collins, J., Absolon, D., Pelan, S., & Wood, J. (2024). Manual curation of bird microchromosomes with HiC and gene mapping. Zenodo. https://doi.org/10.5281/zenodo.13913870
McQueen, H. A., Siriaco, G., & Bird, A. P. (1998). Chicken Microchromosomes Are Hyperacetylated, Early Replicating, and Gene Rich. Genome Research, 8(6), 621–630. https://doi.org/10.1101/gr.8.6.621
Minh, B. Q., Dang, C. C., Vinh, L. S., & Lanfear, R. (2021). QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution. Systematic Biology, 70(5), 1046–1060. https://doi.org/10.1093/sysbio/syab010
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler, A., Lanfear, R., & Teeling, E. (2020a). IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution, 37(5), 1530–1534. https://doi.org/10.1093/molbev/msaa015
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler, A., Lanfear, R., & Teeling, E. (2020b). IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and Evolution, 37(5), 1530–1534. https://doi.org/10.1093/molbev/msaa015
Newcomer, E. H. (1955). Accessory chromosomes in the domestic fowl. Genetics, 40(5).
Newcomer, E. H. (1957). The mitotic chromosomes of the domestic fowl. Journal of Heredity, 48, 227–234.
O’Brien, M. F., & Lopez Colom, R. (2024). The genome sequence of the northern pintail, Anas acuta Linnaeus, 1758. Wellcome Open Research, 9, 446. https://doi.org/10.12688/wellcomeopenres.22770.1
O’Connor, R. E., Kiazim, L., Skinner, B., Fonseka, G., Joseph, S., Jennings, R., Larkin, D. M., & Griffin, D. K. (2019). Patterns of microchromosome organization remain highly conserved throughout avian evolution. Chromosoma, 128(1), 21–29. https://doi.org/10.1007/s00412-018-0685-6
Peona, V., Blom, M. P. K., Xu, L., Burri, R., Sullivan, S., Bunikis, I., Liachko, I., Haryoko, T., Jønsson, K. A., Zhou, Q., Irestedt, M., & Suh, A. (2021). Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources, 21(1), 263–286. https://doi.org/10.1111/1755-0998.13252
Peona, V., Palacios-Gimenez, O. M., Blommaert, J., Liu, J., Haryoko, T., Jønsson, K. A., Irestedt, M., Zhou, Q., Jern, P., & Suh, A. (2021). The avian W chromosome is a refugium for endogenous retroviruses with likely effects on female-biased mutational load and genetic incompatibilities. Philosophical Transactions of the Royal Society B: Biological Sciences, 376(1833). https://doi.org/10.1098/rstb.2020.0186
Peona, V., Palacios-Gimenez, O. M., Lutgen, D., Olsen, R. A., Kakhki, N. A., Andriopoulos, P., Bontzorlos, V., Schweizer, M., Suh, A., & Burri, R. (2023). An annotated chromosome-scale reference genome for Eastern black-eared wheatear (Oenanthe melanoleuca). G3: Genes, Genomes, Genetics, 13(6). https://doi.org/10.1093/g3journal/jkad088
Perry, B. W., Schield, D. R., Adams, R. H., & Castoe, T. A. (2021). Microchromosomes Exhibit Distinct Features of Vertebrate Chromosome Structure and Function with Underappreciated Ramifications for Genome Evolution. Molecular Biology and Evolution, 38(3), 904–910. https://doi.org/10.1093/molbev/msaa253
Pointon, D.-L. B. (2025). sanger-tol/curationpretext. Zenodo. https://doi.org/10.5281/zenodo.14621949
Prum, R. O., Berv, J. S., Dornburg, A., Field, D. J., Townsend, J. P., Lemmon, E. M., & Lemmon, A. R. (2015). A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature, 526(7574), 569–573. https://doi.org/10.1038/nature15697
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. https://doi.org/10.1093/bioinformatics/btq033
Rhie, A., McCarthy, S. A., Fedrigo, O., Damas, J., Formenti, G., Koren, S., Uliano-Silva, M., Chow, W., Fungtammasan, A., Kim, J., Lee, C., Ko, B. J., Chaisson, M., Gedman, G. L., Cantin, L. J., Thibaud-Nissen, F., Haggerty, L., Bista, I., Smith, M., … Jarvis, E. D. (2021). Towards complete and error-free genome assemblies of all vertebrate species. Nature, 592(7856), 737–746. https://doi.org/10.1038/s41586-021-03451-0
Román-Palacios, C., Medina, C. A., Zhan, S. H., & Barker, M. S. (2021). Animal chromosome counts reveal a similar range of chromosome numbers but with less polyploidy in animals compared to flowering plants. Journal of Evolutionary Biology, 34(8), 1333–1339. https://doi.org/10.1111/jeb.13884
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Smit, A. F. A., Hubley, R., & Green, P. (2005). RepeatMasker Open-4.0.
Smith, J., Bruley, C. K., Paton, I. R., Dunn, I., Jones, C. T., Windsor, D., Morrice, D. R., Law, A. S., Masabanda, J., Sazanov, A., Waddington, D., Fries, R., & Burt, D. W. (2000). Differences in gene density on chicken macrochromosomes and microchromosomes. Animal Genetics, 31(2), 96–103. https://doi.org/10.1046/j.1365-2052.2000.00565.x
Tarailo-Graovac, M., & Chen, N. (2009). Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics, SUPPL. 25, 1–14. https://doi.org/10.1002/0471250953.bi0410s25
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J., & Prins, P. (2015). Sambamba: Fast processing of NGS alignment formats. Bioinformatics, 31(12), 2032–2034. https://doi.org/10.1093/bioinformatics/btv098
Tegelström, H., & Ryttman, H. (1981). Chromosomes in birds (Aves): evolutionary implications of macro- and microchromosome numbers and lengths. Hereditas, 94(2), 225–233. https://doi.org/10.1111/j.1601-5223.1981.tb01757.x
The Darwin Tree of Life Project Consortium. (2021). Sequence locally, think globally: The Darwin Tree of Life Project. Proceedings of the National Academy of Sciences, 119(4), e2115642118. https://doi.org/10.1073/pnas.2115642118/-/DCSupplemental
Uno, Y., Nishida, C., Hata, A., Ishishita, S., & Matsuda, Y. (2019). Molecular cytogenetic characterization of repetitive sequences comprising centromeric heterochromatin in three Anseriformes species. PLoS ONE, 14(3). https://doi.org/10.1371/journal.pone.0214028
van Brink, J. M. (1959). L’expression morphologique de la digamétie chez les sauropsidés et les monotrèmes. Chromosoma, 10(1), 1–72. https://doi.org/10.1007/BF00396564
Waterhouse, R. M., Seppey, M., Simao, F. A., Manni, M., Ioannidis, P., Klioutchnikov, G., Kriventseva, E. V., & Zdobnov, E. M. (2018). BUSCO applications from quality assessments to gene prediction and phylogenomics. Molecular Biology and Evolution, 35(3), 543–548. https://doi.org/10.1093/molbev/msx319
Waters, P. D., Patel, H. R., Ruiz-Herrera, A., Álvarez-González, L., Lister, N. C., Simakov, O., Ezaz, T., Kaur, P., Frere, C., Grützner, F., Georges, A., & Marshall Graves, J. A. (2021). Microchromosomes are building blocks of bird, reptile, and mammal chromosomes. Proceedings of the National Academy of Sciences, 118(45), e2112494118. https://doi.org/10.1073/pnas.2112494118
Wójcik, E., & Smalec, E. (2007). Description of the Anser anser Goose Karyotype. Folia Biologica (Kraków), 55, 1–2.

bioRxiv preprint, this version posted May 14, 2025; doi: https://doi.org/10.1101/2025.05.09.653066. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.


License: CC-BY-NC-4.0