Abstract
Obtaining chromosomally complete genome assemblies across the tree of life is a major goal
of biodiversity genomics. However, some lineages remain recalcitrant to assembly despite
recent advances in sequencing technologies and assembly tools. Birds present a substantial
genome assembly challenge due to the presence of tiny, hard-to-assemble microchromosomes
that are often highly fragmented or even missing in draft genome assemblies. As such, bird
genomes require a large amount of expert manual curation effort via manipulation of
genome-wide Hi-C contact maps, and many current chromosome-level bird genome
assemblies do not resolve the known karyotype. Microchromosomes have distinct genetic and
epigenetic features. They are GC-biased, gene-rich, highly methylated, and have distinct
spatial organisation in the centre of the nucleus. Importantly, they are conserved across avian
evolution. Here, using a reference set of expert-curated bird genomes, we have identified a
set of conserved microchromosome genes and developed MicroFinder, a pipeline that uses
this gene set to find small microchromosome fragments in draft genome assemblies to act as
anchors for manual curation of microchromosomes. We demonstrate how MicroFinder can
be used to improve the speed and accuracy of bird genome curation. Furthermore, we
highlight the usefulness of MicroFinder by carrying out MicroFinder-enabled re-curation of 12
previously released chromosome-scale bird genome assemblies, increasing the sequence
content of microchromosome models.
Introduction
Recent advances in sequencing technology have dramatically improved the quantity, quality
and taxonomic breadth of reference genome assemblies across the tree of life (Feron &
Waterhouse, 2022; Lewin et al., 2001; Rhie et al., 2021; The Darwin Tree of Life Project
Consortium, 2021). Automated assembly of accurate long reads followed by scaffolding with
high throughput in vivo chromatin conformation capture sequence data (Hi-C) and manual
.CC-BY-NC 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is
The copyright holder for this preprintthis version posted May 14, 2025. ; https://doi.org/10.1101/2025.05.09.653066doi: bioRxiv preprint
curation (Howe et al., 2021) routinely results in genome assemblies that meet or exceed
accepted gold standard metrics (Lawniczak et al., 2022). However, some lineages are
recalcitrant to assembly and challenges remain to generate complete, chromosomally
resolved genome assemblies for all taxa (H. Li & Durbin, 2024).
Within vertebrates, birds present a substantial assembly challenge due to the presence of tiny,
hard-to-assemble microchromosomes. Since early cytogenetic studies, it has been recognised
that bird genomes typically contain six to eight pairs of large macrochromosomes and 31 to
33 pairs of small microchromosomes (Tegelström & Ryttman, 1981). In chicken,
macrochromosome size based on a near-T2T assembly ranges from 250 Mb to 30 Mb, and
microchromosomes range from 23 Mb to 2.5 Mb (Huang et al., 2023). Ten of the smallest
microchromosomes (ranging in size from 6.8 to 2.5 Mb) are further categorised as “dot”
chromosomes based on their minute size, morphology and extensive pericentromeric
heterochromatin. Once considered unimportant DNA fragments (Newcomer, 1955, 1957),
microchromosomes have since been shown by cytogenetics and genomics to be highly conserved
across avian evolution and to contain many important and highly expressed housekeeping genes
(Liu et al., 2021; van Brink, 1959; Waters et al., 2021). Furthermore, microchromosomes have
distinct genetic and epigenetic features setting them apart from macrochromosomes: they
are GC-biased, gene-rich, highly methylated, and have distinct spatial organisation in the
centre of the nucleus (Habermann et al., 2001; McQueen et al., 1998; O’Connor et al., 2019;
Perry et al., 2021; Smith et al., 2000).
Most recent bird genome assembly projects follow the Vertebrate Genome Project (VGP)
assembly pipeline, which uses accurate PacBio HiFi long reads for de novo assembly combined
with Hi-C data for long range scaffolding and phasing (Larivière et al., 2024). This pipeline
produces assemblies with excellent contiguity and completeness statistics. However, these
metrics do not fully capture the challenge of assembling the smallest bird chromosomes as
they represent a small fraction of the total sequence content. Strikingly, despite high-quality
sequence data, bird genome assemblies often do not fully resolve the known karyotype
(Figure 1a; Supplementary Table 1). Of 105 species with chromosome-scale genome
assemblies in International Nucleotide Sequence Database Collaboration (INSDC) databases
that also have karyotype data, 62 (59%) differ from the expected karyotype by 2 or more
chromosomes, with the majority (57/62) having fewer chromosomes than expected.
Primarily, this is due to failure to assemble and identify the full set of microchromosomes
(Barros et al., 2023; Peona, Blom, et al., 2021), and even in karyotype-resolved assemblies,
microchromosomes are often highly fragmented and can be incomplete (M. Li et al., 2022).
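The comparison described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the analysis code used for Figure 1a: the species names and counts are invented, and the `classify` helper and its `tolerance` parameter are hypothetical names introduced here to mirror the ± 1 chromosome margin.

```python
# Hypothetical (species, assembly chromosome count, haploid karyotype count) records.
# These values are invented for illustration; they are not from Supplementary Table 1.
records = [
    ("species_a", 40, 40),
    ("species_b", 33, 40),
    ("species_c", 39, 40),
    ("species_d", 42, 39),
]


def classify(assembly_n, karyotype_n, tolerance=1):
    """Label an assembly by how its chromosome count compares to the karyotype."""
    diff = assembly_n - karyotype_n
    if abs(diff) <= tolerance:
        return "consistent"
    return "fewer" if diff < 0 else "more"


labels = {name: classify(a, k) for name, a, k in records}
# Assemblies that differ from the karyotype by 2 or more chromosomes.
discordant = [name for name, label in labels.items() if label != "consistent"]

print(labels)
print(f"{len(discordant)}/{len(records)} differ by 2 or more chromosomes")
```

Applied to the real data, the same tally gives the 62 of 105 species (59%) reported in the text.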
Painstaking manual curation of bird genomes after de novo assembly and scaffolding is
therefore an essential assembly step. For example, the Hi-C contact map for the draft genome
assembly of the pink-footed goose
Anser brachyrhynchus (assembled by the Darwin Tree of Life (DToL) project [Lopez Colom &
O’Brien, 2024]) reveals 28 clear chromosomal elements (Figure 1b), yet closely related
karyotyped geese all have 40 or 41 chromosomes (Uno et al., 2019; Wójcik & Smalec, 2007).
Therefore, at least 12 chromosomes are expected to be among the unplaced “shrapnel”
content located at the bottom right of the Hi-C contact map, which predominantly contains
repetitive sequence (Figure 1c). To resolve the assembly, genome curators sift through
shrapnel scaffolds to identify and assemble microchromosome fragments (Figure 1d).
Techniques include making use of the elevated Hi-C background signal between
microchromosomes (due to their central position in the nucleus), genome alignments with
reference species and mapping of protein coding genes. This process is slow and laborious,
and there is a high likelihood of sequence content being missed from the assembled
chromosomes.
Figure 1: Bird genome assemblies are often not karyotype-complete and require extensive manual curation. (A)
Correspondence analysis of chromosome counts in chromosome-scale genome assemblies versus their
respective haploid karyotype for 105 bird species. Colour gradient reflects the number of species in each category
(bin of assembly (x) versus karyotype (y) count). The solid black line marks the match of the chromosome number
in assemblies (y-axis) and predicted chromosome number using cytology (x-axis). The dashed diagonal lines
indicate ± 1 chromosome margin of error to account for expected variation from assemblies of males
(homogametic sex will usually have one fewer assembled chromosome). (B) Hi-C contact map for the draft genome
assembly of Anser brachyrhynchus (assembled by the Darwin Tree of Life (DToL) project [Lopez Colom &
O’Brien, 2024]). Coloured squares highlight 28 clear chromosomal elements identified during an initial assembly
curation (painted “Scaffolds” in PretextView). (C) A zoomed-in view of the unplaced assembly content grouped
at the bottom right of image (B). (D) Hi-C contact map of the curated A. brachyrhynchus genome assembly
zoomed in on the smallest 11 chromosomes. Content to the right of the red arrow is unplaced.
Microchromosomes have elevated background Hi-C signal but appear as independent elements in the Hi-C map.
Here, to aid manual curation of bird genomes, we took advantage of conserved gene content
to identify microchromosome fragments in draft genome assemblies. Using 11 high-quality,
manually curated bird genomes generated as part of the VGP, 25 Genomes Project and DToL
(The Darwin Tree of Life Project Consortium, 2021), as well as a near telomere-to-telomere
(T2T) assembly of chicken (Huang et al., 2023), we identified a set of conserved
microchromosome genes and have developed MicroFinder
(https://github.com/sanger-tol/MicroFinder), a pipeline that uses this gene set to find candidate microchromosome
contigs from draft assemblies to improve the speed and accuracy of manual curation. Using
this approach, we revisited 12 previously released bird genome assemblies and improved the
content and representation of their assembled microchromosomes.
Results and Discussion
Identification of conserved microchromosome genes
Given the gene-dense nature of microchromosomes and their conserved synteny across birds,
we hypothesised that a dense marker set of protein coding genes would enable the
identification of microchromosome fragments in draft genome assemblies. To generate a set
of marker genes, we made use of expert-curated genome assemblies generated for the VGP,
DToL and 25 Genomes projects. We selected 11 published genome assemblies with NCBI
RefSeq or Ensembl rapid release gene-sets (Supplementary Table 2). We also included a
recent, near-T2T assembly of chicken (Huang et al., 2023). Together, these 12 assemblies span
nine bird orders and 11 families (Supplementary Table 2, Figure 2A). Of note, this collection
includes three high confidence genome assemblies (bCucCan1, bTaeGut1 and GGswu, herein
referred to as the ToL reference set) that are commonly used by genome curators at the
Wellcome Sanger Tree of Life (ToL) programme as references for whole genome alignments when
curating new bird assemblies. Additionally, six of the selected assemblies have been confirmed
to be karyotype-complete based on cytology (Supplementary Table 2, Figure 2A). Of the
remaining assemblies, two species do not have published karyotypes and four likely have
missing chromosomes based on expectations from cytology, further highlighting the
challenges of generating karyotype-complete genome assemblies for birds even when
high-quality data are available and substantial manual curation time has been invested.
Figure 2: Phylogeny of annotated chromosome-scale bird reference genomes used to generate the MicroFinder
protein set and conserved macro-synteny of bird dot chromosomes. (A) Maximum likelihood phylogeny based
on a concatenated alignment of 9,400 conserved single-copy orthogroups. Branch lengths are in amino acid
substitutions per site. All nodes have ≥ 99% bootstrap support (1000 ultrafast bootstrap replicates). Species with
genome assemblies confirmed to be karyotype-complete based on cytology are highlighted in green. Full details
of all assemblies are given in Supplementary Table 2. PhyloPic (https://www.phylopic.org) silhouettes of each
species are shown at the tree tips. Species marked with an “*” form the ToL reference set and are routinely used
as references when assembling diverse bird genomes. (B - F) Dot chromosome synteny between genomes in the
ToL reference set based on whole genome alignments. F summarises dot chromosome homology between
GGswu, bTaeGut1 and bCucCan1 based on the alignments shown in B - E.
To identify conserved, low copy number genes to use as markers, we clustered proteomes from
the 12 bird reference genomes into orthogroups with OrthoFinder (Emms & Kelly, 2015,
2019) and used KinFin (Laetsch & Blaxter, 2017) to select broadly conserved “fuzzy”
orthogroups that have relaxed conservation and copy number constraints (≤ 3 gene copies
per species and present in at least 50% of species). In total, 197,759 proteins were clustered
into 16,589 orthogroups, of which 9,400 were conserved and single-copy in all species and
14,514 were identified by KinFin as “fuzzy” orthogroups (Supplementary Table 3 and
Supplementary Data). We further filtered the KinFin orthogroup set to only include genes
located on dot chromosomes in any of the three ToL reference species, using the near-T2T
GGswu chicken assembly to classify dot chromosome homologs in bCucCan1 and bTaeGut1
(Figure 2B-F). We reasoned that specifically targeting dot chromosomes rather than all
microchromosomes would be most beneficial for assembly curation as larger
microchromosomes are typically much less fragmented than dot chromosomes. This filtering
identified 510 dot chromosome-associated orthogroups containing 4,510 proteins across all
12 reference species. To reduce redundancy, we clustered the dot chromosome-associated
proteins with CD-HIT (Fu et al., 2012) to produce a final gene set containing 2,882 proteins,
which we refer to as the MicroFinder protein set.
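As a rough illustration of the two selection steps described above, the sketch below applies the stated criteria (at most three copies per species, present in at least 50% of species, and at least one member on a reference dot chromosome) to toy data. It is a simplified reading of the text and not KinFin's implementation; the orthogroup IDs, species labels, protein IDs and both helper functions are hypothetical.

```python
from collections import Counter


def is_fuzzy(members, n_species, max_copies=3, min_fraction=0.5):
    """members: list of (species, protein_id) pairs for one orthogroup."""
    copies = Counter(species for species, _ in members)
    if not copies:
        return False
    present_in_enough = len(copies) / n_species >= min_fraction
    low_copy = max(copies.values()) <= max_copies
    return present_in_enough and low_copy


def on_dot_chromosome(members, dot_proteins):
    """True if any member protein is annotated on a reference dot chromosome."""
    return any(protein in dot_proteins for _, protein in members)


# Toy input: four species, four orthogroups (all identifiers are invented).
orthogroups = {
    "OG1": [("chicken", "c1"), ("finch", "f1"), ("cuckoo", "k1"), ("duck", "d1")],
    "OG2": [("chicken", "c2"), ("chicken", "c3"), ("chicken", "c4"),
            ("chicken", "c5"), ("finch", "f2"), ("cuckoo", "k2")],
    "OG3": [("duck", "d2")],
    "OG4": [("chicken", "c6"), ("finch", "f3"), ("finch", "f4")],
}
dot_proteins = {"c1", "f2", "d2"}  # proteins located on reference dot chromosomes
n_species = 4

fuzzy = [og for og, m in orthogroups.items() if is_fuzzy(m, n_species)]
selected = [og for og in fuzzy if on_dot_chromosome(orthogroups[og], dot_proteins)]
print(fuzzy, selected)
```

Here OG2 fails the copy number limit and OG3 fails the presence threshold, so only OG1 and OG4 pass the first step, and only OG1 survives the dot chromosome filter.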
Next, we investigated coverage of MicroFinder loci across the near-T2T GGswu assembly of
chicken which is, to our knowledge, the most complete bird genome assembled to date. The 10
GGswu dot chromosomes have between 15 and 67 GGswu MicroFinder loci per chromosome
(307 in total), with an average density of 7.5 loci per Mb of sequence (Figure 3). In comparison,
the OrthoDB10 avian Benchmarking Universal Single-Copy Orthologs (BUSCO) gene set (n =
8,338 orthogroups) has only 3 genes located on dot chromosomes (Supplementary Figure 1),
likely due to historical difficulties with dot chromosome assembly leading to severe
underrepresentation of dot chromosome genes in OrthoDB. Previously, Huang et al. (2023)
showed that chicken dot chromosomes are split into two distinct domains: gene-rich
euchromatic regions and repetitive, gene-poor heterochromatic regions, with the
euchromatic parts typically occupying a large region of the long arm of each chromosome. In
line with this, we find clustering of MicroFinder proteins in high expression, low repeat density
regions of dot chromosomes (Figure 3). As such, the high density of MicroFinder proteins in
euchromatin will increase the likelihood of identifying genic regions of dot chromosomes in
fragmented genome assemblies.
Figure 3: Distribution of MicroFinder proteins on chicken (GGswu assembly) dot chromosomes. Panels from top
to bottom show the location of MicroFinder loci (coral), RNA-seq alignment counts from female chicken liver
(SRR18788805) (green) in 10 Kb fixed windows, and transposable element density in 10 Kb fixed windows (blue).
To aid visualisation of lower coverage genes, maximum RNA-seq read coverage was capped at 25x.
Gene mapping and assembly ordering to aid genome curation
To make use of the MicroFinder protein set, we developed a pipeline to map and count
MicroFinder proteins in a draft genome assembly and reorder scaffolds by MicroFinder protein
count. This strategy means that putative dot chromosome scaffolds appear at the beginning
of the Hi-C contact map separated from other small fragments, enabling curators to quickly
identify dot chromosome content and start building up chromosome-scale scaffolds without
having to sift through repetitive “shrapnel” contigs as is the case for a standard, size-sorted
map. The MicroFinder pipeline aligns the MicroFinder protein set to a draft assembly with
miniprot (H. Li, 2023), selects the top ranking hit for each protein, removes alignments with
less than 70% identity, counts protein alignments per scaffold, and outputs a
reordered assembly fasta file and associated MicroFinder count data. Optionally, the pipeline
can apply a maximum scaffold size cutoff for assembly sorting. During testing we found that
macrochromosome scaffolds can sometimes contain a low number of MicroFinder hits, most
likely due to the presence of divergent paralogs or mis-mapping. We therefore recommend
using a 5 Mb maximum scaffold size cutoff for assembly sorting. Following sorting, new Hi-C
contact maps can be made for assembly curation in PretextView
(https://github.com/sanger-tol/PretextView) using the CurationPretext pipeline (Pointon, 2025). MicroFinder has been
packaged up into Docker and Singularity containers for easy deployment
(https://github.com/sanger-tol/MicroFinder) and we have developed a training workshop
with example datasets to guide users (Mathers et al., 2024).
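The hit selection and counting logic described above can be sketched as follows. This is not the MicroFinder code, which is a bash pipeline around miniprot. It assumes the alignments have already been parsed into simple tuples, that "top ranking" means highest alignment score, and that identity is expressed as a fraction; the tuple layout and the function name are hypothetical.

```python
from collections import Counter


def count_hits_per_scaffold(hits, min_identity=0.70):
    """hits: iterable of (protein, scaffold, identity, score) tuples.

    Keeps the top-scoring hit per protein, then drops hits below the
    identity threshold, then counts the remaining hits per scaffold.
    """
    best = {}
    for protein, scaffold, identity, score in hits:
        if protein not in best or score > best[protein][2]:
            best[protein] = (scaffold, identity, score)

    counts = Counter()
    for scaffold, identity, _ in best.values():
        if identity >= min_identity:
            counts[scaffold] += 1
    return counts


# Toy alignments (invented values).
hits = [
    ("p1", "scaf_9", 0.95, 500),
    ("p1", "scaf_2", 0.80, 300),  # secondary hit for p1, discarded
    ("p2", "scaf_9", 0.88, 410),
    ("p3", "scaf_7", 0.60, 250),  # top hit but below 70% identity, discarded
    ("p4", "scaf_2", 0.75, 320),
]
counts = count_hits_per_scaffold(hits)
print(counts.most_common())
```

Because the top hit is chosen before the identity filter, a protein whose best alignment falls below 70% contributes nothing, even if a lower-scoring alignment elsewhere would have passed.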
To demonstrate how MicroFinder can be used as a curation aid, we applied it to the draft
(pre-curation) DToL genome assembly of Anas acuta (O’Brien & Lopez Colom, 2024). MicroFinder
identified 61 putative dot chromosome scaffolds shorter than 5 Mb and moved them to the
start of the Hi-C contact map (Figure 4). These scaffolds were manually ordered and
rearranged to form 10 chromosomal elements during curation. Notably, we did not observe
false positive MicroFinder-ordered scaffolds with Hi-C signal placing them with
macrochromosomes, indicating that MicroFinder proteins are reliable dot chromosome
markers. This is likely due to conservation of microchromosome gene content and limited
rearrangements between microchromosomes and macrochromosomes during avian
evolution. As such, MicroFinder enables rapid curation of dot chromosomes using gene-rich
scaffolds as anchors to build up dot chromosomes, removing the need for curators to trawl
through repetitive shrapnel contigs and reducing the risk of small gene-rich dot chromosome
contigs being missed from dot chromosome models during the curation process.
Figure 4: MicroFinder-enabled manual curation of bird dot chromosomes. Main panel shows Hi-C contact map
of the MicroFinder-ordered draft (pre-curation) genome assembly of Anas acuta (O’Brien & Lopez Colom, 2024).
Central panel shows a zoomed-in view of the putative dot chromosome content that has been moved to the start
of the assembly by MicroFinder for curation. Right-hand panel shows a zoomed-in view of the curated dot
chromosomes.
Reassembly of DToL bird genomes using MicroFinder-aided curation
Next, we investigated whether MicroFinder could be used to improve previously released
chromosome-scale bird genome assemblies. We ran MicroFinder on 12 DToL bird genome
assemblies that had been assembled using PacBio HiFi and Hi-C and subjected to manual
curation by the ToL curation team (Supplementary Table 4). For each assembly, we ran
MicroFinder with a 5 Mb maximum scaffold length cutoff and generated a new Hi-C contact
map for curation in PretextView using the original sequence data. MicroFinder identified
between 22 and 74 (mean = 49) putative unplaced dot chromosome scaffolds per assembly
(Figure 5a). We were able to unambiguously place MicroFinder scaffolds onto dot
chromosome models in 11 out of 12 of the assemblies, placing between 2 and 16 scaffolds
and increasing the total length of assembled chromosomes in 9 out of 12 assemblies, adding
between 216 Kb and 4.3 Mb of additional content per assembly (average = 1.4 Mb) (Figure
5b). Two assemblies (bNetRuf1.1 and bAccGen1.1) had a decrease in assembled chromosome
length due to identification of errors in the original assembly. In total, MicroFinder enabled
the placement of an additional 12.5 Mb of dot chromosome content across 9 DToL genomes.
Furthermore, in the case of bAnaAcu1.1, we were able to identify an additional dot chromosome
model that had been missed in the original curation (Figure 5c). Unplaceable scaffolds either
had ambiguous Hi-C signal or were too small to place, reflecting the fragmented nature of dot
chromosome assemblies (Figure 5c).
Figure 5: MicroFinder-enabled re-curation of 12 previously released DToL bird genome assemblies. (A) Bar chart
showing counts of shrapnel scaffolds (previously unplaced content) identified by MicroFinder for 12 genome
assemblies. Bars are coloured by whether the scaffolds were placed onto chromosome models during manual
curation. (B) As for (A) but showing total sequence content added to chromosome models during manual
curation of the MicroFinder-sorted genome assemblies. (C) Hi-C contact map of the Anas acuta genome assembly
(bAnaAcu1.1). The figure shows a zoomed-in view of the smallest seven chromosomes. Scaffolds in the original
assembly are separated by grey lines. Coloured squares indicate “painted” chromosomes and are assigned super
scaffold IDs (Scaffold_(n)) by PretextView (shown above each square). Red vertical arrows indicate scaffolds that
have been incorporated into chromosome models following MicroFinder-enabled manual re-curation.
Scaffold_35 is a chromosome model that was unidentifiable in the original curation. Full stats for all 12 re-curated
genome assemblies are provided in Supplementary Table 4.
Conclusion
Here, we have identified a set of broadly conserved genes located on the smallest bird
microchromosomes, known as dot chromosomes, and developed a pipeline (MicroFinder) to
identify and order putative dot chromosome scaffolds in draft genome assemblies. By using
“fuzzy” orthogroup selection, our gene set includes a large number of broadly conserved
single-copy (or low copy number) genes and provides good coverage across all avian dot
chromosomes (Figure 3). Using this strategy, MicroFinder can detect putative dot
chromosome scaffolds in fragmented draft genome assemblies and is an effective curation aid
for bird genome assembly, even enabling improvement to genome assemblies that have
already undergone expert curation (Figure 5). Previously, an integrative method that uses a
BAC panel to identify chromosome-specific regions was developed to resolve fragmented
assemblies, including identification of microchromosomes (Damas et al., 2017); however, it
requires expertise in molecular cytogenetics and is time-consuming and impractical for
current large-scale sequencing projects. Instead, MicroFinder provides a quick and easy
pipeline to effectively pull out putative dot chromosome fragments in silico. Furthermore, the
MicroFinder approach may be applicable to other systems which have conserved but
hard-to-assemble chromosomes, such as the dot chromosome (Muller element F) in Diptera.
Recently, near-T2T assemblies have been released for chicken, bustard and mallard (Hu et al.,
2024; Huang et al., 2023; Luo et al., 2023). These assemblies achieved higher
microchromosome contiguity through the inclusion of Oxford Nanopore ultra-long reads. This
approach represents a promising avenue to further improve bird genome assembly quality.
However, due to scale and inertia, many projects still rely primarily on PacBio HiFi de novo
assembly and will greatly benefit from our approach. We recommend that MicroFinder be
incorporated into bird genome assembly pipelines prior to manual curation to maximise the
completeness of microchromosome assemblies.
Methods
Meta-analysis of bird karyotype and genome assembly chromosome counts
Genomes on a Tree (GoaT) (Challis et al., 2023) was used to retrieve bird chromosome counts
based on cytology and from chromosome-level assemblies hosted in INSDC databases
(Supplementary Table 3). Our query was made on the “taxon” index of the database, and we
excluded taxa with missing data, retaining 105 species for downstream analysis. For
chromosome counts based on genome assemblies, a single summary value was used as the
representative chromosome count per species. For each assembly, the chromosome count
corresponds to the number of chromosomes identified in the primary assembly (as opposed
to the alternate assembly for a taxon). When multiple assemblies were available per taxon,
the summary corresponds to the primary haplotype of the NCBI RefSeq assembly. Haploid
cytology-based chromosome numbers were extracted by halving the diploid number from the
Bird Chromosome Database (Degrandi et al., 2020) and Animal Chromosome Counts Database
(Release 1.0.1) (Román-Palacios et al., 2021) during GoaT import. A single summary value per
species was calculated as the mode across all reported values per species. The ranges of values
within each dataset were manually checked to ensure the summary values for chromosome
number and haploid number from cytology were biologically consistent. We found that most
of the variation detected within cytological observations corresponded to ±1 chromosome
from the summary mode, consistent with reporting of different total numbers of chromosomes
in different sexes and/or small miscounts in older manuscripts (e.g. Makino, 1951). The
outliers were also manually checked against the original source, and all 7 detected cases
corresponded to problematic values in their respective databases; because these values were
not used as summaries, they were not included in our meta-analysis and did not create bias
in the data in Figure 1a. An interactive version of the scatterplot is available on the GoaT
website for raw data exploration and download (https://tinyurl.com/4jnc3pbb).
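The summary step described above (halve each reported diploid number, take the mode, and flag values more than one chromosome away from it) can be illustrated with a short sketch. This is not GoaT's import code. The function names are hypothetical, the reported values are invented, integer halving assumes even diploid numbers, and taking the smallest value when several modes tie is an assumption made here for determinism.

```python
from statistics import multimode


def haploid_mode(diploid_reports):
    """Summarise reported diploid (2n) counts as a single haploid mode."""
    haploid = [two_n // 2 for two_n in diploid_reports]
    # Assumption: if several values tie for the mode, take the smallest.
    return min(multimode(haploid))


def outliers(diploid_reports, tolerance=1):
    """Haploid values further than `tolerance` from the summary mode."""
    mode = haploid_mode(diploid_reports)
    return [two_n // 2 for two_n in diploid_reports
            if abs(two_n // 2 - mode) > tolerance]


# Invented 2n reports for one species from several cytological sources.
reports = [80, 80, 82, 78, 66]
summary = haploid_mode(reports)
flagged = outliers(reports)
print(summary, flagged)
```

In this toy case the reports of 2n = 82 and 2n = 78 fall within the ±1 margin, while 2n = 66 would be flagged for a manual check against its source.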
Dot chromosome homology assignment between GGswu, bTaeGut1 and bCucCan1
Pairwise whole genome alignments were carried out between chicken (GGswu), zebra finch
(bTaeGut1) and cuckoo (bCucCan1) (Supplementary Table 2) using nucmer v4.0.0rc1 (Marçais
et al., 2018) and visualised with Dot (https://dot.sandbox.bio/). Using these alignments, we
identified homologs to GGswu dot chromosomes previously classified by Huang et al. (2023).
Orthogroup clustering and identification of the MicroFinder protein set
To identify a set of conserved protein coding genes to use as dot chromosome markers, we
built orthogroups across representative bird genome assemblies. We selected 11 published
chromosome-scale bird genome assemblies that had NCBI RefSeq or Ensembl rapid release
gene-sets and combined them with a recent, near-T2T assembly of chicken (Huang et al.,
2023) (Supplementary Table 2). For each species, we selected the longest transcript per gene
to be the representative transcript and clustered protein sequences with OrthoFinder v2.5.4
(Emms & Kelly, 2015, 2019) in multiple sequence alignment mode (“-M msa”). The resulting
orthogroups were filtered with KinFin v1.1.1 (Laetsch & Blaxter, 2017) with the parameters
“--max 3 -x 0.5” to identify orthogroups present in at least 50 percent of species with a
maximum of three gene copies per species. To create the MicroFinder protein set, the KinFin
orthogroups were filtered to retain only those with a gene copy on chicken (GGswu), zebra
finch (bTaeGut1) or cuckoo (bCucCan1) dot chromosomes. Proteins from the filtered
orthogroups were then clustered with CD-HIT v4.8.1 (Fu et al., 2012) using default settings to
reduce redundancy.
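The first preprocessing step above, choosing the longest transcript per gene as its representative, is simple to express in code. The sketch below assumes protein records have already been read into (gene, transcript, sequence) tuples. The identifiers and the function name are hypothetical, and keeping the first-seen transcript on ties is an assumption rather than something stated in the text.

```python
def longest_isoforms(proteins):
    """proteins: iterable of (gene_id, transcript_id, protein_sequence).

    Returns {gene_id: (transcript_id, protein_sequence)} keeping the longest
    protein per gene; on ties the first transcript encountered is kept.
    """
    best = {}
    for gene, transcript, seq in proteins:
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (transcript, seq)
    return best


# Invented records for two genes.
proteins = [
    ("geneA", "tA.1", "MKT"),
    ("geneA", "tA.2", "MKTLLV"),
    ("geneB", "tB.1", "MAAG"),
    ("geneB", "tB.2", "MAAG"),
]
representatives = longest_isoforms(proteins)
print(representatives)
```

Reducing each gene to one protein before clustering avoids isoforms of the same gene inflating the per-species copy number that the later KinFin filter relies on.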
Phylogenetic analysis
To place the 12 bird reference genomes used to generate the MicroFinder protein set in
evolutionary context, we carried out phylogenetic analysis using protein sequence alignments
generated by OrthoFinder for 9,400 strictly conserved single-copy orthogroups. IQTree v2.3.4
was used to identify the optimal partitioning scheme, carry out model selection, estimate the
maximum likelihood phylogeny and carry out 1,000 ultrafast bootstrap replicates to assess
tree support (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017; Minh et al., 2020a,
2020b, 2021). The IQTree phylogeny was rooted on the branch leading to Galloanserae
(Galliformes plus Anseriformes) following Prum et al. (2015).
The MicroFinder pipeline
All steps of the MicroFinder pipeline are implemented in a bash script and the whole pipeline
is available as a Docker or Singularity container (https://github.com/sanger-tol/MicroFinder).
First, the MicroFinder protein set is aligned to the draft genome assembly with miniprot v0.14
(H. Li, 2023) with default settings. From the resulting alignments, we retain the top hit and
discard alignments with less than 70% identity. MicroFinder protein hits are counted for each
scaffold and the input assembly fasta file is sorted by the alignment count. Optionally, a
maximum scaffold length cutoff can be applied to the assembly sorting step. MicroFinder
outputs a fasta file of the draft assembly sorted by MicroFinder protein alignment counts, a
table of alignment counts per input scaffold and a GFF file of the miniprot alignments. It should
be noted that MicroFinder counts reflect the number of protein hits from the MicroFinder
protein set rather than counts of individual loci. We opted to map all proteins to maximise
sensitivity to detect candidate dot chromosome scaffolds across a wide range of bird species.
The MicroFinder-sorted assembly file should be prepared for manual curation in PretextView
(https://github.com/sanger-tol/PretextView) with the CurationPretext pipeline (Pointon,
2025) with the “--no-sort” parameter used to retain the order of the MicroFinder assembly
file in the Hi-C contact map.
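The sorting step can be pictured with the following sketch, which reorders scaffolds given a table of per-scaffold hit counts. It is an illustration rather than the MicroFinder bash implementation. In particular, it assumes that scaffolds with no hits or above the length cutoff keep their original relative order after the candidates, and that the cutoff is inclusive; neither detail is specified in the text. The function name and toy data are hypothetical.

```python
def reorder_scaffolds(scaffolds, counts, max_len=5_000_000):
    """scaffolds: list of (name, sequence) in original assembly order.
    counts: {scaffold_name: number of MicroFinder protein hits}.

    Candidate scaffolds (at least one hit, length <= max_len) are moved to
    the front, sorted by descending hit count; all others follow unchanged.
    """
    def is_candidate(name, seq):
        return counts.get(name, 0) > 0 and len(seq) <= max_len

    candidates = [s for s in scaffolds if is_candidate(*s)]
    rest = [s for s in scaffolds if not is_candidate(*s)]
    # list.sort is stable, so equal counts keep their original order.
    candidates.sort(key=lambda s: counts[s[0]], reverse=True)
    return candidates + rest


# Toy assembly with a 10 bp cutoff standing in for the 5 Mb default.
scaffolds = [("chr1", "A" * 20), ("s2", "A" * 5), ("s3", "A" * 8), ("s4", "A" * 3)]
counts = {"chr1": 2, "s2": 1, "s3": 4}
ordered = reorder_scaffolds(scaffolds, counts, max_len=10)
order = [name for name, _ in ordered]
print(order)
```

In the toy example chr1 has hits but exceeds the cutoff, so it is not promoted; this mirrors the rationale for the cutoff, which keeps macrochromosome scaffolds with a few paralogous or mis-mapped hits out of the candidate block.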
MicroFinder protein distribu6on in chicken (GGswu) and associated features
We investigated the distribution of MicroFinder proteins across the near-T2T GGswu chicken
assembly (Huang et al., 2023). MicroFinder protein coordinates were extracted from the
GGswu annotation GFF file. To place MicroFinder proteins in context we also estimated
genome-wide repeat content and gene expression levels. RNA-seq from female chicken
liver (SRR18788805) was aligned to the GGswu assembly with HISAT2 v2.2.1 (Kim et al., 2015)
and we calculated read depth in 10 Kb fixed windows using Sambamba v0.8.2 (Tarasov et al.,
2015). To estimate repeat density across the GGswu dot chromosomes, we ran RepeatMasker
v4.1.8 (Smit et al., 2005; Tarailo-Graovac & Chen, 2009) using a manually curated avian
repeat library (Peona et al., 2023; Peona, Palacios-Gimenez, et al., 2021; Pointon, 2025) and
calculated repeat density in 10 Kb fixed windows with bedtools coverage v2.31.1 (Quinlan &
Hall, 2010) using the RepeatMasker GFF file as input. To compare the distribution of
MicroFinder proteins to BUSCO genes we ran BUSCO v5.8.2 (Simão et al., 2015; Waterhouse
et al., 2018) with the Aves OrthoDB gene set (n = 8338) on the GGswu assembly and extracted
the coordinates of BUSCOs located on the dot chromosomes.
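The fixed-window density calculation can be sketched in a few lines of Python. This is a simplified stand-in for what bedtools coverage reports (the fraction of each window covered by features), not a reimplementation of it. The function name is hypothetical, and the intervals are assumed to have been converted to 0-based, half-open coordinates beforehand (GFF coordinates are 1-based and inclusive).

```python
def window_density(chrom_length, intervals, window=10_000):
    """Return the covered fraction of each fixed window along one sequence.

    intervals: iterable of (start, end) pairs, 0-based and half-open (assumed).
    """
    n_windows = (chrom_length + window - 1) // window
    covered = [0] * n_windows

    # Merge overlapping intervals first so that no base is counted twice.
    merged = []
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, chrom_length)
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Split each merged interval across the windows it spans.
    for start, end in merged:
        pos = start
        while pos < end:
            idx = pos // window
            stop = min(end, (idx + 1) * window)
            covered[idx] += stop - pos
            pos = stop

    # The last window may be shorter than the nominal window size.
    densities = []
    for idx, bases in enumerate(covered):
        window_len = min(window, chrom_length - idx * window)
        densities.append(bases / window_len)
    return densities


example_density = window_density(
    25_000, [(0, 5_000), (4_000, 6_000), (9_000, 12_000), (24_000, 30_000)]
)
```

In this toy input the first two intervals overlap and the last one runs past the end of the sequence, so the sketch exercises both the merging step and the shortened final window.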
Reassembly of DToL bird genomes with MicroFinder-enabled curation
We selected 12 previously published DToL bird genome assemblies for re-curation with
MicroFinder (Supplementary Table 4). For each assembly, we ran MicroFinder with a 5 Mb
maximum scaffold length cutoff and generated a new Hi-C contact map for curation in
PretextView using the CurationPretext pipeline v1.0.1 (Pointon, 2025) with the “--no-sort”
parameter. CurationPretext was provided with the original Hi-C and PacBio long reads for each
assembly to create a Hi-C contact map with read coverage, gap, telomere and simple repeat
density tracks. Manual curation was carried out using PretextView v1.0.0
(https://github.com/sanger-tol/PretextView). Following manual curation of each assembly, an
AGP file was exported from PretextView and an updated assembly generated using
pretext-to-asm (https://github.com/sanger-tol/agp-tpf-utils).
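To make the final step concrete, the sketch below shows how an AGP file describes an updated assembly in terms of the original scaffolds. It is a generic, minimal reading of tab-separated AGP lines and is not the pretext-to-asm implementation. The function names are hypothetical, the component sequences are assumed to be held in memory as a dictionary, and AGP lines are assumed to appear in part order for each object.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T and N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def apply_agp(agp_lines, components):
    """Build object sequences from AGP lines and component sequences.

    agp_lines: iterable of tab-separated AGP lines.
    components: dict of component name -> sequence (assumed to fit in memory).
    """
    objects = {}
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        obj, comp_type = cols[0], cols[4]
        parts = objects.setdefault(obj, [])
        if comp_type in ("N", "U"):
            # Gap line: column 6 holds the gap length.
            parts.append("N" * int(cols[5]))
        else:
            # Sequence line: columns 6-8 give component id, begin and end
            # (1-based, inclusive); column 9 gives the orientation.
            start, end = int(cols[6]), int(cols[7])
            seq = components[cols[5]][start - 1:end]
            if cols[8] == "-":
                seq = reverse_complement(seq)
            parts.append(seq)
    return {name: "".join(parts) for name, parts in objects.items()}


example_components = {"ctg1": "AACCGGTT", "ctg2": "ATGC"}
example_agp = [
    "##agp-version\t2.1",
    "chr1\t1\t4\t1\tW\tctg1\t1\t4\t+",
    "chr1\t5\t7\t2\tN\t3\tscaffold\tyes\tproximity_ligation",
    "chr1\t8\t10\t3\tW\tctg2\t2\t4\t-",
]
example_assembly = apply_agp(example_agp, example_components)
```

Here the object chr1 is built from the first four bases of ctg1, a 3 bp gap and the reverse complement of bases 2 to 4 of ctg2, which mirrors the kind of joins, breaks and inversions recorded during curation.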
Data and Code availability
Supplementary data containing OrthoFinder results, the MicroFinder gene set and the 12
re-curated bird genome assemblies is available from Zenodo
(https://doi.org/10.5281/zenodo.15364993). For each of the re-curated genome assemblies,
we have provided a MicroFinder-ordered Hi-C contact map of the original assembly,
PretextView savestate and AGP files to show changes made to the original assembly and an
updated FASTA file of the assembly. The MicroFinder source code and containerised versions
of the pipeline are available on GitHub (https://github.com/sanger-tol/MicroFinder). Mathers
et al. (2024) provides a practical guide for using MicroFinder-ordered assemblies for curation
with example datasets (https://doi.org/10.5281/zenodo.13913870).
Acknowledgments
We thank Prof. Alex Suh and Dr Valentina Peona for providing access to their curated avian
repeat library. We thank Dr Kerstin Howe and Dr Kamil Joran for comments on an earlier
version of the manuscript. This work was supported by Wellcome through core funding to the
Wellcome Sanger Institute (220540) and the Darwin Tree of Life Discretionary Award
(218328).
References
Barros, C. P., Derks, M. F. L., Mohr, J., Wood, B. J., Crooijmans, R. P. M. A., Megens, H. J.,
Bink, M. C. A. M., & Groenen, M. A. M. (2023). A new haplotype-resolved turkey
genome to enable turkey genetics and genomics research. GigaScience, 12.
https://doi.org/10.1093/gigascience/giad051
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a
Tree (GoaT): A versatile, scalable search engine for genomic and sequencing
project metadata across the eukaryotic tree of life. Wellcome Open Research, 8.
https://doi.org/10.12688/wellcomeopenres.18658.1
Chernomor, O., Von Haeseler, A., & Minh, B. Q. (2016). Terrace Aware Data Structure for
Phylogenomic Inference from Supermatrices. Systematic Biology, 65(6), 997–1008.
https://doi.org/10.1093/sysbio/syw037
Damas, J., O’Connor, R., Farré, M., Lenis, V. P. E., Martell, H. J., Mandawala, A., Fowler,
K., Joseph, S., Swain, M. T., Griffin, D. K., & Larkin, D. M. (2017). Upgrading
short-read animal genome assemblies to chromosome level using comparative
genomics and a universal probe set. Genome Research, 27(5), 875–884.
https://doi.org/10.1101/gr.213660.116
Degrandi, T. M., Barcellos, S. A., Costa, A. L., Garnero, A. D. V., Hass, I., & Gunski, R. J.
(2020). Introducing the Bird Chromosome Database: An Overview of Cytogenetic
Studies in Birds. Cytogenetic and Genome Research, 160(4), 199–205.
https://doi.org/10.1159/000507768
Emms, D. M., & Kelly, S. (2015). OrthoFinder: solving fundamental biases in whole
genome comparisons dramatically improves orthogroup inference accuracy.
Genome Biology, 16(1), 157. https://doi.org/10.1186/s13059-015-0721-2
Emms, D. M., & Kelly, S. (2019). OrthoFinder: Phylogenetic orthology inference for
comparative genomics. Genome Biology, 20(1), 1–14.
https://doi.org/10.1186/s13059-019-1832-y
Feron, R., & Waterhouse, R. M. (2022). Assessing species coverage and assembly
quality of rapidly accumulating sequenced genomes. GigaScience, 11.
https://doi.org/10.1093/gigascience/giac006
Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: Accelerated for clustering the
next-generation sequencing data. Bioinformatics, 28(23), 3150–3152.
https://doi.org/10.1093/bioinformatics/bts565
Habermann, F. A., Cremer, M., Walter, J., Kreth, G., Von Hase, J., Bauer, K., Wienberg, J.,
Cremer, C., Cremer, T., & Solovei, I. (2001). Arrangements of macro- and
microchromosomes in chicken cells. Chromosome Research, 9, 569–584.
Howe, K., Chow, W., Collins, J., Pelan, S., Pointon, D. L., Sims, Y., Torrance, J., Tracey, A.,
& Wood, J. (2021). Significantly improving the quality of genome assemblies
through curation. GigaScience, 10(1), 1–9.
https://doi.org/10.1093/gigascience/giaa153
Hu, J., Song, L., Ning, M., Niu, X., Han, M., Gao, C., Feng, X., Cai, H., Li, T., Li, F., Li, H.,
Gong, D., Song, W., Liu, L., Pu, J., Liu, J., Smith, J., Sun, H., & Huang, Y. (2024). A
new chromosome-scale duck genome shows a major histocompatibility complex
with several expanded multigene families. BMC Biology, 22(1).
https://doi.org/10.1186/s12915-024-01817-0
Huang, Z., Xu, Z., Bai, H., Huang, Y., Kang, N., Ding, X., Liu, J., Luo, H., Yang, C., Chen,
W., Guo, Q., Xue, L., Zhang, X., Xu, L., Chen, M., Fu, H., Chen, Y., Yue, Z., Liu, T. F. S.,
… Xu, L. (2023). Evolutionary analysis of a complete chicken genome. Proceedings
of the National Academy of Sciences of the United States of America, 120(8).
https://doi.org/10.1073/pnas.2216641120
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., Von Haeseler, A., & Jermiin, L. S. (2017).
ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature
Methods, 14(6), 587–589. https://doi.org/10.1038/nmeth.4285
Kim, D., Langmead, B., & Salzberg, S. L. (2015). HISAT: A fast spliced aligner with low
memory requirements. Nature Methods, 12(4), 357–360.
https://doi.org/10.1038/nmeth.3317
Laetsch, D. R., & Blaxter, M. L. (2017). KinFin: Software for taxon-aware analysis of
clustered protein sequences. G3: Genes, Genomes, Genetics, 7(10), 3349–3357.
https://doi.org/10.1534/g3.117.300233
Larivière, D., Abueg, L., Brajuka, N., Gallardo-Alba, C., Grüning, B., Ko, B. J., Ostrovsky,
A., Palmada-Flores, M., Pickett, B. D., Rabbani, K., Antunes, A., Balacco, J. R.,
Chaisson, M. J. P., Cheng, H., Collins, J., Couture, M., Denisova, A., Fedrigo, O.,
Gallo, G. R., … Formenti, G. (2024). Scalable, accessible and reproducible
reference genome assembly and evaluation in Galaxy. Nature Biotechnology,
42(3), 367–370. https://doi.org/10.1038/s41587-023-02100-3
Lawniczak, M. K. N., Durbin, R., Flicek, P., Lindblad-Toh, K., Wei, X., Archibald, J. M.,
Baker, W. J., Belov, K., Blaxter, M. L., Marques Bonet, T., Childers, A. K., Coddington,
J. A., Crandall, K. A., Crawford, A. J., Davey, R. P., Palma, F. Di, Fang, Q., Haerty, W.,
Hall, N., … Richards, S. (2022). Standards recommendations for the Earth
BioGenome Project. PNAS, 119(4), e2115639118.
https://doi.org/10.1073/pnas.2115639118
Lewin, H. A., Robinson, G. E., Kress, W. J., Baker, W. J., Coddington, J., Crandall, K. A.,
Durbin, R., Edwards, S. V., Forest, E., Thomas, M., Gilbert, P., Goldstein, M. M.,
Grigoriev, I. V., Hackett, K. J., Haussler, D., Jarvis, E. D., Johnson, W. E., Patrinos, A.,
Richards, S., … Zhang, G. (2001). Earth BioGenome Project: Sequencing life for the
future of life. Royal Botanic Gardens, 115(17), 4325–4333.
https://doi.org/10.1073/pnas.1720115115/-/DCSupplemental
Li, H. (2023). Protein-to-genome alignment with miniprot. Bioinformatics, 39(1).
https://doi.org/10.1093/bioinformatics/btad014
Li, H., & Durbin, R. (2024). Genome assembly in the telomere-to-telomere era. Nature
Reviews Genetics, 25(9), 658–670.
https://doi.org/10.1038/s41576-024-00718-w
Li, M., Sun, C., Xu, N., Bian, P., Tian, X., Wang, X., Wang, Y., Jia, X., Heller, R., Wang, M.,
Wang, F., Dai, X., Luo, R., Guo, Y., Wang, X., Yang, P., Hu, D., Liu, Z., Fu, W., … Yang,
N. (2022). De Novo Assembly of 20 Chicken Genomes Reveals the Undetectable
Phenomenon for Thousands of Core Genes on Microchromosomes and
Subtelomeric Regions. Molecular Biology and Evolution, 39(4).
https://doi.org/10.1093/molbev/msac066
Liu, J., Wang, Z., Li, J., Xu, L., Liu, J., Feng, S., Guo, C., Chen, S., Ren, Z., Rao, J., Wei, K.,
Chen, Y., Jarvis, E. D., Zhang, G., & Zhou, Q. (2021). A new emu genome illuminates
the evolution of genome configuration and nuclear architecture of avian
chromosomes. Genome Research, 31(3), 497–511.
https://doi.org/10.1101/GR.271569.120
Lopez Colom, R., & O’Brien, M. (2024). The genome sequence of the pink-footed goose,
Anser brachyrhynchus Baillon, 1834. Wellcome Open Research, 9, 613.
https://doi.org/10.12688/wellcomeopenres.23194.1
Luo, H., Jiang, X., Li, B., Wu, J., Shen, J., Xu, Z., Zhou, X., Hou, M., Huang, Z., Ou, X., & Xu,
L. (2023). A high-quality genome assembly highlights the evolutionary history of the
great bustard (Otis tarda, Otidiformes). Communications Biology, 6(1).
https://doi.org/10.1038/s42003-023-05137-x
Makino, S. (1951). An atlas of the chromosome numbers in animals (2nd ed., 1st
American ed.). Ames: The Iowa State College Press.
Marçais, G., Delcher, A. L., Phillippy, A. M., Coston, R., Salzberg, S. L., & Zimin, A.
(2018). MUMmer4: A fast and versatile genome alignment system. PLoS
Computational Biology, 14(1), 1–14. https://doi.org/10.1371/journal.pcbi.1005944
Mathers, T. C., Paulini, M., Collins, J., Absolon, D., Pelan, S., & Wood, J. (2024). Manual
curation of bird microchromosomes with HiC and gene mapping. Zenodo.
https://doi.org/10.5281/zenodo.13913870
McQueen, H. A., Siriaco, G., & Bird, A. P. (1998). Chicken Microchromosomes Are
Hyperacetylated, Early Replicating, and Gene Rich. Genome Research, 8(6),
621–630. https://doi.org/10.1101/gr.8.6.621
Minh, B. Q., Dang, C. C., Vinh, L. S., & Lanfear, R. (2021). QMaker: Fast and Accurate
Method to Estimate Empirical Models of Protein Evolution. Systematic Biology,
70(5), 1046–1060. https://doi.org/10.1093/sysbio/syab010
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von
Haeseler, A., Lanfear, R., & Teeling, E. (2020a). IQ-TREE 2: New models and efficient
methods for phylogenetic inference in the genomic era. Molecular Biology and
Evolution, 37(5), 1530–1534. https://doi.org/10.1093/molbev/msaa015
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von
Haeseler, A., Lanfear, R., & Teeling, E. (2020b). IQ-TREE 2: New Models and Efficient
Methods for Phylogenetic Inference in the Genomic Era. Molecular Biology and
Evolution, 37(5), 1530–1534. https://doi.org/10.1093/molbev/msaa015
Newcomer, E. H. (1955). Accessory chromosomes in the domestic fowl. Genetics,
40(5).
Newcomer, E. H. (1957). The mitotic chromosomes of the domestic fowl. Journal of
Heredity, 48, 227–234.
O’Brien, M. F., & Lopez Colom, R. (2024). The genome sequence of the northern pintail,
Anas acuta Linnaeus, 1758. Wellcome Open Research, 9, 446.
https://doi.org/10.12688/wellcomeopenres.22770.1
O’Connor, R. E., Kiazim, L., Skinner, B., Fonseka, G., Joseph, S., Jennings, R., Larkin, D.
M., & Griffin, D. K. (2019). Patterns of microchromosome organization remain highly
conserved throughout avian evolution. Chromosoma, 128(1), 21–29.
https://doi.org/10.1007/s00412-018-0685-6
Peona, V., Blom, M. P. K., Xu, L., Burri, R., Sullivan, S., Bunikis, I., Liachko, I., Haryoko, T.,
Jønsson, K. A., Zhou, Q., Irestedt, M., & Suh, A. (2021). Identifying the causes and
consequences of assembly gaps using a multiplatform genome assembly of a bird-
of-paradise. Molecular Ecology Resources, 21(1), 263–286.
https://doi.org/10.1111/1755-0998.13252
Peona, V., Palacios-Gimenez, O. M., Blommaert, J., Liu, J., Haryoko, T., Jønsson, K. A.,
Irestedt, M., Zhou, Q., Jern, P., & Suh, A. (2021). The avian W chromosome is a
refugium for endogenous retroviruses with likely effects on female-biased
mutational load and genetic incompatibilities. Philosophical Transactions of the
Royal Society B: Biological Sciences, 376(1833).
https://doi.org/10.1098/rstb.2020.0186
Peona, V., Palacios-Gimenez, O. M., Lutgen, D., Olsen, R. A., Kakhki, N. A.,
Andriopoulos, P., Bontzorlos, V., Schweizer, M., Suh, A., & Burri, R. (2023). An
annotated chromosome-scale reference genome for Eastern black-eared wheatear
(Oenanthe melanoleuca). G3: Genes, Genomes, Genetics, 13(6).
https://doi.org/10.1093/g3journal/jkad088
Perry, B. W., Schield, D. R., Adams, R. H., & Castoe, T. A. (2021). Microchromosomes
Exhibit Distinct Features of Vertebrate Chromosome Structure and Function with
Underappreciated Ramifications for Genome Evolution. Molecular Biology and
Evolution, 38(3), 904–910. https://doi.org/10.1093/molbev/msaa253
Pointon, D.-L. B. (2025). sanger-tol/curationpretext. Zenodo.
https://doi.org/10.5281/zenodo.14621949
Prum, R. O., Berv, J. S., Dornburg, A., Field, D. J., Townsend, J. P., Lemmon, E. M., &
Lemmon, A. R. (2015). A comprehensive phylogeny of birds (Aves) using targeted
next-generation DNA sequencing. Nature, 526(7574), 569–573.
https://doi.org/10.1038/nature15697
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing
genomic features. Bioinformatics, 26(6), 841–842.
https://doi.org/10.1093/bioinformatics/btq033
Rhie, A., McCarthy, S. A., Fedrigo, O., Damas, J., Formenti, G., Koren, S., Uliano-Silva,
M., Chow, W., Fungtammasan, A., Kim, J., Lee, C., Ko, B. J., Chaisson, M., Gedman,
G. L., Cantin, L. J., Thibaud-Nissen, F ., Haggerty, L., Bista, I., Smith, M., … Jarvis, E.
D. (2021). Towards complete and error-free genome assemblies of all vertebrate
species. Nature, 592(7856), 737–746. https://doi.org/10.1038/s41586-021-03451-0
Román-Palacios, C., Medina, C. A., Zhan, S. H., & Barker, M. S. (2021). Animal
chromosome counts reveal a similar range of chromosome numbers but with less
polyploidy in animals compared to flowering plants. Journal of Evolutionary
Biology, 34(8), 1333–1339. https://doi.org/10.1111/jeb.13884
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015).
BUSCO: Assessing genome assembly and annotation completeness with single-
copy orthologs. Bioinformatics, 31(19), 3210–3212.
https://doi.org/10.1093/bioinformatics/btv351
Smit, A. F. A., Hubley, R., & Green, P. (2005). RepeatMasker Open-4.0.
Smith, J., Bruley, C. K., Paton, I. R., Dunn, I., Jones, C. T., Windsor, D., Morrice, D. R.,
Law, A. S., Masabanda, J., Sazanov, A., Waddington, D., Fries, R., & Burt, D. W.
(2000). Differences in gene density on chicken macrochromosomes and
microchromosomes. Animal Genetics, 31(2), 96–103.
https://doi.org/10.1046/j.1365-2052.2000.00565.x
Tarailo-Graovac, M., & Chen, N. (2009). Using RepeatMasker to identify repetitive
elements in genomic sequences. Current Protocols in Bioinformatics, SUPPL. 25,
1–14. https://doi.org/10.1002/0471250953.bi0410s25
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J., & Prins, P. (2015). Sambamba: Fast
processing of NGS alignment formats. Bioinformatics, 31(12), 2032–2034.
https://doi.org/10.1093/bioinformatics/btv098
Tegelström, H., & Ryttman, H. (1981). Chromosomes in birds (Aves): evolutionary
implications of macro- and microchromosome numbers and lengths. Hereditas,
94(2), 225–233. https://doi.org/10.1111/j.1601-5223.1981.tb01757.x
The Darwin Tree of Life Project Consortium. (2021). Sequence locally, think globally: The
Darwin Tree of Life Project. Proceedings of the National Academy of Sciences,
119(4), e2115642118. https://doi.org/10.1073/pnas.2115642118/-/DCSupplemental
Uno, Y., Nishida, C., Hata, A., Ishishita, S., & Matsuda, Y. (2019). Molecular cytogenetic
characterization of repetitive sequences comprising centromeric heterochromatin
in three Anseriformes species. PLoS ONE, 14(3).
https://doi.org/10.1371/journal.pone.0214028
van Brink, J. M. (1959). L’expression morphologique de la digamétie chez les
sauropsidés et les monotrèmes. Chromosoma, 10(1), 1–72.
https://doi.org/10.1007/BF00396564
Waterhouse, R. M., Seppey, M., Simao, F. A., Manni, M., Ioannidis, P., Klioutchnikov, G.,
Kriventseva, E. V., & Zdobnov, E. M. (2018). BUSCO applications from quality
assessments to gene prediction and phylogenomics. Molecular Biology and
Evolution, 35(3), 543–548. https://doi.org/10.1093/molbev/msx319
Waters, P. D., Patel, H. R., Ruiz-Herrera, A., Álvarez-González, L., Lister, N. C.,
Simakov, O., Ezaz, T., Kaur, P., Frere, C., Grützner, F., Georges, A., & Marshall Graves, J. A.
(2021). Microchromosomes are building blocks of bird, reptile, and mammal
chromosomes. Proceedings of the National Academy of Sciences, 118(45),
e2112494118. https://doi.org/10.1073/pnas.2112494118
Wójcik, E., & Smalec, E. (2007). Description of the Anser anser Goose Karyotype. Folia
Biologica (Kraków), 55, 1–2.