Introduction
63
Accurately delineating species, particularly within groups undergoing rapid radiation, remains 64
challenging (De-Kayne et al., 2025). Recent and rapid divergence often leaves a complex 65
signature of shared ancestral polymorphism and morphological convergence, which can 66
confound taxonomic assessments, especially when based on limited data (Seehausen 2004; 67
Nachman & Payseur 2012; Combrink et al., 2025). Distinguishing nascent species from deeply 68
structured populations under such circumstances requires new analytical approaches beyond 69
traditional methods. The application of genome-wide markers now provides unprecedented 70
resolution enabling robust tests of lineage independence, and offering powerful ways of 71
unearthing cryptic biodiversity (Edwards & Knowles 2014; Singhal et al. 2018; Streicher et al., 72
2024). 73
Cave-dwelling organisms, shaped by isolation and intense selection, provide valuable models 74
for studying evolutionary divergence in extreme environments (Niemiller & Soares, 2014). The 75
cyprinid genus Sinocyclocheilus, confined to the Karst systems of Southwest China, represents 76
the most species-rich cavefish radiation known. With over 82 described species so far, it shows 77
remarkable morphological disparity, including varying degrees of eye degeneration, 78
pigmentation loss, horns on their forehead and humps on their back, and other stygomorphic 79
features (Zhao & Zhang, 2009; Mao et al., 2021; Xiao et al., 2025; Chen et al., 2022). However, 80
this phenotypic variation, long used for species delineation, is poorly reflected in traditional 81
genetic markers. Many putative Sinocyclocheilus species exhibit low interspecific divergence in 82
mitochondrial DNA, a discordance suggesting that past studies have likely underestimated the 83
true species richness in this group (Mao et al. 2021, 2022). 84
The diversification of this genus is closely tied to the complex geological and climatic history of 85
Southwest China (Mao et al 2025). The uplift of the Qinghai-Xizang (Tibetan) and Yunnan-86
Guizhou Plateau, together with cycles of aridification since the late Miocene, is thought to have 87
fragmented subterranean habitats, promoting speciation and resulting in point endemism (Xiao 88
et al. 2005; Mao et al., 2022). River systems have acted as both barriers and conduits in this 89
vast karstic landscape: deep valleys often isolate basins, while underground waterways can 90
facilitate dispersal among them, generating complex biogeographic boundaries (Wen et al., 91
2022). Within this geographically intricate and hydrologically complicated background, the 92
radiation of Sinocyclocheilus has taken shape. 93
Previous taxonomic assessments on Sinocyclocheilus have relied primarily on mitochondrial 94
markers and morphology (Xiao et al., 2005; Zhao & Zhang, 2009; Wen et al., 2022). These 95
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
approaches have recovered relatively few genetically distinct lineages, contrasting with the 96
striking morphological variation within the genus. Mitochondrial DNA, limited by its maternal and 97
single-locus inheritance, often fails to capture the full complexity of divergence, particularly in 98
systems shaped by recent radiation and incomplete lineage sorting (Hurst & Jiggins, 2005). In 99
contrast, genome-wide approaches such as RAD-seq provide far greater resolution. When 100
analysed within a multispecies coalescent framework, these data allow robust tests of species 101
boundaries that incorporate gene tree discordance and ancestral polymorphism (Yang & 102
Rannala, 2010). Yet, despite the putative high species diversity in Sinocyclocheilus, no 103
comprehensive genome-wide species delimitation of the genus has yet been undertaken. As a 104
result, its true genetically-backed species richness, including potentially extensive cryptic 105
diversity, remains poorly characterised. 106
Here, we generated an extensive dataset, including mitochondrial and nuclear Sanger 107
sequences and genome-wide RAD-seq data, building upon previous work. We aim to: (1) 108
establish a phylogenomic framework to clarify evolutionary relationships among major clades; 109
(2) assess species boundaries applying multiple molecular species delimitation approaches 110
across datasets, identifying cryptic lineages and evaluating method consistency; (3) investigate 111
population structure and gene flow to understand processes operating at the population-species 112
interface; and (4) reconstruct the historical biogeography of the genus to elucidate the drivers of 113
its remarkable radiation. 114
115
2. MATERIALS AND METHODS 116
We generated Sanger sequences and genome-wide RAD-seq data to investigate 117
species boundaries and evolutionary relationships. These datasets were used to reconstruct 118
phylogenies through multiple inference methods, followed by species delimitation analyses 119
applied separately to each data type. Using the SNPs from the RAD-seq dataset to explore 120
population-level processes, assess genetic clustering, and infer historical gene flow. In addition, 121
we mapped effective migration surfaces to evaluate connectivity across the landscape. Finally, 122
we estimated divergence times and inferred the biogeographic history using a time-calibrated 123
phylogeny to establish the temporal and geographic context for the radiation. 124
125
2.1. Taxon Sampling and Molecular Data Generation 126
Sample Collection and Preservation 127
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
All experimental procedures involving live animals complied with the Chinese Animal 128
Welfare Law (GB/T 35892-2018). Sampling of Sinocyclocheilus species was conducted 129
between 2017 and 2021 across their known distribution in the karstic regions of Guangxi, 130
Guizhou, and Yunnan provinces, China (Supplementary Fig. S1). Live fish were transported to 131
the laboratory in oxygenated, temperature-controlled containers. Tissue samples (fin clips) were 132
primarily collected non-lethally after anaesthetising individuals with MS-222; the specimens 133
were allowed to recover before being returned to holding tanks. Tissue samples were 134
immediately preserved and stored at -80°C. Our sampling aimed to include representatives from 135
a broad range of known species and distinct morphological forms within the genus. 136
137
Sanger Sequencing Data 138
Genomic DNA was extracted from tissue samples using the DNeasy Blood and Tissue Kit 139
(Qiagen Inc.) following the manufacturer’s protocols. Fragments of three genes, mitochondrial 140
cytochrome b (cytb) and NADH dehydrogenase subunit 4 (ND4), and the nuclear recombination 141
activating gene 1 (Rag-1), were amplified via PCR. Primers used were: DonThr R / DonGlu F for 142
cytb (Sudasinghe et al., 2018), ND4F / ND4R for ND4 (Li et al., 2008), and RAG1F1 / RAG1R1 143
for Rag-1 (Sudasinghe et al., 2020). PCR reactions (25 µL) contained 3 mM MgCl2, 0.4 mM 144
dNTPs, 1X buffer, 0.06 U Taq DNA Polymerase, and 2 µM of each primer. Thermal cycling 145
conditions included an initial denaturation at 94°C (3 min), 35 cycles of 94°C (45s), 46–50°C (1 146
min for cytb/ND4) or 48–56°C (1 min for Rag-1), and 72°C (45s), followed by a final extension at 147
72°C (5 min). PCR products were visualized on 1.5% agarose gels, purified, and sequenced in 148
both directions using BigDye™ chemistry (Applied Biosystems Inc.) on an ABI automated 149
sequencer. Newly generated sequences were deposited in GenBank (accession numbers 150
provided in Supplementary Table S1). To broaden taxonomic coverage, we also incorporated 151
149 cytb, 126 ND4, and 15 Rag-1 sequences from 70 Sinocyclocheilus species and two 152
outgroup taxa (Puntius semifasciolatus and Cyprinus carpio) downloaded from GenBank 153
(Supplementary Table S1). 154
155
RAD-seq Data 156
A total of 324 individuals were sequenced using Restriction site-Associated DNA 157
sequencing (RAD-seq), including 204 newly collected samples (representing most individuals 158
also used for Sanger sequencing) and 120 samples previously processed by Mao et al. (2022), 159
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
and 143 samples from Mao et al. (2025) (Supplementary Table S2). Two outgroup species 160
(Puntius semifasciolatus and Cyprinus carpio) were also included. High-quality genomic DNA 161
was extracted as described above. RAD-seq libraries were prepared following Baird et al. 162
(2008) and related protocols. Genomic DNA was digested with the EcoRI restriction enzyme 163
(Takara, China), adapters were ligated, and samples were pooled and sheared. Fragments 164
ranging from 200–600 bp were size-selected from agarose gels and PCR amplified. Paired-end 165
sequencing (100 bp) was performed on an Illumina HiSeq 2500 platform by Novogene 166
Bioinformatics Institute (Tianjin, China). 167
168
2.2. Sequence Processing, Alignment, and Dataset Assembly 169
Sanger Data Processing 170
Forward and reverse Sanger sequences were reconciled using MAFFT v7 171
(https://mafft.cbrc.jp/alignment/software/mailform.html). Sequences for each gene were aligned 172
using MAFFT v7.3.13 (Katoh & Standley, 2013) and MACSE v2.03 (Ranwez et al., 2018) as 173
implemented in PhyloSuite v1.2.2 (Zhang et al., 2020). Alignments were manually inspected, 174
and ambiguously aligned regions or poorly sequenced ends were trimmed using Gblocks 0.91b 175
(Talavera & Castresana, 2007) and trimAl v1.2 (Capella-Gutierrez et al., 2009). The final 176
concatenated dataset for phylogenetic analyses comprised 377 haplotypes from 70 177
Sinocyclocheilus species, totaling 3423 bp, partitioned by gene as follows: cytb: 1–1077; ND4: 178
1078–2109; Rag-1: 2110–3423. 179
180
RAD-seq Data Processing and SNP Calling: 181
Raw RAD-seq reads were processed using ipyrad v0.9.56 (Eaton & Overcast, 2020). 182
Reads were demultiplexed by individual, and low-quality reads (phred score 5 bases), 183
those containing adapter contamination, or excessive poly-N tracts were discarded. A minimum 184
sequencing depth of six reads was required for base calling. Within-individual loci were 185
clustered using a 90% similarity threshold. To minimize paralogous loci, highly repetitive RAD 186
tags (with stack depth exceeding a dynamically determined threshold) and their derivatives were 187
excluded. Additionally, loci with low coverage (≤5 reads) were also removed. Several filtered 188
SNP datasets were generated from the assembled RAD loci for different downstream analyses 189
(Supplementary Table S3): i) For concatenated phylogenomic analyses (IQ-TREE, MrBayes), 190
we used loci present in at least 90% of all 324 individuals (including outgroups); ii) For 191
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
SVDquartets (tetrad), one random SNP per locus present in all individuals was selected to 192
minimize linkage; iii) For BPP species delimitation analyses, outgroups were excluded. Because 193
of high computational demands, two datasets were assembled: initial six clade-specific datasets 194
(Clades A–F, see results), each retaining loci shared across all individuals within each clade, 195
and a combined dataset of all 318 ingroup individuals, including loci present in ≥90% of all 196
individuals. This two-step approach streamlined the final analysis by constraining parameter 197
space within clades based on initial results. iv) For population structure (STRUCTURE, DAPC), 198
admixture (TreeMix) and Flexible Estimation of Effective Migration Surfaces (FEEMS) analyses, 199
outgroups were excluded, and one SNP per locus present in at least 90% of the 318 ingroup 200
individuals were used; v) For SNAPP divergence time estimation, loci shared among all 56 201
species (represented by selected individuals) were used, from which unlinked SNPs were 202
selected. 203
204
2.3. Phylogenetic Inference 205
Sanger Data Phylogenetic Analyses 206
Optimal substitution models and partitioning schemes for the concatenated Sanger 207
datasets (cytb+ND4+Rag-1; and cytb+ND4) were determined using ModelFinder 208
(Kalyaanamoorthy et al., 2017) in PhyloSuite, based on the Bayesian Information Criterion 209
(BIC), applying a codon-partitioned model. Maximum Likelihood (ML) phylogenies were inferred 210
using IQ-TREE v1.6.8 (Nguyen et al., 2015) with 1000 ultrafast bootstrap replicates. Bayesian 211
Inference (BI) was performed using MrBayes v3.1.2 (Ronquist et al., 2012), running two 212
independent analyses for 200 million generations, sampling every 20,000 generations, 213
discarding the first 10% as burn-in. Convergence was assessed using Tracer v1.7.1 (Rambaut 214
et al., 2018). Individual gene trees were also estimated. 215
216
RAD-seq Phylogenomic Analyses 217
Several phylogenomic approaches, including concatenated and coalescent, were 218
applied to the RAD-seq SNP datasets: i) A ML tree was inferred from the concatenated RAD-219
seq loci (Dataset i) using IQ-TREE v2.2.6 (Minh et al., 2020). Support was assessed with 1000 220
ultrafast bootstrap replicates. ii) Bayesian inference was conducted on the same concatenated 221
RAD-seq loci (Dataset i, excluding invariant sites) using MrBayes v3.1.2. Two independent runs 222
of 48 MCMC chains were performed for 20 million generations, sampling every 1000 223
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
generations, with convergence diagnostics assessed as above. ModelFinder identified best-fit 224
nucleotide substitution models for both analyses (ML and BI). iii) To account for incomplete 225
lineage sorting (ILS) using a summary method, a species tree was inferred with SVDquartets 226
(Chifman & Kubatko, 2015) implemented in tetrad v.0.7.19 within ipyrad API, based on the 227
unlinked SNP dataset (Dataset ii). All possible quartets were evaluated, with support assessed 228
from 1000 non-parametric bootstrap replicates. iv) ASTRAL-III v5.7.3 (Zhang et al., 2018) was 229
used to infer a species tree from individual gene trees generated for each RAD locus (treated as 230
separate genes and inferred using IQ-TREE). ASTRAL accounts for ILS and provides local 231
posterior probabilities for branch support. 232
233
2.4. Species Delimitation Analyses 234
To confidently assess species boundaries and identify potential cryptic lineages, multiple 235
species delimitation methods were applied to the Sanger and RAD-seq datasets to assess 236
species boundaries and identify potential cryptic lineages. 237
238
Sanger Data-Based Species Delimitation 239
A suite of methods was employed to evaluate lineage boundaries based on traditional 240
mitochondrial and nuclear markers: i) Automatic Barcode Gap Discovery (ABGD) was applied to 241
cytb, ND4, and concatenated datasets via the ABGD web server (Puillandre et al., 2012), with 242
prior intraspecific divergence (P) from 0.001 to 0.1, 10 steps, minimum relative gap width (X) of 243
1, and the Kimura (K80) model. ii) Bayesian Poisson Tree Processes (bPTP) was performed on 244
the bPTP web server (Zhang et al., 2013) using ML trees (cytb, ND4, concatenated) as input. 245
Analyses ran for 5 million MCMC generations with thinning of 100, 0.1 burn-in, and outgroups 246
removed.iii) Multi-rate Poisson Tree Processes (mPTP) was applied to the same ML input trees 247
on the mPTP web server (Kapli et al., 2017), with 50 million MCMC generations, sampling at 248
every 1 million, and a burn-in of 1 million.iv) Bayesian General Mixed Yule-Coalescent (bGMYC) 249
was implemented in R (Reid & Carstens, 2012) using 1000 post-burn-in ultrametric trees from 250
the BEAST analysis of the Sanger data (section 2.6). Delimitation was assessed across several 251
conspecificity probability thresholds (0.95–0.99). 252
253
RAD-seq Data-Based Species Delimitation (BPP) 254
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Species delimitation using SNP data was performed with Bayesian Phylogenetics and 255
Phylogeography (BPP) v4.3 (Yang & Rannala, 2010, 2014), which implements the multispecies 256
coalescent model to assess species boundaries considering ILS.Analyses were conducted on 257
Dataset iii. Due to computational constraints, initial runs focused on delimiting species within the 258
six major clades identified through phylogenomic analyses, followed by a comprehensive 259
analysis of all 318 Sinocyclocheilus individuals based on those initial results. A guide tree 260
derived from the SVDquartets analysis was used. Priors for population size (θ) and root 261
divergence time (τ0) were set based on Minimalist BPP (https://brannala.github.io/bpps/#/), 262
using IG (3, 0.0025) for θ and IG (3, 0.20) for τ0, following Yang (2015). Both rjMCMC algorithm 263
0 (ϵ=2) and algorithm 1 (α=2, m=1) were tested across four independent runs of 20,000 MCMC 264
steps each (sampling every 2 steps, with 2000 steps burn-in). Populations were designated as 265
cryptic species if delimited through the RAD-seq-based BPP analysis and showed subtle 266
morphological differences from their confirmed sister species (to be further examined in formal 267
species descriptions). 268
269
2.5. Population Genetic Structure and Historical Gene Flow Analyses (using RAD-seq 270
data) 271
To investigate intraspecific genetic structure, interspecific admixture, and historical gene 272
flow, we used RAD-seq SNP data (Dataset iv) in three complementary approaches. First, we 273
used STRUCTURE v2.3.4 (Pritchard et al., 2000) with an admixture model with correlated allele 274
frequencies, testing K values from 2 to 57 (10 replicates per K), with 20,000 generations 275
following a 10,000 generation burn-in. Results were summarised with STRUCTURE 276
HARVESTER (Earl, 2012), CLUMPP (Jakobsson & Rosenberg, 2007) and distruct v1.1 277
(Rosenberg, 2004). Second, a Discriminant Analysis of Principal Components (DAPC) was 278
performed using the R package adegenet v2.1.5 (Jombart, 2008), with major clades identified 279
from phylogenomic analyses used as priori groupings. Third, we used TreeMix v1.1.3 (Pickrell & 280
Pritchard, 2012) to infer population splits and patterns of historical gene flow based on allele 281
frequency data, as implemented in ipyrad API. We tested the number of migration edges (m) 282
from 1 to 25, and selected the best-fitting model based on likelihood scores. 283
284
2.6. Divergence Time Estimation 285
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Divergence times based on Sanger data (cytb+ND4+Rag-1; and cytb+ND4) were estimated 286
using BEAST v2.7.7 (Suchard et al., 2018). A partitioned codon model was used with 287
substitution models selected by ModelFinder. Secondary calibrations from Li et al. (2021) 288
included divergence between: S. maitianheensis and S. anophthalmus (mean = 2.7 Mya, SD = 289
1.05, 95% HPD = 1.4 to 4.8 Mya), between S. maitianheensis + S. anophthalmus and S. 290
grahami (mean = 4.3 Mya, SD = 1.30, 95% HPD = 2.5 to 6.9 Mya), among S. maitianheensis + 291
S. anophthalmus + S. grahami and S. anshuiensis (mean = 7.8 Mya, SD = 1.95, 95% HPD = 5.0 292
to 11.7 Mya), and among S. maitianheensis + S. anophthalmus + S. grahami + S. anshuiensis 293
and S. rhinocerous (mean = 8.9 Mya, SD = 2.1, 95% HPD = 5.8 to 13.1 Mya). A Yule speciation 294
process with constant population size and an uncorrelated lognormal molecular clock was 295
assumed. Analyses were run for 200 million generations (sampling every 20,000), with 296
convergence checked in Tracer v1.7.1. 297
To estimate divergence times from genomic data while accounting for ILS, SNAPP 298
v1.4.1 (Bryant et al., 2012) implemented in BEAST v2.6.6 was used. Analyses were conducted 299
on Dataset v (774 unlinked SNPs across 56 individuals representing major lineages). The input 300
XML file was prepared using snapp_prep.rb (Stange et al., 2018). The calibration points 301
described above were applied as priors on the corresponding node. SNAPP analyses ran for 3 302
million MCMC generations (sampling every 300 steps), with the first 10% discarded as burn-in. 303
Convergence was assessed in Tracer v1.7.1, and the MCC tree was generated using 304
TreeAnnotator v2.7.3. 305
306
2.7. Biogeography 307
Biogeographic Reconstructions 308
To investigate the geographic patterns of speciation and dispersal among 309
Sinocyclocheilus species, we conducted biogeographic reconstructions in BioGeoBEARS 310
(Matzke, 2014) implemented within the RASP (Reconstruct Ancestral State in Phylogenies) 311
framework (Yu et al., 2020). The Karst habitats occupied by Sinocyclocheilus cavefish are 312
characterised by significant isolation, which inherently limits dispersal between major 313
hydrological basins (Ma et al., 2023). Additional factors such as the size of core distribution 314
areas, regional geological activity, and the paleo-drainage history of river networks, also known 315
to influence species distributions (Lei, 2023), guided the definition of nine discrete operational 316
biogeographic areas, corresponding to major river systems currently occupied by 317
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Sinocyclocheilus: Guijiang/Hejiang (GH), Liujiang (LJ), Hongshuihe (HS), Nanpanjiang (NP), 318
Beipanjiang (BP), Red River (RR), Jinsha River (JS), Wujiang (WJ), and Zuojiang (ZJ). Given 319
the narrow distribution ranges and high endemism of most species, suggesting limited dispersal 320
across major basins, we constrained the ancestral and extant range sizes to a maximum of two 321
areas. 322
Three statistical models were explored: Dispersal-Extinction-Cladogenesis (DEC) (Ree 323
& Smith, 2008), Dispersal-Vicariance Analysis-like (DIVALIKE) (Ronquist, 1997), and BayArea-324
like (BAYAREALIKE) (Landis et al., 2013). Each model was tested with and without the founder-325
event speciation parameter (+J) (Matzke, 2013, 2014). The "+J" parameter explicitly accounts 326
for founder-event or "jump dispersal" speciation, a mechanism plausible in fragmented 327
environments like Karst systems, where small, isolated populations may rapidly diverge 328
following colonization of new cave habitats. Incorporating this parameter is therefore critical for 329
accurately reconstructing the biogeographic history of Sinocyclocheilus. The time-calibrated 330
phylogenetic tree produced in SNAPP was used as the input topology for these analyses in 331
RASP. All the six models were compared to identify the most suitable model for 332
Sinocyclocheilus and to evaluate whether incorporating the +J parameter significantly improved 333
model fit relative to the corresponding null models (DEC, DIVALIKE, BAYAREALIKE). Model 334
selection was based on the corrected Akaike Information Criterion (AICc) (Anderson & 335
Burnham, 2004). 336
337
Estimation of Genetic Connectivity and Barriers to Gene Flow 338
To identify connectivity corridors and potential barriers to gene flow at a fine spatial 339
scale, we applied the Fast and Flexible Estimation of Effective Migration Surfaces (FEEMS) 340
References
682
Anderson, D., & Burnham, K. (2004). Model selection and multi -model inference. Second. NY: Springer-683
Verlag, 63(2020), 10. 684
Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., ... & Johnson, E. A. (2008). 685
Rapid SNP discovery and genetic mapping using sequenced RAD markers. PloS one, 3(10), e3376. 686
Barnes, R., Sahr, K., Evenden, G., Johnson, A., & Warmerdam, F. (2017). dggridR: Discrete Global Grids 687
for R. R package version 0.1. 12. 688
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring species 689
trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular 690
biology and evolution, 29(8), 1917-1932. 691
Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: a tool for automated alignment 692
trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972-1973. 693
Carstens, B. C., Pelletier, T. A., Reid, N. M., & Satler, J. D. (2013). How to fail at species 694
delimitation. Molecular ecology, 22(17), 4369-4383. 695
Chambers, E. A., Lara-Tufiño, J. D., Campillo -García, G., Cisneros-Bernal, A. Y., Dudek Jr, D. J., León -696
Règagnon, V., ... & Hillis, D. M. (2025). Distinguishing species boundaries from geographic 697
variation. Proceedings of the National Academy of Sciences, 122(19), e2423688122. 698
Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation 699
PLINK: Rising to the challenge of larger and richer datasets. Gigascience, 4(1), s13742-015-0047–0048. 700
Chen, B., Mao, T. R., Liu, Y. W., Dai, W. Z., Li, X. L., Rajput, A. P., ... & Meegaskumbura, M. (2022). Sensory 701
evolution in a cavefish radiation: patterns of neuromast distribution and associated behaviour in 702
Sinocyclocheilus (Cypriniformes: Cyprinidae). Proceedings of the Royal Society B, 289(1984), 20221641. 703
Chifman, J., & Kubatko, L. (2015). Identifiability of the unrooted species tree topology under the coalescent 704
model with time-reversible substitution processes, site-specific rate variation, and invariable sites. Journal 705
of theoretical biology, 374, 35-47. 706
Combrink, L. L., Golcher -Benavides, J., Lewanski, A. L., Rick, J. A., Rosenthal, W. C., & Wagner, C. E. 707
(2025). Population Genomics of Adaptive Radiation. Molecular Ecology, 34(2), e17574. 708
De-Kayne, R., Schley, R., Barth, J. M., Campillo, L. C., Chaparro -Pedraza, C., Joshi, J., ... & Cerca, J. 709
(2025). Why do some lineages radiate while others do not? Perspectives for future research on adaptive 710
radiations. Cold Spring Harbor perspectives in biology, 17(2), a041448. 711
Earl, D. A., & VonHoldt, B. M. (2012). STRUCTURE HARVESTER: a website and program for visualizing 712
STRUCTURE output and implementing the Evanno method. Conservation genetics resources, 4, 359-361. 713
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Eaton, D. A., & Overcast, I. (2020). ipyrad: Interactive assembly and analysis of RADseq 714
datasets. Bioinformatics, 36(8), 2592-2594. 715
Edwards, D. L., & Knowles, L. L. (2014). Species detection and individual assignment in species 716
delimitation: can integrative data increase efficacy?. Proceedings of the Royal Society B: Biological 717
Sciences, 281(1777), 20132765. 718
Edwards, S. V., Potter, S., Schmitt, C. J., Bragg, J. G., & Moritz, C. (2016). Reticulation, divergence, and 719
the phylogeography–phylogenetics continuum. Proceedings of the National Academy of Sciences, 113(29), 720
8025-8032. 721
Emerson, B. C. (2025). Delimiting Species —Prospects and Challenges for DNA Barcoding. Molecular 722
Ecology, e17677. 723
Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the 724
software STRUCTURE: a simulation study. Molecular ecology, 14(8), 2611-2620. 725
Hurst, G. D., & Jiggins, F. M. (2005). Problems with mitochondrial DNA as a marker in population, 726
phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proceedings of the Royal 727
Society B: Biological Sciences, 272(1572), 1525-1534. 728
Jakobsson, M., & Rosenberg, N. A. (2007). CLUMPP: a cluster matching and permutation program for 729
dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23(14), 730
1801-1806. 731
Jombart, T., & Collins, C. (2017). A tutorial for Discriminant Analysis of Principal Components (DAPC) using 732
adegenet 2.1. 0. Imperial College, London, United Kingdom. 733
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., Von Haeseler, A., & Jermiin, L. S. (2017). ModelFinder: fast 734
model selection for accurate phylogenetic estimates. Nature methods, 14(6), 587-589. 735
Kapli, P., Lutteropp, S., Zhang, J., Kobert, K., Pavlidis, P., Stamatakis, A., & Flouri, T. (2017). Multi -rate 736
Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain 737
Monte Carlo. Bioinformatics, 33(11), 1630-1638. 738
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements 739
in performance and usability. Molecular biology and evolution, 30(4), 772-780. 740
Landis, M. J., Matzke, N. J., Moore, B. R., & Huelsenbeck, J. P. (2013). Bayesian analysis of biogeography 741
when the number of areas is large. Systematic Biology, 62(6), 789–804. 742
Leaché, A. D., & Fujita, M. K. (2010). Bayesian species delimitation in West African forest geckos 743
(Hemidactylus fasciatus). Proceedings of the Royal Society B: Biological Sciences, 277(1697), 3071-3077. 744
Lei, F. K. (2023). Characteristics and Conservation of the Chinese Sinocyclocheilus cavefish’s aquatic 745
habitat (Doctoral dissertation, Tsinghua University). 746
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Li, R. H., Wang, X. A, Bian, C., Gao, Z. J., Zhang, Y. W., Jiang, W. S., Wang, M., You, X. X., Cheng, L., Pan, 747
X. F., Yang, J. X., Shi, Q. (2021). Whole-Genome Sequencing of Sinocyclocheilus maitianheensis reveals 748
phylogenetic evolution and immunological variances in various Sinocyclocheilus fishes. Frontiers in 749
Genetics, 12, 736500. 750
Li, Z. Q., Guo, B. C., Li, J. B., He, S. P., & Chen, Y. Y. (2008). Bayesian mixed models and divergence time 751
estimation of Chinese cavefishes (Cyprinidae: Sinocyclocheilus). Chinese Science Bulletin, 53(15), 2342-752
2352. 753
Luo, X. Y., Chen, B., Mao, T. R., Liu, Y. W., Yang, J., and Meegaskumbura, M. (2023). Skin transcriptomic 754
correlates of cave-dwelling Sinocyclocheilus cavefish. Frontiers in Ecology and Evolution, 11, 1264214. 755
Ma, L., Yang, J. X., Lei, F. K., Xu, M. Z., Zhao, Y. H., & Jeffery, W. R. (2023). Protection and exploration of 756
the scientific potential of Chinese cavefish. Zoological Research, 44(4), 675–677. 757
Ma, L., Zhao, Y. H., & Yang, J. X. (2019). Chapter 28—Cavefish of China. In W. B. White, D. C. Culver, & 758
T. Pipan (Eds.), Encyclopedia of Caves (Third Edition) (pp. 237–254). Academic Press. 759
Mao, T. R., Liu, Y. W., Meegaskumbura, M., Yang, J., Ellepola, G., Senevirathne, G., ... & Pie, M. R. (2021). 760
Evolution in Sinocyclocheilus cavefish is marked by rate shifts, reversals, and origin of novel traits. BMC 761
Ecology and Evolution, 21, 1-14. 762
Mao, T. R., Liu, Y. W., Vasconcellos, M. M., Pie, M. R., Ellepola, G., Fu, C. H., ... & Meegaskumbura, M. 763
(2022). Evolving in the darkness: Phylogenomics of Sinocyclocheilus cavefishes highlights recent 764
diversification and cryptic diversity. Molecular Phylogenetics and Evolution, 168, 107400. 765
Mao, T. R, Liu, Y. W., Vasconcellos, M. M., Zhou, S. P., Ellepola, G., Yang, J., Pie. M. R and 766
Meegaskumbura, M. (2024). Caves as species pumps: key innovations, isolation, and periodic 767
introgression drive the world’s largest cavefish radiation in a dynamic karstic landscape. bioRxiv, 2024-09. 768
Marcus, J., Ha, W., Barber, R. F., & Novembre, J. (2021). Fast and flexible estimation of effective migration 769
surfaces. eLife, 10, e61927. 770
Matzke, N. J. (2013). Probabilistic historical biogeography: New models for founder -event speciation, 771
imperfect detection, and fossils allow improved accuracy and model -testing. University of California, 772
Berkeley. 773
Matzke, N. J. (2014). Model Selection in Historical Biogeography Reveals that Founder -Event Speciation 774
Is a Crucial Process in Island Clades. Systematic Biology, 63(6), 951–970. 775
Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler, A., & Lanfear, 776
R. (2020). IQ -TREE 2: new models and efficient methods for phylogenetic inference in the genomic 777
era. Molecular biology and evolution, 37(5), 1530-1534. 778
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Nachman, M. W., & Payseur, B. A. (2012). Recombination rate variation and speciation: theoretical 779
predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: 780
Biological Sciences, 367(1587), 409-421. 781
Nguyen, L. T., Schmidt, H. A., Von Haeseler, A., & Minh, B. Q. (2015). IQ -TREE: a fast and effective 782
stochastic algorithm for estimating maximum -likelihood phylogenies. Molecular biology and 783
evolution, 32(1), 268-274. 784
Niemiller, M. L., & Soares, D. (2014). Cave environments. In Extremophile fishes: ecology, evolution, and 785
physiology of teleosts in extreme environments (pp. 161-191). Cham: Springer International Publishing. 786
Niemiller, M. L., & Zigler, K. S. (2013). Patterns of cave biodiversity and endemism in the Appalachians and 787
Interior Plateau of Tennessee, USA. PLoS One, 8(5), e64177. 788
Papadopoulou, A., Bergsten, J., Fujisawa, T., Monaghan, M. T., Barraclough, T. G., & Vogler, A. P. (2008). 789
Speciation and DNA barcodes: testing the effects of dispersal on the formation of discrete sequence 790
clusters. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1506), 2987-2996. 791
Pickrell, J., & Pritchard, J. (2012). Inference of population splits and mixtures from genome -wide allele 792
frequency data. Nature Precedings, 1-1. 793
Porter, M. L., & Crandall, K. A. (2003). Lost along the way: the significance of evolution in reverse. Trends 794
in Ecology & Evolution, 18(10), 541-547. 795
Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus 796
genotype data. Genetics, 155(2), 945-959. 797
Puillandre, N., Lambert, A., Brouillet, S., & Achaz, G. J. M. E. (2012). ABGD, Automatic Barcode Gap 798
Discovery for primary species delimitation. Molecular ecology, 21(8), 1864-1877. 799
Rambaut, A., Drummond, A. J., Xie, D., Baele, G., & Suchard, M. A. (2018). Posterior summarization in 800
Bayesian phylogenetics using Tracer 1.7. Systematic biology, 67(5), 901-904. 801
Ranwez, V., Douzery, E. J., Cambon, C., Chantret, N., & Delsuc, F. (2018). MACSE v2: toolkit for the 802
alignment of coding sequences accounting for frameshifts and stop codons. Molecular biology and 803
evolution, 35(10), 2582-2584. 804
Ree, R. H., & Smith, S. A. (2008). Maximum likelihood inference of geographic range evolution by dispersal, 805
local extinction, and cladogenesis. Systematic Biology, 57(1), 4–14. 806
Reid, N. M., & Carstens, B. C. (2012). Phylogenetic estimation error can decrease the accuracy of species 807
delimitation: a Bayesian implementation of the general mixed Yule -coalescent model. BMC evolutionary 808
biology, 12, 1-11. 809
Ronquist, F. (1997). Dispersal -vicariance analysis: A new approach to the quantification of historical 810
biogeography. Systematic Biology, 46(1), 195–203. 811
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu L., Suchard 812
M. A., Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice 813
across a large model space. Systematic biology, 61(3), 539-542. 814
Rosenberg, N. A. (2004). DISTRUCT: a program for the graphical display of population structure. Molecular 815
ecology notes, 4(1), 137-138. 816
Seehausen, O. (2004). Hybridization and adaptive radiation. Trends in ecology & evolution, 19(4), 198-207. 817
Singhal, S., Hoskin, C. J., Couper, P., Potter, S., & Moritz, C. (2018). A framework for resolving cryptic 818
species: a case study from the lizards of the Australian wet tropics. Systematic Biology, 67(6), 1061-1075. 819
Sites, J. W., & Marshall, J. C. (2003). Delimiting species: a Renaissance issue in systematic biology. Trends 820
in Ecology & Evolution, 18(9), 462-470. 821
Sobel, J. M., Chen, G. F., Watt, L. R., & Schemske, D. W. (2010). The biology of speciation. Evolution, 64(2), 822
295-315. 823
Stange, M., Sánchez-Villagra, M. R., Salzburger, W., & Matschiner, M. (2018). Bayesian divergence-time 824
estimation with genome -wide single -nucleotide polymorphism data of sea catfishes (Ariidae) supports 825
Miocene closure of the Panamanian Isthmus. Systematic Biology, 67(4), 681-699. 826
Streicher, J. W., Lambert, S. M., Méndez de la Cruz, F. R., Martínez-Méndez, N., García-Vázquez, U. O., 827
Nieto Montes de Oca, A., & Wiens, J. J. (2024). What Predicts Gene Flow During Speciation? The Relative 828
Roles of Time, Space, Morphology and Climate. Molecular Ecology, 33(23), e17580. 829
Suchard, M. A., Lemey, P., Baele, G., Ayres, D. L., Drummond, A. J., & Rambaut, A. (2018). Bayesian 830
phylogenetic and phylodynamic data integration using BEAST 1.10. Virus evolution, 4(1), vey016. 831
Sudasinghe, H., Pethiyagoda, R., & Meegaskumbura, M. (2020). A molecular phylogeny of the genus 832
Laubuka (Teleostei: Cyprinidae) in Sri Lanka reveals multiple origins and a cryptic species. Systematics 833
and Biodiversity, 18(6), 592-613. 834
Sudasinghe, H., Ranasinghe, R. T., Goonatilake, S. D. A., & Meegaskumbura, M. (2018). A review of the 835
genus Labeo (Teleostei: Cyprinidae) in Sri Lanka. Zootaxa, 4486(3), 201-235. 836
Sukumaran, J., & Knowles, L. L. (2017). Multispecies coalescent delimits structure, not 837
species. Proceedings of the National Academy of Sciences, 114(7), 1607-1612. 838
Sun, H., Li, Z., Landis, J. B., Qian, L., Zhang, T., & Deng, T. (2022). Effects of drainage reorganization on 839
phytogeographic pattern in Sino-Himalaya. Alpine Botany, 1–11. 840
Talavera, G., & Castresana, J. (2007). Improvement of phylogenies after removing divergent and 841
ambiguously aligned blocks from protein sequence alignments. Systematic biology, 56(4), 564-577. 842
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Wan, Y. G., Shen, Z. K., Zeng, Y. H., & Sheng, S. Z. (2007). Evolution of cumulative Coulomb failure stress 843
in northeastern Qinghai -Xizang (Tibetan) Plateau and its effect on large earthquake occurrence. Acta 844
Seismologica Sinica, 20(2), 117–132. 845
Wen, H. M., Luo, T., Wang, Y. L., Wang, S. W., Liu, T., Xiao, N., & Zhou, J. (2022). Molecular phylogeny and 846
historical biogeography of the cave fish genus Sinocyclocheilus (Cypriniformes: Cyprinidae) in southwest 847
China. Integrative Zoology, 17(2), 311-325. 848
Xiao, H., Chen, S. Y., Liu, Z. M., Zhang, R. D., Li, W. X., Zan, R. G., & Zhang, Y. P. (2005). Molecular 849
phylogeny of Sinocyclocheilus (Cypriniformes: Cyprinidae) inferred from mitochondrial DNA 850
sequences. Molecular phylogenetics and evolution, 36(1), 67-77. 851
Xiao, M. Y., Wang, J. J., Luo, T., Zhou, J. J., Xiao, N., & Zhou, J. (2025). Sinocyclocheilus xingrenensis 852
(Cypriniformes, Cyprinidae), a new underground fish from Guizhou Province, Southeastern 853
China. Zoosystematics and Evolution, 101(2), 419-436. 854
Yang, Z. H. (2015). The BPP program for species tree estimation and species delimitation. Current 855
Zoology, 61(5), 854-865. 856
Yang, Z. H., & Rannala, B. (2010). Bayesian species delimitation using multilocus sequence 857
data. Proceedings of the National Academy of Sciences, 107(20), 9264-9269. 858
Yang, Z. H., & Rannala, B. (2014). Unguided species delimitation using DNA sequence data from multiple 859
loci. Molecular Biology and Evolution, 31(12), 3125-3135. 860
Yu, Y., Blair, C., & He, X. J. (2020). RASP 4: Ancestral State Reconstruction Tool for Multiple Genes and 861
Characters. Molecular Biology and Evolution, 37(2), 604–606. 862
Zhang, C., Rabiee, M., Sayyari, E., & Mirarab, S. (2018). ASTRAL -III: polynomial time species tree 863
reconstruction from partially resolved gene trees. BMC bioinformatics, 19, 15-30. 864
Zhang, C. , & Zhao, Y. . (2016). Species Diversity and Distribution of Inland Fishes in China. Science Press, 865
Beijing.Zhang, D., Gao, F., Jakovlić, I., Zou, H., Zhang, J., Li, W. X., & Wang, G. T. (2020). PhyloSuite: An 866
integrated and scalable desktop platform for streamlined molecular sequence data management and 867
evolutionary phylogenetics studies. Molecular ecology resources, 20(1), 348-355. 868
Zhang, J. J., Kapli, P., Pavlidis, P., & Stamatakis, A. (2013). A general species delimitation method with 869
applications to phylogenetic placements. Bioinformatics, 29(22), 2869-2876. 870
Zhang, L. S. (2012). Ancient Geography of China–the Formation of China’s Natural Environment. Beijing: 871
Science Press. 872
Zhang, Z., Daly, J. S., Tian, Y., Lei, C., Sun, X., Badenszki, E., ... & Hu, J. (2023). Late Oligocene formation 873
of the Pearl River triggered by the opening of the South China Sea. Geophysical Research Letters, 50(8), 874
e2023GL103049. 875
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
Zhao, Y. H., & Zhang, C. G. (2009). Endemic Fishes of Sinocyclocheilus (Cypriniformes, Cyprinidae) in 876
China—Species Diversity, Cave Adaptation, Systematics and Zoogeography. Science Press, Beijing. 877
Zhou, Q. Y., & Chen, P. Y. (1993). Lithofacies change and palaeogeographical evolution during Late 878
Cenozoic in Guizhou and its vicinity. Geology of Guizhou, 10(3), 201-207. 879
880
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
881
MAIN FIGURES 882
883
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
884
885
Figure 1. A-D) Molecular phylogenetic relationships of Sinocyclocheilus based on Bayesian 886
inference of the concatenated cytb +nd4+ rag1 dataset. Numbers at the nodes indicate 887
Bayesian posterior probabilities. Scale bar represents number of substitutions per site. Species 888
delimitation results from molecular methods (bPTP, mPTP, and bGMYC) applied to cytb, ND4, 889
Rag-1, and the concatenated dataset are shown as black rectangles to the right of the trees. 890
.CC-BY-ND 4.0 International licenseavailable under a
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint
891
Figure 2. Maximum-likelihood (IQ-TREE) phylogeny of Sinocyclocheilus based on 87,141 RAD 892
loci. Bootstrap support values are shown at each node. Numbers in parentheses after species 893
names denote individuals from distinct caves (i.e. separate populations). Species delimitation 894