Integrative Species Delimitation in a Speciation Continuum: Phylogenomics, Cryptic Diversity, Diversification and Historical Biogeography of Sinocyclocheilus Cavefish

preprint OA: closed CC-BY-ND-4.0
📄 Open PDF Full text JSON View at publisher
Full text 88,437 characters · extracted from oa-pdf · 11 sections · click to expand

Abstract

37 The transition from structured populations to distinct species often unfolds along a speciation 38 continuum. However, empirically dissecting this continuum poses challenges for species 39 delimitation, particularly in rapid radiations marked by recent divergence, incomplete lineage 40 sorting, and gene flow. The species-rich Sinocyclocheilus cavefish radiation of Southwest 41 China, an emerging evolutionary multi-species model system, shows a notable mismatch 42 between its high morphological diversity and the limited divergence observed in commonly used 43 mitochondrial DNA markers. This pattern suggests that true species richness may be 44 underestimated. Yet, comprehensive genome-wide approaches to resolve species boundaries in 45 this group are still lacking. To address this, we combine phylogenomics, coalescent-based 46 species delimitation, population genetics, and historical biogeography, using both genome-wide 47 RAD-seq and multi-locus Sanger data (including nuclear and mitochondrial DNA). Our 48 phylogenomic analyses resolve major clades and reveal substantial cryptic diversity, well 49 beyond what was detected by earlier markers. Species delimitation based on RAD-seq genomic 50 data identifies several cryptic evolutionary lineages. We recover a dynamic divergence history 51 shaped by isolation and episodic connectivity. Biogeographic reconstruction supports a mid-52 Miocene origin following a major vicariance event, with subsequent founder-event dispersals 53 into subterranean habitats. This long history of fragmentation is further complicated by 54 reticulation and ancestral polymorphism, and is reflected in present-day patterns of restricted 55 gene flow across river valleys. These results highlight the utility of integrative genomic 56 approaches in resolving species boundaries and uncovering the evolutionary processes that 57 underlie high diversity in large and complex radiations. 58 59

Keywords

Species Delimitation, Rapid Radiation, Incomplete Lineage Sorting, Biogeography, 60 Founder-Event Speciation, Gene Flow, Speciation Continuum, Sinocyclocheilus 61 62 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint

Introduction

63 Accurately delineating species, particularly within groups undergoing rapid radiation, remains 64 challenging (De-Kayne et al., 2025). Recent and rapid divergence often leaves a complex 65 signature of shared ancestral polymorphism and morphological convergence, which can 66 confound taxonomic assessments, especially when based on limited data (Seehausen 2004; 67 Nachman & Payseur 2012; Combrink et al., 2025). Distinguishing nascent species from deeply 68 structured populations under such circumstances requires new analytical approaches beyond 69 traditional methods. The application of genome-wide markers now provides unprecedented 70 resolution enabling robust tests of lineage independence, and offering powerful ways of 71 unearthing cryptic biodiversity (Edwards & Knowles 2014; Singhal et al. 2018; Streicher et al., 72 2024). 73 Cave-dwelling organisms, shaped by isolation and intense selection, provide valuable models 74 for studying evolutionary divergence in extreme environments (Niemiller & Soares, 2014). The 75 cyprinid genus Sinocyclocheilus, confined to the Karst systems of Southwest China, represents 76 the most species-rich cavefish radiation known. With over 82 described species so far, it shows 77 remarkable morphological disparity, including varying degrees of eye degeneration, 78 pigmentation loss, horns on their forehead and humps on their back, and other stygomorphic 79 features (Zhao & Zhang, 2009; Mao et al., 2021; Xiao et al., 2025; Chen et al., 2022). However, 80 this phenotypic variation, long used for species delineation, is poorly reflected in traditional 81 genetic markers. Many putative Sinocyclocheilus species exhibit low interspecific divergence in 82 mitochondrial DNA, a discordance suggesting that past studies have likely underestimated the 83 true species richness in this group (Mao et al. 2021, 2022). 84 The diversification of this genus is closely tied to the complex geological and climatic history of 85 Southwest China (Mao et al 2025). The uplift of the Qinghai-Xizang (Tibetan) and Yunnan-86 Guizhou Plateau, together with cycles of aridification since the late Miocene, is thought to have 87 fragmented subterranean habitats, promoting speciation and resulting in point endemism (Xiao 88 et al. 2005; Mao et al., 2022). River systems have acted as both barriers and conduits in this 89 vast karstic landscape: deep valleys often isolate basins, while underground waterways can 90 facilitate dispersal among them, generating complex biogeographic boundaries (Wen et al., 91 2022). Within this geographically intricate and hydrologically complicated background, the 92 radiation of Sinocyclocheilus has taken shape. 93 Previous taxonomic assessments on Sinocyclocheilus have relied primarily on mitochondrial 94 markers and morphology (Xiao et al., 2005; Zhao & Zhang, 2009; Wen et al., 2022). These 95 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint approaches have recovered relatively few genetically distinct lineages, contrasting with the 96 striking morphological variation within the genus. Mitochondrial DNA, limited by its maternal and 97 single-locus inheritance, often fails to capture the full complexity of divergence, particularly in 98 systems shaped by recent radiation and incomplete lineage sorting (Hurst & Jiggins, 2005). In 99 contrast, genome-wide approaches such as RAD-seq provide far greater resolution. When 100 analysed within a multispecies coalescent framework, these data allow robust tests of species 101 boundaries that incorporate gene tree discordance and ancestral polymorphism (Yang & 102 Rannala, 2010). Yet, despite the putative high species diversity in Sinocyclocheilus, no 103 comprehensive genome-wide species delimitation of the genus has yet been undertaken. As a 104 result, its true genetically-backed species richness, including potentially extensive cryptic 105 diversity, remains poorly characterised. 106 Here, we generated an extensive dataset, including mitochondrial and nuclear Sanger 107 sequences and genome-wide RAD-seq data, building upon previous work. We aim to: (1) 108 establish a phylogenomic framework to clarify evolutionary relationships among major clades; 109 (2) assess species boundaries applying multiple molecular species delimitation approaches 110 across datasets, identifying cryptic lineages and evaluating method consistency; (3) investigate 111 population structure and gene flow to understand processes operating at the population-species 112 interface; and (4) reconstruct the historical biogeography of the genus to elucidate the drivers of 113 its remarkable radiation. 114 115 2. MATERIALS AND METHODS 116 We generated Sanger sequences and genome-wide RAD-seq data to investigate 117 species boundaries and evolutionary relationships. These datasets were used to reconstruct 118 phylogenies through multiple inference methods, followed by species delimitation analyses 119 applied separately to each data type. Using the SNPs from the RAD-seq dataset to explore 120 population-level processes, assess genetic clustering, and infer historical gene flow. In addition, 121 we mapped effective migration surfaces to evaluate connectivity across the landscape. Finally, 122 we estimated divergence times and inferred the biogeographic history using a time-calibrated 123 phylogeny to establish the temporal and geographic context for the radiation. 124 125 2.1. Taxon Sampling and Molecular Data Generation 126 Sample Collection and Preservation 127 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint All experimental procedures involving live animals complied with the Chinese Animal 128 Welfare Law (GB/T 35892-2018). Sampling of Sinocyclocheilus species was conducted 129 between 2017 and 2021 across their known distribution in the karstic regions of Guangxi, 130 Guizhou, and Yunnan provinces, China (Supplementary Fig. S1). Live fish were transported to 131 the laboratory in oxygenated, temperature-controlled containers. Tissue samples (fin clips) were 132 primarily collected non-lethally after anaesthetising individuals with MS-222; the specimens 133 were allowed to recover before being returned to holding tanks. Tissue samples were 134 immediately preserved and stored at -80°C. Our sampling aimed to include representatives from 135 a broad range of known species and distinct morphological forms within the genus. 136 137 Sanger Sequencing Data 138 Genomic DNA was extracted from tissue samples using the DNeasy Blood and Tissue Kit 139 (Qiagen Inc.) following the manufacturer’s protocols. Fragments of three genes, mitochondrial 140 cytochrome b (cytb) and NADH dehydrogenase subunit 4 (ND4), and the nuclear recombination 141 activating gene 1 (Rag-1), were amplified via PCR. Primers used were: DonThr R / DonGlu F for 142 cytb (Sudasinghe et al., 2018), ND4F / ND4R for ND4 (Li et al., 2008), and RAG1F1 / RAG1R1 143 for Rag-1 (Sudasinghe et al., 2020). PCR reactions (25 µL) contained 3 mM MgCl2, 0.4 mM 144 dNTPs, 1X buffer, 0.06 U Taq DNA Polymerase, and 2 µM of each primer. Thermal cycling 145 conditions included an initial denaturation at 94°C (3 min), 35 cycles of 94°C (45s), 46–50°C (1 146 min for cytb/ND4) or 48–56°C (1 min for Rag-1), and 72°C (45s), followed by a final extension at 147 72°C (5 min). PCR products were visualized on 1.5% agarose gels, purified, and sequenced in 148 both directions using BigDye™ chemistry (Applied Biosystems Inc.) on an ABI automated 149 sequencer. Newly generated sequences were deposited in GenBank (accession numbers 150 provided in Supplementary Table S1). To broaden taxonomic coverage, we also incorporated 151 149 cytb, 126 ND4, and 15 Rag-1 sequences from 70 Sinocyclocheilus species and two 152 outgroup taxa (Puntius semifasciolatus and Cyprinus carpio) downloaded from GenBank 153 (Supplementary Table S1). 154 155 RAD-seq Data 156 A total of 324 individuals were sequenced using Restriction site-Associated DNA 157 sequencing (RAD-seq), including 204 newly collected samples (representing most individuals 158 also used for Sanger sequencing) and 120 samples previously processed by Mao et al. (2022), 159 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint and 143 samples from Mao et al. (2025) (Supplementary Table S2). Two outgroup species 160 (Puntius semifasciolatus and Cyprinus carpio) were also included. High-quality genomic DNA 161 was extracted as described above. RAD-seq libraries were prepared following Baird et al. 162 (2008) and related protocols. Genomic DNA was digested with the EcoRI restriction enzyme 163 (Takara, China), adapters were ligated, and samples were pooled and sheared. Fragments 164 ranging from 200–600 bp were size-selected from agarose gels and PCR amplified. Paired-end 165 sequencing (100 bp) was performed on an Illumina HiSeq 2500 platform by Novogene 166 Bioinformatics Institute (Tianjin, China). 167 168 2.2. Sequence Processing, Alignment, and Dataset Assembly 169 Sanger Data Processing 170 Forward and reverse Sanger sequences were reconciled using MAFFT v7 171 (https://mafft.cbrc.jp/alignment/software/mailform.html). Sequences for each gene were aligned 172 using MAFFT v7.3.13 (Katoh & Standley, 2013) and MACSE v2.03 (Ranwez et al., 2018) as 173 implemented in PhyloSuite v1.2.2 (Zhang et al., 2020). Alignments were manually inspected, 174 and ambiguously aligned regions or poorly sequenced ends were trimmed using Gblocks 0.91b 175 (Talavera & Castresana, 2007) and trimAl v1.2 (Capella-Gutierrez et al., 2009). The final 176 concatenated dataset for phylogenetic analyses comprised 377 haplotypes from 70 177 Sinocyclocheilus species, totaling 3423 bp, partitioned by gene as follows: cytb: 1–1077; ND4: 178 1078–2109; Rag-1: 2110–3423. 179 180 RAD-seq Data Processing and SNP Calling: 181 Raw RAD-seq reads were processed using ipyrad v0.9.56 (Eaton & Overcast, 2020). 182 Reads were demultiplexed by individual, and low-quality reads (phred score 5 bases), 183 those containing adapter contamination, or excessive poly-N tracts were discarded. A minimum 184 sequencing depth of six reads was required for base calling. Within-individual loci were 185 clustered using a 90% similarity threshold. To minimize paralogous loci, highly repetitive RAD 186 tags (with stack depth exceeding a dynamically determined threshold) and their derivatives were 187 excluded. Additionally, loci with low coverage (≤5 reads) were also removed. Several filtered 188 SNP datasets were generated from the assembled RAD loci for different downstream analyses 189 (Supplementary Table S3): i) For concatenated phylogenomic analyses (IQ-TREE, MrBayes), 190 we used loci present in at least 90% of all 324 individuals (including outgroups); ii) For 191 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint SVDquartets (tetrad), one random SNP per locus present in all individuals was selected to 192 minimize linkage; iii) For BPP species delimitation analyses, outgroups were excluded. Because 193 of high computational demands, two datasets were assembled: initial six clade-specific datasets 194 (Clades A–F, see results), each retaining loci shared across all individuals within each clade, 195 and a combined dataset of all 318 ingroup individuals, including loci present in ≥90% of all 196 individuals. This two-step approach streamlined the final analysis by constraining parameter 197 space within clades based on initial results. iv) For population structure (STRUCTURE, DAPC), 198 admixture (TreeMix) and Flexible Estimation of Effective Migration Surfaces (FEEMS) analyses, 199 outgroups were excluded, and one SNP per locus present in at least 90% of the 318 ingroup 200 individuals were used; v) For SNAPP divergence time estimation, loci shared among all 56 201 species (represented by selected individuals) were used, from which unlinked SNPs were 202 selected. 203 204 2.3. Phylogenetic Inference 205 Sanger Data Phylogenetic Analyses 206 Optimal substitution models and partitioning schemes for the concatenated Sanger 207 datasets (cytb+ND4+Rag-1; and cytb+ND4) were determined using ModelFinder 208 (Kalyaanamoorthy et al., 2017) in PhyloSuite, based on the Bayesian Information Criterion 209 (BIC), applying a codon-partitioned model. Maximum Likelihood (ML) phylogenies were inferred 210 using IQ-TREE v1.6.8 (Nguyen et al., 2015) with 1000 ultrafast bootstrap replicates. Bayesian 211 Inference (BI) was performed using MrBayes v3.1.2 (Ronquist et al., 2012), running two 212 independent analyses for 200 million generations, sampling every 20,000 generations, 213 discarding the first 10% as burn-in. Convergence was assessed using Tracer v1.7.1 (Rambaut 214 et al., 2018). Individual gene trees were also estimated. 215 216 RAD-seq Phylogenomic Analyses 217 Several phylogenomic approaches, including concatenated and coalescent, were 218 applied to the RAD-seq SNP datasets: i) A ML tree was inferred from the concatenated RAD-219 seq loci (Dataset i) using IQ-TREE v2.2.6 (Minh et al., 2020). Support was assessed with 1000 220 ultrafast bootstrap replicates. ii) Bayesian inference was conducted on the same concatenated 221 RAD-seq loci (Dataset i, excluding invariant sites) using MrBayes v3.1.2. Two independent runs 222 of 48 MCMC chains were performed for 20 million generations, sampling every 1000 223 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint generations, with convergence diagnostics assessed as above. ModelFinder identified best-fit 224 nucleotide substitution models for both analyses (ML and BI). iii) To account for incomplete 225 lineage sorting (ILS) using a summary method, a species tree was inferred with SVDquartets 226 (Chifman & Kubatko, 2015) implemented in tetrad v.0.7.19 within ipyrad API, based on the 227 unlinked SNP dataset (Dataset ii). All possible quartets were evaluated, with support assessed 228 from 1000 non-parametric bootstrap replicates. iv) ASTRAL-III v5.7.3 (Zhang et al., 2018) was 229 used to infer a species tree from individual gene trees generated for each RAD locus (treated as 230 separate genes and inferred using IQ-TREE). ASTRAL accounts for ILS and provides local 231 posterior probabilities for branch support. 232 233 2.4. Species Delimitation Analyses 234 To confidently assess species boundaries and identify potential cryptic lineages, multiple 235 species delimitation methods were applied to the Sanger and RAD-seq datasets to assess 236 species boundaries and identify potential cryptic lineages. 237 238 Sanger Data-Based Species Delimitation 239 A suite of methods was employed to evaluate lineage boundaries based on traditional 240 mitochondrial and nuclear markers: i) Automatic Barcode Gap Discovery (ABGD) was applied to 241 cytb, ND4, and concatenated datasets via the ABGD web server (Puillandre et al., 2012), with 242 prior intraspecific divergence (P) from 0.001 to 0.1, 10 steps, minimum relative gap width (X) of 243 1, and the Kimura (K80) model. ii) Bayesian Poisson Tree Processes (bPTP) was performed on 244 the bPTP web server (Zhang et al., 2013) using ML trees (cytb, ND4, concatenated) as input. 245 Analyses ran for 5 million MCMC generations with thinning of 100, 0.1 burn-in, and outgroups 246 removed.iii) Multi-rate Poisson Tree Processes (mPTP) was applied to the same ML input trees 247 on the mPTP web server (Kapli et al., 2017), with 50 million MCMC generations, sampling at 248 every 1 million, and a burn-in of 1 million.iv) Bayesian General Mixed Yule-Coalescent (bGMYC) 249 was implemented in R (Reid & Carstens, 2012) using 1000 post-burn-in ultrametric trees from 250 the BEAST analysis of the Sanger data (section 2.6). Delimitation was assessed across several 251 conspecificity probability thresholds (0.95–0.99). 252 253 RAD-seq Data-Based Species Delimitation (BPP) 254 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Species delimitation using SNP data was performed with Bayesian Phylogenetics and 255 Phylogeography (BPP) v4.3 (Yang & Rannala, 2010, 2014), which implements the multispecies 256 coalescent model to assess species boundaries considering ILS.Analyses were conducted on 257 Dataset iii. Due to computational constraints, initial runs focused on delimiting species within the 258 six major clades identified through phylogenomic analyses, followed by a comprehensive 259 analysis of all 318 Sinocyclocheilus individuals based on those initial results. A guide tree 260 derived from the SVDquartets analysis was used. Priors for population size (θ) and root 261 divergence time (τ0) were set based on Minimalist BPP (https://brannala.github.io/bpps/#/), 262 using IG (3, 0.0025) for θ and IG (3, 0.20) for τ0, following Yang (2015). Both rjMCMC algorithm 263 0 (ϵ=2) and algorithm 1 (α=2, m=1) were tested across four independent runs of 20,000 MCMC 264 steps each (sampling every 2 steps, with 2000 steps burn-in). Populations were designated as 265 cryptic species if delimited through the RAD-seq-based BPP analysis and showed subtle 266 morphological differences from their confirmed sister species (to be further examined in formal 267 species descriptions). 268 269 2.5. Population Genetic Structure and Historical Gene Flow Analyses (using RAD-seq 270 data) 271 To investigate intraspecific genetic structure, interspecific admixture, and historical gene 272 flow, we used RAD-seq SNP data (Dataset iv) in three complementary approaches. First, we 273 used STRUCTURE v2.3.4 (Pritchard et al., 2000) with an admixture model with correlated allele 274 frequencies, testing K values from 2 to 57 (10 replicates per K), with 20,000 generations 275 following a 10,000 generation burn-in. Results were summarised with STRUCTURE 276 HARVESTER (Earl, 2012), CLUMPP (Jakobsson & Rosenberg, 2007) and distruct v1.1 277 (Rosenberg, 2004). Second, a Discriminant Analysis of Principal Components (DAPC) was 278 performed using the R package adegenet v2.1.5 (Jombart, 2008), with major clades identified 279 from phylogenomic analyses used as priori groupings. Third, we used TreeMix v1.1.3 (Pickrell & 280 Pritchard, 2012) to infer population splits and patterns of historical gene flow based on allele 281 frequency data, as implemented in ipyrad API. We tested the number of migration edges (m) 282 from 1 to 25, and selected the best-fitting model based on likelihood scores. 283 284 2.6. Divergence Time Estimation 285 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Divergence times based on Sanger data (cytb+ND4+Rag-1; and cytb+ND4) were estimated 286 using BEAST v2.7.7 (Suchard et al., 2018). A partitioned codon model was used with 287 substitution models selected by ModelFinder. Secondary calibrations from Li et al. (2021) 288 included divergence between: S. maitianheensis and S. anophthalmus (mean = 2.7 Mya, SD = 289 1.05, 95% HPD = 1.4 to 4.8 Mya), between S. maitianheensis + S. anophthalmus and S. 290 grahami (mean = 4.3 Mya, SD = 1.30, 95% HPD = 2.5 to 6.9 Mya), among S. maitianheensis + 291 S. anophthalmus + S. grahami and S. anshuiensis (mean = 7.8 Mya, SD = 1.95, 95% HPD = 5.0 292 to 11.7 Mya), and among S. maitianheensis + S. anophthalmus + S. grahami + S. anshuiensis 293 and S. rhinocerous (mean = 8.9 Mya, SD = 2.1, 95% HPD = 5.8 to 13.1 Mya). A Yule speciation 294 process with constant population size and an uncorrelated lognormal molecular clock was 295 assumed. Analyses were run for 200 million generations (sampling every 20,000), with 296 convergence checked in Tracer v1.7.1. 297 To estimate divergence times from genomic data while accounting for ILS, SNAPP 298 v1.4.1 (Bryant et al., 2012) implemented in BEAST v2.6.6 was used. Analyses were conducted 299 on Dataset v (774 unlinked SNPs across 56 individuals representing major lineages). The input 300 XML file was prepared using snapp_prep.rb (Stange et al., 2018). The calibration points 301 described above were applied as priors on the corresponding node. SNAPP analyses ran for 3 302 million MCMC generations (sampling every 300 steps), with the first 10% discarded as burn-in. 303 Convergence was assessed in Tracer v1.7.1, and the MCC tree was generated using 304 TreeAnnotator v2.7.3. 305 306 2.7. Biogeography 307 Biogeographic Reconstructions 308 To investigate the geographic patterns of speciation and dispersal among 309 Sinocyclocheilus species, we conducted biogeographic reconstructions in BioGeoBEARS 310 (Matzke, 2014) implemented within the RASP (Reconstruct Ancestral State in Phylogenies) 311 framework (Yu et al., 2020). The Karst habitats occupied by Sinocyclocheilus cavefish are 312 characterised by significant isolation, which inherently limits dispersal between major 313 hydrological basins (Ma et al., 2023). Additional factors such as the size of core distribution 314 areas, regional geological activity, and the paleo-drainage history of river networks, also known 315 to influence species distributions (Lei, 2023), guided the definition of nine discrete operational 316 biogeographic areas, corresponding to major river systems currently occupied by 317 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Sinocyclocheilus: Guijiang/Hejiang (GH), Liujiang (LJ), Hongshuihe (HS), Nanpanjiang (NP), 318 Beipanjiang (BP), Red River (RR), Jinsha River (JS), Wujiang (WJ), and Zuojiang (ZJ). Given 319 the narrow distribution ranges and high endemism of most species, suggesting limited dispersal 320 across major basins, we constrained the ancestral and extant range sizes to a maximum of two 321 areas. 322 Three statistical models were explored: Dispersal-Extinction-Cladogenesis (DEC) (Ree 323 & Smith, 2008), Dispersal-Vicariance Analysis-like (DIVALIKE) (Ronquist, 1997), and BayArea-324 like (BAYAREALIKE) (Landis et al., 2013). Each model was tested with and without the founder-325 event speciation parameter (+J) (Matzke, 2013, 2014). The "+J" parameter explicitly accounts 326 for founder-event or "jump dispersal" speciation, a mechanism plausible in fragmented 327 environments like Karst systems, where small, isolated populations may rapidly diverge 328 following colonization of new cave habitats. Incorporating this parameter is therefore critical for 329 accurately reconstructing the biogeographic history of Sinocyclocheilus. The time-calibrated 330 phylogenetic tree produced in SNAPP was used as the input topology for these analyses in 331 RASP. All the six models were compared to identify the most suitable model for 332 Sinocyclocheilus and to evaluate whether incorporating the +J parameter significantly improved 333 model fit relative to the corresponding null models (DEC, DIVALIKE, BAYAREALIKE). Model 334 selection was based on the corrected Akaike Information Criterion (AICc) (Anderson & 335 Burnham, 2004). 336 337 Estimation of Genetic Connectivity and Barriers to Gene Flow 338 To identify connectivity corridors and potential barriers to gene flow at a fine spatial 339 scale, we applied the Fast and Flexible Estimation of Effective Migration Surfaces (FEEMS) 340

Method

(Marcus et al., 2021). The input dataset consisted of SNPs, filtered to exclude variants 341 with a minor allele frequency (MAF) below 5% or missing data exceeding 10%. Filtering was 342 performed using PLINK v.1.9.0 (Chang et al., 2015), resulting in a final dataset of 23,184 SNPs 343 across 318 individuals. Sampled populations were assigned to vertices of a customized 5 km 344 triangular discrete global grid (DGG), generated with the R package ‘dggridR’ (Barnes et al., 345 2017). Effective migration rates across this grid were estimated using a penalized likelihood 346 framework that incorporates a smoothing parameter (λ) to control model complexity. Because 347 the estimated migration surface is sensitive to the choice of λ, where low λ values may cause 348 overfitting and spurious inferences (Marcus et al., 2021), we employed the ‘leave-one-out’ 349 cross-validation procedure implemented in FEEMS to identify an optimal value. This procedure 350 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint tested 20 values evenly spaced across a range from 10−6 to 102. The model corresponding to 351 the λ value that minimized the L2 error was selected for visualization and interpretation of 352 effective migration surfaces. 353 354 3. RESULTS 355 Our integrative analyses, combining Sanger sequencing and RAD-seq phylogenomics, 356 for a comprehensive inference of species delimitation, population genetics and biogeography, 357 revealed a complex evolutionary history for Sinocyclocheilus, characterized by substantial 358 cryptic diversity and varying degrees of congruence among molecular datasets and analyses. 359 360 3.1. Sanger Data Analyses 361 Phylogenetic Relationships from Sanger Data 362 Phylogenetic analyses (ML and BI) of the concatenated three-gene Sanger dataset 363 (cytb+ND4+Rag-1; 377 haplotypes, 3423 bp) and a two-gene mitochondrial dataset (cytb+ND4) 364 generally provided strong support for the major clades within Sinocyclocheilus (Fig. 1, 365 Supplementary Figs. S2-S5). Across both datasets, five major clades (Clades I-V) were 366 recovered. The S. jinxiensis was a sister species of S. tianlinensis and S. anatirostris in ML and 367 MrBayes trees. While the topologies of the ML and Bayesian trees were largely congruent in 368 recovering these major clades, some topological inconsistencies were observed. For instance, 369 the placement of species such as S. ronganensis, S. macrolepis, S. angularis, S. 370 zhenfengensis, and S. cf. furcodorsalis differed between ML and Bayesian trees, and between 371 the mitochondrial and the ‘mitochondrial + nuclear’ datasets (Fig. 1, Supplementary Figs. S2-372 S5). Relationships among certain recognised species, such as populations attributed to S. 373 cyphotergous and S. multipunctatus, were unstable or non-monophyletic. Notably, the Rag-1 374 nuclear gene tree exhibited marked topological differences compared to the mitochondrial gene 375 trees (Supplementary Figs. S6-S7), suggesting potential discordance due to processes like 376 incomplete lineage sorting or mitochondrial introgression. 377 378 Species Delimitation with Sanger Data 379 Species delimitation methods applied to the Sanger datasets (ABGD, bPTP, mPTP, bGMYC) 380 consistently recovered fewer putative species than the number of morphologically recognized 381 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint species included in the analyses (Fig. 1, Supplementary Table S4). For the concatenated 382 cytb+ND4+Rag-1 dataset, these methods identified between 29 (ABGD) and 63 (bPTP) putative 383 species. Similar ranges were observed for the cytb+ND4 dataset (26-61 species) and for the 384 individual mitochondrial markers (cytb: 29-55; ND4: 27-37 species). Most importantly, these 385 Sanger-based delimitation approaches generally failed to consistently resolve closely related, 386 morphologically distinct species. In several cases, they also failed to detect distinct 387 morphological lineages that were later recovered using RAD-seq data. 388 389 3.2. RAD-seq Data Analyses 390 Phylogenomic Reconstruction from RAD-seq Data 391 RAD sequencing yielded an average of ~3.6 billion raw reads per sample (error rate ~0.03%; 392 Supplementary Table S5). Phylogenomic analyses of the filtered RAD-seq SNP dataset (87,141 393 SNPs from loci present in ≥90% of 324 individuals) using both concatenation (IQ-TREE, 394 MrBayes) and coalescent-based methods (SVDquartets/tetrad, ASTRAL) produced largely 395 congruent and well-resolved topologies. These analyses consistently identified six well-396 supported major clades (Clades A-F; Fig. 2, Supplementary Figs. S8-S10), with most nodes 397 receiving strong support (bootstrap values for IQ-TREE >95 and posterior probabilities >0.9 for 398 MrBayes, ASTRAL). The ASTRAL species tree, which accounts for incomplete lineage sorting, 399 showed maximal support (posterior probability of 1.0) for nearly all nodes (Supplementary Fig. 400 S9). We recognized four cryptic species (S. cf. bicornutus, S. cf. furcodorsalis, S. cf. 401 qiubeiensis, and S. cf. yishanensis), which along with their respective sister species, were 402 strongly supported across all four phylogenetic analyses (IQ-TREE, MrBayes, ASTRAL, and 403 tetrad). Two additional cryptic species, S. cf. guanyangensis and S. cf. multipunctatus, were 404 strongly supported as distinct from their respective sister taxa by IQ-TREE and MrBayes 405 analyses, however, reciprocal monophyly was not achieved in the MrBayes and Tetrad trees. 406 Minor topological incongruences were observed between the concatenation-based (IQ-TREE; 407 Fig. 2) and coalescent-based (ASTRAL; Supplementary Fig. S9) approaches, for example, in 408 the placement of S. jii, S. huangtianensis, and S. brevifinus. The SVDquartets analysis also 409 placed some individuals outside of their conspecific groupings, including S. lingyunensis 410 (Liu172) and certain S. cf. multipunctatus and S. guanyangensis individuals (Supplementary 411 Fig. S10), suggesting complex population histories or possible misidentifications. The S. 412 jinxiensis is classified into Clade B with S. aquihornes, S. tianlinensis and S. anatirostris (IQ-413 TREE, MrBayes, ASTRAL, and tetrad). 414 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 415 Species Delimitation with RAD-seq Data (BPP) 416 Bayesian species delimitation using BPP on the RAD-seq SNP dataset provided strong support 417 for a greater number of distinct evolutionary lineages compared to the Sanger dataset. The 418

Results

largely agreed with the major clades identified in the phylogenomic analyses, while also 419 further subdividing them (Supplementary Table S6, Supplementary Fig. 2). Overall, BPP 420 analyses on the 318 ingroup Sinocyclocheilus individuals supported the existence of at least 56 421 distinct evolutionary lineages, many of which correspond to already recognized species, but 422 several of which represent strong candidates for undescribed taxa. Within Clade A, six species 423 were confirmed (PP = 1.0), and a population previously identified as S. guanyangensis were 424 consistently delimited as a distinct potential cryptic species (S. cf. guanyangensis). Clade B 425 comprised four species (PP=1.0). Clade C resolved into thirteen species (PP=0.8), with samples 426 of S. jiuxuensis and S. altishoulderus forming distinct population-level clusters within a broader 427 S. altishoulderus lineage. This clade also included two strongly supported candidate new 428 species: S. cf. bicornutus and S. cf. furcodorsalis. In Clade D, seven species were delimited 429 with strong support (PP=1.0), including two populations previously identified as S. 430 multipunctatus, which were consistently recognized as a distinct putative species (S. cf. 431 multipunctatus). Clade E resolved into fourteen species (PP=1.0), including a distinct population 432 within S. qiubeiensis (S. cf. “qiubeiensis”). Within Clade F, twelve species were delimited 433 (PP=1.0), with structured populations of S. lingyunensis, and one population of S. yishanensis 434 forming another candidate species (S. cf. yishanensis. 435 436 3.3. Population Genetics and Evolutionary Dynamics 437 Divergence Time Estimates 438 Time-calibrated phylogenetic analysis using SNAPP on the RAD-seq SNP dataset 439 estimated the crown age of the sampled Sinocyclocheilus to be approximately 17.96 Ma (95% 440 HPD: 11.87-24.8 Ma) (Supplementary Fig. S11). Other recent divergence events included the 441 subclade containing S. brevibarbatus, S. mashanensis, and S. cf. furcodorsalis, with a tMRCA of 442 0.8 Ma (95% HPD: 0.41-1.34 Ma). The overall topology of the SNAPP tree was congruent with 443 the six major clades recovered in the other RAD-seq analyses. 444 445 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Genetic Structure 446 STRUCTURE analysis of the RAD-seq SNP dataset for all ingroup individuals identified 447 an optimal K=6, based on the Evanno method (Evanno et al., 2005) (Supplementary Table S7). 448 These six genetic clusters corresponded closely to the six major phylogenomic clades (Clades 449 A-F) recovered in previous analyses (Fig. 3B). DAPC analysis, using these six clades as a priori 450 groups, also revealed clear differentiation among them (Fig. 3C). Clades B and D emerged as 451 the most genetically distinct, while Clades A, C, E, and F showed comparatively lower levels of 452 genetic differentiation from each other in the DAPC space, though they were still 453 distinguishable. 454 455 Gene Flow and Admixture 456 TreeMix analyses, allowing for up to 20 migration events (m=20, where likelihood scores began 457 to plateau; Fig. 3D), suggested several episodes of ancestral gene flow. Notable inferred events 458 included the admixture between the ancestral lineages of (S. anophthalmus + S. 459 maitianheensis) and S. oxycephalus; between S. rhinocerous and S. broadihornes; between the 460 ancestor of (S. anophthalmus, S. maitianheensis, S. tingi, S. huizeensis) and S. qujingensis; 461 and between ancestral S. tianlinensis + S. anatirostris and S. broadihornes (Fig. 3A). These 462 complex admixture events may contribute to some of the topological uncertainties observed in 463 phylogenetic reconstructions and underscore a reticulate evolutionary history in the 464 diversification of Sinocyclocheilus. 465 466 3.4 Biogeographic Reconstructions 467 Among the biogeographic models evaluated, the DIVALIKE+J was the best-fitting model 468 with the highest likelihood and the lowest Akaike Information Criterion (AICc) (Table 1). This 469

Result

underscores the prominent role of founder-event speciation (i.e. jump dispersal) in 470 shaping the current biogeographic pattern observed across the extant Sinocyclocheilus species. 471 Under the DIVALIKE+J model, the genus’ range evolution involved approximately 21 dispersal 472 events, 21 vicariance events, and five extinction events. Most of these events, particularly those 473 associated with speciation and dispersal, occurred during the Pleistocene (Fig. 4). The ancestral 474 range reconstruction suggests that Sinocyclocheilus originated in the combined 475 Guijiang/Hejiang and Nanpanjiang River regions, likely representing the genus’ centre of origin, 476 during the middle Miocene, around 17.96 million years ago (Ma) (Fig. 4). 477 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Following the initial diversification of Sinocyclocheilus, several key colonisation events 478 originated from ancestral lineages in the Nanpanjiang River region. The Hongshuihe River was 479 colonised twice, at approximately 7.53 Ma and 5.63 Ma. Other early dispersal events during the 480 late Miocene included colonization of the Zuojiang River around 8.59 Ma and the Liujiang River 481 around 7.19 Ma. Subsequent expansions from the Nanpanjiang River region extended into the 482 Beipanjiang River system and into the Wujiang River system during two distinct periods in the 483 Pliocene, around 3.73 Ma and 3.31 Ma (Fig. 4). 484 Subsequent dispersal events during the Pleistocene accounted for most range evolution 485 events, likely contributing to the genus diversification (Fig. 4). For instance, in Clade C, 486 dispersal was inferred from the Beipanjiang River to the Hongshuihe River at ~1.42 Ma, and 487 from the Nanpanjiang River to the Beipanjiang River at ~1.19 Ma. In Clade D, lineages 488 originating in the Wujiang River are inferred to have dispersed into the Hongshuihe River ~0.43 489 Ma. Clade E showed three independent colonisation events from the Liujiang River to the 490 Hongshuihe River, dated to ~1.17 Ma, ~0.98 Ma, and ~0.25 Ma. In Clade F, lineages from the 491 Nanpanjiang River expanded into the Jinsha River system around 1.33 Ma, and later colonised 492 the Red River system during two occasions, at ~0.42 Ma and ~0.31 Ma. Finally, for Clade A, the 493 model inferred a very dynamic biogeographic history shaped by five dispersal events, five 494 vicariance events, and five extinction events. 495 Table 1. Biogeographical model comparison with and without jump dispersal events (+J) for 496 Sinocyclocheilus species. Abbreviations as follows: LnL = log-likelihood; numparams = number of 497 parameters in each model; d = dispersal rate; e = extinction rate; j = founder -event speciation 498 rate; LRT = likelihood -ratio test; AICc = Akaike Information Criterion corrected; AICc_wt = AICc 499 weight. 500 501 Model LnL numparams d e j AICc AICc_wt DEC - 79.76 2 0.035 1.57 0 163.8 0.0023 DEC+J -73.7 3 0.0027 1.69 0.023 153.9 0.32 DIVALIKE - 80.33 2 0.025 0.48 0 164.9 0.0013 DIVALIKE+J - 73.47 3 1.00E- 12 0.62 0.025 153.4 0.41 BAYAREALIKE - 86.63 2 0.024 0.56 0 177.5 2.40E-06 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint BAYAREALIKE+J - 73.89 3 1.00E- 07 0.27 0.025 154.2 0.27 502 3.5 Estimation of Genetic Connectivity and Barriers to Gene Flow 503 The FEEMS cross-validation procedure identified an optimal smoothing parameter (λ = 504 2.069) that minimized the L2 prediction error (Fig. 5A), and this value was used for subsequent 505 analyses and visualization of migration surfaces. The resulting FEEMS map revealed several 506 pronounced genetic barriers across the distribution range of Sinocyclocheilus, indicating regions 507 of restricted gene flow. Strong barriers were detected in the southwestern part of Guizhou 508 Province and the northeastern part of Yunnan Province. These regions correspond 509 geographically to the Beipanjiang River valley and the Wumeng Mountain Range (Fig. 5B). 510 Furthermore, low effective migration rates were also inferred in eastern Yunnan, likely reflecting 511 the Nanpanjiang River valley acting as another geographical barrier impeding gene flow (Fig. 512 5B). Within the Guangxi Zhuang Autonomous Region, a distinct genetic barrier was detected in 513 the northwestern and central areas, likely associated with limited gene flow across the 514 Hongshuihe River valley (Fig. 5B). Migration rates were also reduced in northwestern Guangxi, 515 potentially due to the isolating effect of the Zuojiang River valley (Fig. 5B). Similarly, the northern 516 and northeastern regions of Guangxi exhibited very low effective migration rates, a pattern that 517 may reflect the Guijiang and the Liujiang River valleys acting as barriers to genetic exchange 518 (Fig. 5B). 519 520 4. DISCUSSION 521 By employing a genomic approach to examine the evolutionary history of the 522 Sinocyclocheilus cavefish radiation, our study clarified its species boundaries and overcame the 523

Limitations

of mitochondrial markers that previously obscured the group's true diversity. We 524 uncovered cryptic lineages using several complementary methods and refined the genus’s 525 evolutionary relationships. Building on this updated species framework, we investigated both 526 contemporary and historical processes shaping diversification. Population genomic analyses 527 revealed patterns of genetic structure and gene flow. Divergence estimates and biogeographic 528 reconstruction traced the origin of this radiation to the mid-Miocene, shaped by vicariance, 529 dispersal, and environmental change. The outcome is a comprehensive evolutionary narrative 530 of a radiation characterised by cryptic speciation, subterranean habitat specialisation, and high 531 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint levels of endemism, demonstrating that Sinocyclocheilus diversity arose through repeated 532 episodes of isolation and secondary contact across a dynamic Karst landscape. 533 534 4.1 Integrative Genomics in Resolving Complex Radiations 535 Adaptive radiations such as Sinocyclocheilus often exhibit a mismatch between 536 morphological evolution and molecular divergence, complicated further by incomplete lineage 537 sorting (ILS) and possible reticulate evolution, all of which hinder systematic resolution 538 (Seehausen, 2004). Our findings conform to this pattern. While earlier studies based on 539 morphology and limited mitochondrial markers have laid the foundation for Sinocyclocheilus 540 systematics (e.g., Zhao & Zhang, 2009; Mao et al., 2021), our genome-wide dataset provides 541 considerably improved resolution. The phylogenomic framework derived from RAD-seq data 542 (Fig. 2) strongly supports clades that were previously weakly resolved using Sanger sequencing 543 (Fig. 1), consistent with the view that phylogenomics can overcome the confounding effects of 544 ILS in recent radiations (Edwards et al., 2016). 545 Combining genomic data with species delimitation and population genetic analyses 546 proved highly informative. Taxa previously treated as widespread species or species complexes 547 often resolved into multiple, genetically distinct, though sometimes morphologically cryptic, 548 lineages under the RAD-seq + BPP framework (especially within Clades C and D). Six putative 549 new species (S. cf. guanyangensis, S. cf. multipunctatus, S. cf. bicornutus, S. cf. furcodorsalis, 550 S. cf. qiubeiensis, and S. cf. yishanensis) underscore this phylogenetic structure. Morphological 551 differences are subtle and only became evident after genomic distinctiveness had been 552 established. In the absence of genome-scale resolution, such lineages might have been 553 dismissed as intraspecific variation. Although Sinocyclocheilus jinxiensis was previously 554 assigned to Pseudosinocyclocheilus based on phenetic characteristics (Zhang & Zhao, 2016), 555 our phylogeny supports its placement in Sinocyclocheilus(Figs. 1-2, Supplementary Figs. S2-S5 556 and S8-S11). 557 558 4.2 Methodological Congruence and Conflict 559 Species delimitation still remains contentious in systematics (Sites & Marshall, 2003; 560 Carstens et al., 2013; Chambers et al., 2025; Emerson, 2025). We directly compared multiple 561

Methods

applied to both Sanger and RAD-seq datasets. Sanger-based approaches (ABGD, 562 bPTP, mPTP) consistently delimited fewer species than RAD-seq based BPP analyses, which 563 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint supported the recognition of six cryptic species (S. cf. guanyangensis, S. cf. multipunctatus, S. 564 cf. bicornutus, S. cf. furcodorsalis, S. cf. qiubeiensis, and S. cf. yishanensis). This partly reflects 565 differences in data resolution and the limitations of single-locus and distance-based methods in 566 resolving shallow divergences or accommodating high intraspecific variation (Papadopoulou et 567 al., 2008; Sukumaran & Knowles, 2017). 568 BPP’s strength lies in its probabilistic treatment of gene tree–species tree discordance 569 while accounting for ancestral polymorphism (Yang & Rannala, 2010). However, its results can 570 be sensitive to priors and guide tree topology (Leaché & Fujita, 2010), and its assumption of 571 post-divergence isolation is frequently violated in nature. Moreover, our TreeMix analysis (Fig. 572 3A) revealed extensive gene flow among lineages, further complicating species boundaries. We 573 argue that concordance across MSC models, genetic structure, and phenotypic evidence 574 provides the strongest support for species delimitation, while discrepancies should be viewed as 575 a point of further inquiry into aspects such as the natural history of the system. 576 577 4.3 Geography of Speciation: Vicariance, Dispersal, and Climate 578 Sinocyclocheilus diversification reflects a complex interplay of geological isolation, 579 episodic connectivity, and climatic shifts. Our population structure analyses (Fig. 3B, C) reveal 580 strong genetic clustering congruent with the geographically structured phylogenomic clades, 581 indicating that most species are genetically isolated by the region’s fragmented Karst landscape 582 (Niemiller et al., 2013). 583 Biogeographic reconstruction places the origin of the radiation in the Nanpanjiang and 584 Guijiang/Hejiang river systems during the mid-Miocene (~18 Ma; Fig. 4), coinciding with major 585 drainage reorganisation along the southeastern Tibetan Plateau (Ma et al., 2019; Sun et al., 586 2022). This initial vicariance appears to have triggered initial diversification. However, vicariance 587 alone cannot explain the full pattern. The superior fit of the DIVALIKE+J model indicates that 588 founder-event speciation has been recurrent. After initial separation, repeated dispersal into new 589 river basins, facilitated by tectonic activity and climate-driven hydrological shifts, has shaped the 590 distribution and evolution of lineages (Wan et al., 2007; Zhang, 2012). Major expansions into the 591 Liujiang, Hongshuihe, and Wujiang basins appear linked to the late Miocene and Pliocene 592 tectonic events (Yuan & Chen, 1993; Zhang et al., 2023). 593 Gene flow adds further complexity to this history. TreeMix analyses (Fig. 3A) reveal 594 multiple admixture events, suggesting that intermittent hydrological connections enabled contact 595 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint between diverging lineages. Such reticulation likely contributes to the inconsistent mapping of 596 traits like eye reduction across the species tree. Contemporary gene flow remains constrained 597 by geography: Our FEEMS analysis (Fig. 5B) highlights current barriers such as the 598 Nanpanjiang and Hongshuihe valleys, reinforcing their long-term role in structuring genetic 599 diversity. 600 601 4.4 Dynamics of Rapid Subterranean Diversification 602 The Sinocyclocheilus radiation, restricted to a topographically and ecologically complex 603 region, offers a valuable model for studying rapid diversification. Cave systems promote 604 isolation and strong divergent selection, both conducive to speciation (Porter & Crandall, 2003). 605 The genus spans ecological extremes, from surface-dwellers with full vision to obligate cave 606 forms lacking eyes and relying on enhanced mechanosensory traits (Zhao & Zhang, 2009). This 607 combination of ecological specialisation and fragmented terrain likely creates a shifting mosaic 608 of selection pressures and dispersal routes, promoting rapid diversification. 609 Our findings also provide a foundation for exploring the evolution and genetic basis of 610 stygomorphic traits. The gene flow detected among cave lineages implies that speciation has 611 not always occurred in complete isolation. Instead, parapatric or allo-parapatric modes, entailing 612 divergence with gene flow, likely characterise much of this radiation (Sobel et al., 2010). 613 614 4.5 Conservation Implications in a Cryptic Radiation 615 Our findings have clear conservation relevance. The presence of numerous cryptic and 616 locally endemic lineages suggests that Sinocyclocheilus diversity has been underestimated. 617 Many lineages are restricted to single caves or micro-drainages, making them highly vulnerable 618 to habitat degradation, groundwater extraction, and pollution (Luo et al., 2023). 619 Accurate taxonomy is important for conservation. By resolving taxonomic uncertainties 620 and identifying new candidate species, our study provides essential data for defining 621 Evolutionarily Significant Units (ESUs). Regions identified as centres of origin and 622 diversification, particularly the Nanpanjiang basin, should be prioritised for conservation. And 623 present-day genetic barriers revealed by our analyses point to isolated populations that may 624 require separate management. Our integrated framework offers a strong basis for conservation 625 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint planning, accounting for the improved taxonomic diversity of this threatened and unique cave 626 fauna. 627 628 5. CONCLUSION 629 This study advances our understanding of the complex evolutionary history of the 630 Sinocyclocheilus radiation by integrating phylogenomics, coalescent-based species delimitation, 631 and biogeographic analysis. Our genome-wide dataset yielded a robust and highly resolved 632 phylogeny, revealing far greater species-level diversity than previously recognised. Several 633 cryptic lineages were identified, demonstrating that substantial genetic divergence often exists 634 beneath subtle or convergent morphology. These findings call for a taxonomic revision and 635 highlight the limitations of traditional markers in rapidly diversifying clades. The diversification of 636 Sinocyclocheilus reflects a dynamic interplay of historical and contemporary processes. Our 637 analyses suggest a mid-Miocene origin in Yunnan–Guizhou River systems, with a key 638 vicariance event, likely linked to drainage reorganisation, acting as the initial trigger. Founder-639 event dispersal into emergent Karst habitats subsequently became a dominant mode of 640 speciation. Yet, isolation was not complete; clear signals of gene flow between lineages suggest 641 that secondary contact and introgression have also shaped the genomic landscape of the 642 radiation. 643 644 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint AUTHOR CONTRIBUTIONS 645 YWL, MM, TRM, and JY conceptualised the research and designed the methodology; YWL, 646 JJZ, SPZ, TRM, JY, and MM conducted fieldwork and curated the data; YWL and TRM carried 647 out formal analysis; YWL, TRM, and MM wrote the original draft; YWL made figures and tables; 648 MM, MMV, MRP, and JY supervised; All authors verified. All authors reviewed and edited the 649 draft. All authors read and approved the final manuscript. 650 651

Acknowledgements

652 We thank the EED lab members for fieldwork; Cheng-Hai Fu for field work; MMV was supported 653 as a postdoctoral fellow by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP 654 #2019/08308-0 and #2023/16814-9) while working on this manuscript. 655 656 FUNDING 657 Funding for this study is provided by (1) National Natural Science Foundation of China 658 (#32260333) to MM; (2) Guangxi University Higher-Talents Funding to MM for fieldwork, lab 659 work, analyses and supporting YWL, TRM; (3) National Natural Science Foundation of China 660 (#31860600) to JY for fieldwork (4) Guangxi Natural Science Foundation 661 (#2017GXNSFFA198010) to JY for research work (5) Innovation Project of Guangxi Graduate 662 Education (#YCBZ2021008) to TRM and YWL for research work. 663 664 These funding bodies had no role in the study's design, data collection, analysis, and 665 interpretation, or manuscript writing. 666 667 CONFLICT OF INTEREST STATEMENT 668 The authors declare no conflicts of interest. 669 670 DATA AVAILABILITY STATEMENT 671 Dara will be provided at a later stage after the paper is accepted for publication. 672 673 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint ETHICS STATEMENT 674 The animal study was reviewed and approved by the Institutional Animal Care and Use 675 Committee of Guangxi University (GXU), Nanning-China (#GXU-2024-282). The Guangxi 676 Province Government approved field sampling; fish were sampled using trap nets, given the 677 rarity of species, tissues from the fish were obtained mainly using fin clips without destructive 678 sampling except for when whole specimens were required for further taxonomy work. 679 680 681 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint

References

682 Anderson, D., & Burnham, K. (2004). Model selection and multi -model inference. Second. NY: Springer-683 Verlag, 63(2020), 10. 684 Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A., ... & Johnson, E. A. (2008). 685 Rapid SNP discovery and genetic mapping using sequenced RAD markers. PloS one, 3(10), e3376. 686 Barnes, R., Sahr, K., Evenden, G., Johnson, A., & Warmerdam, F. (2017). dggridR: Discrete Global Grids 687 for R. R package version 0.1. 12. 688 Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., & RoyChoudhury, A. (2012). Inferring species 689 trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular 690 biology and evolution, 29(8), 1917-1932. 691 Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: a tool for automated alignment 692 trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972-1973. 693 Carstens, B. C., Pelletier, T. A., Reid, N. M., & Satler, J. D. (2013). How to fail at species 694 delimitation. Molecular ecology, 22(17), 4369-4383. 695 Chambers, E. A., Lara-Tufiño, J. D., Campillo -García, G., Cisneros-Bernal, A. Y., Dudek Jr, D. J., León -696 Règagnon, V., ... & Hillis, D. M. (2025). Distinguishing species boundaries from geographic 697 variation. Proceedings of the National Academy of Sciences, 122(19), e2423688122. 698 Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation 699 PLINK: Rising to the challenge of larger and richer datasets. Gigascience, 4(1), s13742-015-0047–0048. 700 Chen, B., Mao, T. R., Liu, Y. W., Dai, W. Z., Li, X. L., Rajput, A. P., ... & Meegaskumbura, M. (2022). Sensory 701 evolution in a cavefish radiation: patterns of neuromast distribution and associated behaviour in 702 Sinocyclocheilus (Cypriniformes: Cyprinidae). Proceedings of the Royal Society B, 289(1984), 20221641. 703 Chifman, J., & Kubatko, L. (2015). Identifiability of the unrooted species tree topology under the coalescent 704 model with time-reversible substitution processes, site-specific rate variation, and invariable sites. Journal 705 of theoretical biology, 374, 35-47. 706 Combrink, L. L., Golcher -Benavides, J., Lewanski, A. L., Rick, J. A., Rosenthal, W. C., & Wagner, C. E. 707 (2025). Population Genomics of Adaptive Radiation. Molecular Ecology, 34(2), e17574. 708 De-Kayne, R., Schley, R., Barth, J. M., Campillo, L. C., Chaparro -Pedraza, C., Joshi, J., ... & Cerca, J. 709 (2025). Why do some lineages radiate while others do not? Perspectives for future research on adaptive 710 radiations. Cold Spring Harbor perspectives in biology, 17(2), a041448. 711 Earl, D. A., & VonHoldt, B. M. (2012). STRUCTURE HARVESTER: a website and program for visualizing 712 STRUCTURE output and implementing the Evanno method. Conservation genetics resources, 4, 359-361. 713 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Eaton, D. A., & Overcast, I. (2020). ipyrad: Interactive assembly and analysis of RADseq 714 datasets. Bioinformatics, 36(8), 2592-2594. 715 Edwards, D. L., & Knowles, L. L. (2014). Species detection and individual assignment in species 716 delimitation: can integrative data increase efficacy?. Proceedings of the Royal Society B: Biological 717 Sciences, 281(1777), 20132765. 718 Edwards, S. V., Potter, S., Schmitt, C. J., Bragg, J. G., & Moritz, C. (2016). Reticulation, divergence, and 719 the phylogeography–phylogenetics continuum. Proceedings of the National Academy of Sciences, 113(29), 720 8025-8032. 721 Emerson, B. C. (2025). Delimiting Species —Prospects and Challenges for DNA Barcoding. Molecular 722 Ecology, e17677. 723 Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the 724 software STRUCTURE: a simulation study. Molecular ecology, 14(8), 2611-2620. 725 Hurst, G. D., & Jiggins, F. M. (2005). Problems with mitochondrial DNA as a marker in population, 726 phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proceedings of the Royal 727 Society B: Biological Sciences, 272(1572), 1525-1534. 728 Jakobsson, M., & Rosenberg, N. A. (2007). CLUMPP: a cluster matching and permutation program for 729 dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23(14), 730 1801-1806. 731 Jombart, T., & Collins, C. (2017). A tutorial for Discriminant Analysis of Principal Components (DAPC) using 732 adegenet 2.1. 0. Imperial College, London, United Kingdom. 733 Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K., Von Haeseler, A., & Jermiin, L. S. (2017). ModelFinder: fast 734 model selection for accurate phylogenetic estimates. Nature methods, 14(6), 587-589. 735 Kapli, P., Lutteropp, S., Zhang, J., Kobert, K., Pavlidis, P., Stamatakis, A., & Flouri, T. (2017). Multi -rate 736 Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain 737 Monte Carlo. Bioinformatics, 33(11), 1630-1638. 738 Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements 739 in performance and usability. Molecular biology and evolution, 30(4), 772-780. 740 Landis, M. J., Matzke, N. J., Moore, B. R., & Huelsenbeck, J. P. (2013). Bayesian analysis of biogeography 741 when the number of areas is large. Systematic Biology, 62(6), 789–804. 742 Leaché, A. D., & Fujita, M. K. (2010). Bayesian species delimitation in West African forest geckos 743 (Hemidactylus fasciatus). Proceedings of the Royal Society B: Biological Sciences, 277(1697), 3071-3077. 744 Lei, F. K. (2023). Characteristics and Conservation of the Chinese Sinocyclocheilus cavefish’s aquatic 745 habitat (Doctoral dissertation, Tsinghua University). 746 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Li, R. H., Wang, X. A, Bian, C., Gao, Z. J., Zhang, Y. W., Jiang, W. S., Wang, M., You, X. X., Cheng, L., Pan, 747 X. F., Yang, J. X., Shi, Q. (2021). Whole-Genome Sequencing of Sinocyclocheilus maitianheensis reveals 748 phylogenetic evolution and immunological variances in various Sinocyclocheilus fishes. Frontiers in 749 Genetics, 12, 736500. 750 Li, Z. Q., Guo, B. C., Li, J. B., He, S. P., & Chen, Y. Y. (2008). Bayesian mixed models and divergence time 751 estimation of Chinese cavefishes (Cyprinidae: Sinocyclocheilus). Chinese Science Bulletin, 53(15), 2342-752 2352. 753 Luo, X. Y., Chen, B., Mao, T. R., Liu, Y. W., Yang, J., and Meegaskumbura, M. (2023). Skin transcriptomic 754 correlates of cave-dwelling Sinocyclocheilus cavefish. Frontiers in Ecology and Evolution, 11, 1264214. 755 Ma, L., Yang, J. X., Lei, F. K., Xu, M. Z., Zhao, Y. H., & Jeffery, W. R. (2023). Protection and exploration of 756 the scientific potential of Chinese cavefish. Zoological Research, 44(4), 675–677. 757 Ma, L., Zhao, Y. H., & Yang, J. X. (2019). Chapter 28—Cavefish of China. In W. B. White, D. C. Culver, & 758 T. Pipan (Eds.), Encyclopedia of Caves (Third Edition) (pp. 237–254). Academic Press. 759 Mao, T. R., Liu, Y. W., Meegaskumbura, M., Yang, J., Ellepola, G., Senevirathne, G., ... & Pie, M. R. (2021). 760 Evolution in Sinocyclocheilus cavefish is marked by rate shifts, reversals, and origin of novel traits. BMC 761 Ecology and Evolution, 21, 1-14. 762 Mao, T. R., Liu, Y. W., Vasconcellos, M. M., Pie, M. R., Ellepola, G., Fu, C. H., ... & Meegaskumbura, M. 763 (2022). Evolving in the darkness: Phylogenomics of Sinocyclocheilus cavefishes highlights recent 764 diversification and cryptic diversity. Molecular Phylogenetics and Evolution, 168, 107400. 765 Mao, T. R, Liu, Y. W., Vasconcellos, M. M., Zhou, S. P., Ellepola, G., Yang, J., Pie. M. R and 766 Meegaskumbura, M. (2024). Caves as species pumps: key innovations, isolation, and periodic 767 introgression drive the world’s largest cavefish radiation in a dynamic karstic landscape. bioRxiv, 2024-09. 768 Marcus, J., Ha, W., Barber, R. F., & Novembre, J. (2021). Fast and flexible estimation of effective migration 769 surfaces. eLife, 10, e61927. 770 Matzke, N. J. (2013). Probabilistic historical biogeography: New models for founder -event speciation, 771 imperfect detection, and fossils allow improved accuracy and model -testing. University of California, 772 Berkeley. 773 Matzke, N. J. (2014). Model Selection in Historical Biogeography Reveals that Founder -Event Speciation 774 Is a Crucial Process in Island Clades. Systematic Biology, 63(6), 951–970. 775 Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., Von Haeseler, A., & Lanfear, 776 R. (2020). IQ -TREE 2: new models and efficient methods for phylogenetic inference in the genomic 777 era. Molecular biology and evolution, 37(5), 1530-1534. 778 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Nachman, M. W., & Payseur, B. A. (2012). Recombination rate variation and speciation: theoretical 779 predictions and empirical results from rabbits and mice. Philosophical Transactions of the Royal Society B: 780 Biological Sciences, 367(1587), 409-421. 781 Nguyen, L. T., Schmidt, H. A., Von Haeseler, A., & Minh, B. Q. (2015). IQ -TREE: a fast and effective 782 stochastic algorithm for estimating maximum -likelihood phylogenies. Molecular biology and 783 evolution, 32(1), 268-274. 784 Niemiller, M. L., & Soares, D. (2014). Cave environments. In Extremophile fishes: ecology, evolution, and 785 physiology of teleosts in extreme environments (pp. 161-191). Cham: Springer International Publishing. 786 Niemiller, M. L., & Zigler, K. S. (2013). Patterns of cave biodiversity and endemism in the Appalachians and 787 Interior Plateau of Tennessee, USA. PLoS One, 8(5), e64177. 788 Papadopoulou, A., Bergsten, J., Fujisawa, T., Monaghan, M. T., Barraclough, T. G., & Vogler, A. P. (2008). 789 Speciation and DNA barcodes: testing the effects of dispersal on the formation of discrete sequence 790 clusters. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1506), 2987-2996. 791 Pickrell, J., & Pritchard, J. (2012). Inference of population splits and mixtures from genome -wide allele 792 frequency data. Nature Precedings, 1-1. 793 Porter, M. L., & Crandall, K. A. (2003). Lost along the way: the significance of evolution in reverse. Trends 794 in Ecology & Evolution, 18(10), 541-547. 795 Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of population structure using multilocus 796 genotype data. Genetics, 155(2), 945-959. 797 Puillandre, N., Lambert, A., Brouillet, S., & Achaz, G. J. M. E. (2012). ABGD, Automatic Barcode Gap 798 Discovery for primary species delimitation. Molecular ecology, 21(8), 1864-1877. 799 Rambaut, A., Drummond, A. J., Xie, D., Baele, G., & Suchard, M. A. (2018). Posterior summarization in 800 Bayesian phylogenetics using Tracer 1.7. Systematic biology, 67(5), 901-904. 801 Ranwez, V., Douzery, E. J., Cambon, C., Chantret, N., & Delsuc, F. (2018). MACSE v2: toolkit for the 802 alignment of coding sequences accounting for frameshifts and stop codons. Molecular biology and 803 evolution, 35(10), 2582-2584. 804 Ree, R. H., & Smith, S. A. (2008). Maximum likelihood inference of geographic range evolution by dispersal, 805 local extinction, and cladogenesis. Systematic Biology, 57(1), 4–14. 806 Reid, N. M., & Carstens, B. C. (2012). Phylogenetic estimation error can decrease the accuracy of species 807 delimitation: a Bayesian implementation of the general mixed Yule -coalescent model. BMC evolutionary 808 biology, 12, 1-11. 809 Ronquist, F. (1997). Dispersal -vicariance analysis: A new approach to the quantification of historical 810 biogeography. Systematic Biology, 46(1), 195–203. 811 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu L., Suchard 812 M. A., Huelsenbeck, J. P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice 813 across a large model space. Systematic biology, 61(3), 539-542. 814 Rosenberg, N. A. (2004). DISTRUCT: a program for the graphical display of population structure. Molecular 815 ecology notes, 4(1), 137-138. 816 Seehausen, O. (2004). Hybridization and adaptive radiation. Trends in ecology & evolution, 19(4), 198-207. 817 Singhal, S., Hoskin, C. J., Couper, P., Potter, S., & Moritz, C. (2018). A framework for resolving cryptic 818 species: a case study from the lizards of the Australian wet tropics. Systematic Biology, 67(6), 1061-1075. 819 Sites, J. W., & Marshall, J. C. (2003). Delimiting species: a Renaissance issue in systematic biology. Trends 820 in Ecology & Evolution, 18(9), 462-470. 821 Sobel, J. M., Chen, G. F., Watt, L. R., & Schemske, D. W. (2010). The biology of speciation. Evolution, 64(2), 822 295-315. 823 Stange, M., Sánchez-Villagra, M. R., Salzburger, W., & Matschiner, M. (2018). Bayesian divergence-time 824 estimation with genome -wide single -nucleotide polymorphism data of sea catfishes (Ariidae) supports 825 Miocene closure of the Panamanian Isthmus. Systematic Biology, 67(4), 681-699. 826 Streicher, J. W., Lambert, S. M., Méndez de la Cruz, F. R., Martínez-Méndez, N., García-Vázquez, U. O., 827 Nieto Montes de Oca, A., & Wiens, J. J. (2024). What Predicts Gene Flow During Speciation? The Relative 828 Roles of Time, Space, Morphology and Climate. Molecular Ecology, 33(23), e17580. 829 Suchard, M. A., Lemey, P., Baele, G., Ayres, D. L., Drummond, A. J., & Rambaut, A. (2018). Bayesian 830 phylogenetic and phylodynamic data integration using BEAST 1.10. Virus evolution, 4(1), vey016. 831 Sudasinghe, H., Pethiyagoda, R., & Meegaskumbura, M. (2020). A molecular phylogeny of the genus 832 Laubuka (Teleostei: Cyprinidae) in Sri Lanka reveals multiple origins and a cryptic species. Systematics 833 and Biodiversity, 18(6), 592-613. 834 Sudasinghe, H., Ranasinghe, R. T., Goonatilake, S. D. A., & Meegaskumbura, M. (2018). A review of the 835 genus Labeo (Teleostei: Cyprinidae) in Sri Lanka. Zootaxa, 4486(3), 201-235. 836 Sukumaran, J., & Knowles, L. L. (2017). Multispecies coalescent delimits structure, not 837 species. Proceedings of the National Academy of Sciences, 114(7), 1607-1612. 838 Sun, H., Li, Z., Landis, J. B., Qian, L., Zhang, T., & Deng, T. (2022). Effects of drainage reorganization on 839 phytogeographic pattern in Sino-Himalaya. Alpine Botany, 1–11. 840 Talavera, G., & Castresana, J. (2007). Improvement of phylogenies after removing divergent and 841 ambiguously aligned blocks from protein sequence alignments. Systematic biology, 56(4), 564-577. 842 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Wan, Y. G., Shen, Z. K., Zeng, Y. H., & Sheng, S. Z. (2007). Evolution of cumulative Coulomb failure stress 843 in northeastern Qinghai -Xizang (Tibetan) Plateau and its effect on large earthquake occurrence. Acta 844 Seismologica Sinica, 20(2), 117–132. 845 Wen, H. M., Luo, T., Wang, Y. L., Wang, S. W., Liu, T., Xiao, N., & Zhou, J. (2022). Molecular phylogeny and 846 historical biogeography of the cave fish genus Sinocyclocheilus (Cypriniformes: Cyprinidae) in southwest 847 China. Integrative Zoology, 17(2), 311-325. 848 Xiao, H., Chen, S. Y., Liu, Z. M., Zhang, R. D., Li, W. X., Zan, R. G., & Zhang, Y. P. (2005). Molecular 849 phylogeny of Sinocyclocheilus (Cypriniformes: Cyprinidae) inferred from mitochondrial DNA 850 sequences. Molecular phylogenetics and evolution, 36(1), 67-77. 851 Xiao, M. Y., Wang, J. J., Luo, T., Zhou, J. J., Xiao, N., & Zhou, J. (2025). Sinocyclocheilus xingrenensis 852 (Cypriniformes, Cyprinidae), a new underground fish from Guizhou Province, Southeastern 853 China. Zoosystematics and Evolution, 101(2), 419-436. 854 Yang, Z. H. (2015). The BPP program for species tree estimation and species delimitation. Current 855 Zoology, 61(5), 854-865. 856 Yang, Z. H., & Rannala, B. (2010). Bayesian species delimitation using multilocus sequence 857 data. Proceedings of the National Academy of Sciences, 107(20), 9264-9269. 858 Yang, Z. H., & Rannala, B. (2014). Unguided species delimitation using DNA sequence data from multiple 859 loci. Molecular Biology and Evolution, 31(12), 3125-3135. 860 Yu, Y., Blair, C., & He, X. J. (2020). RASP 4: Ancestral State Reconstruction Tool for Multiple Genes and 861 Characters. Molecular Biology and Evolution, 37(2), 604–606. 862 Zhang, C., Rabiee, M., Sayyari, E., & Mirarab, S. (2018). ASTRAL -III: polynomial time species tree 863 reconstruction from partially resolved gene trees. BMC bioinformatics, 19, 15-30. 864 Zhang, C. , & Zhao, Y. . (2016). Species Diversity and Distribution of Inland Fishes in China. Science Press, 865 Beijing.Zhang, D., Gao, F., Jakovlić, I., Zou, H., Zhang, J., Li, W. X., & Wang, G. T. (2020). PhyloSuite: An 866 integrated and scalable desktop platform for streamlined molecular sequence data management and 867 evolutionary phylogenetics studies. Molecular ecology resources, 20(1), 348-355. 868 Zhang, J. J., Kapli, P., Pavlidis, P., & Stamatakis, A. (2013). A general species delimitation method with 869 applications to phylogenetic placements. Bioinformatics, 29(22), 2869-2876. 870 Zhang, L. S. (2012). Ancient Geography of China–the Formation of China’s Natural Environment. Beijing: 871 Science Press. 872 Zhang, Z., Daly, J. S., Tian, Y., Lei, C., Sun, X., Badenszki, E., ... & Hu, J. (2023). Late Oligocene formation 873 of the Pearl River triggered by the opening of the South China Sea. Geophysical Research Letters, 50(8), 874 e2023GL103049. 875 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint Zhao, Y. H., & Zhang, C. G. (2009). Endemic Fishes of Sinocyclocheilus (Cypriniformes, Cyprinidae) in 876 China—Species Diversity, Cave Adaptation, Systematics and Zoogeography. Science Press, Beijing. 877 Zhou, Q. Y., & Chen, P. Y. (1993). Lithofacies change and palaeogeographical evolution during Late 878 Cenozoic in Guizhou and its vicinity. Geology of Guizhou, 10(3), 201-207. 879 880 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 881 MAIN FIGURES 882 883 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 884 885 Figure 1. A-D) Molecular phylogenetic relationships of Sinocyclocheilus based on Bayesian 886 inference of the concatenated cytb +nd4+ rag1 dataset. Numbers at the nodes indicate 887 Bayesian posterior probabilities. Scale bar represents number of substitutions per site. Species 888 delimitation results from molecular methods (bPTP, mPTP, and bGMYC) applied to cytb, ND4, 889 Rag-1, and the concatenated dataset are shown as black rectangles to the right of the trees. 890 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 891 Figure 2. Maximum-likelihood (IQ-TREE) phylogeny of Sinocyclocheilus based on 87,141 RAD 892 loci. Bootstrap support values are shown at each node. Numbers in parentheses after species 893 names denote individuals from distinct caves (i.e. separate populations). Species delimitation 894

Results

from BPP using RAD loci are depicted as white rectangles adjacent to the phylogeny. 895 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 896 Figure 3. Population genetics and gene flow in Sinocyclocheilus. A) Population history of 56 897 Sinocyclocheilus species inferred with TREEMIX, including 20 migration edges. Arrows indicate 898 the direction of gene flow, with numbers representing the intensity of migration (ranked from 1 to 899 20). B) Structure bar plot illustrating individual assignment to genetic clusters (K = 6), 900 corresponding to the six major clades. Each bar represents one individual, and colors indicate 901 inferred ancestry. C) Discriminant analysis of principal components (DAPC) plot based on SNP 902 data showing differentiation among the six clades. D) TreeMix model-fit plot indicating the most 903 likely number of admixture events based on likelihood relative increase. 904 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 905 Figure 4. Ancestral area reconstruction of Sinocyclocheilus. The graphical summary (top left) 906 illustrates changes in the genus’s distribution across the Miocene, Pliocene, and Pleistocene. 907 Red dots mark species collection sites. Ancestral areas were inferred using the best-fit 908 biogeographic model (DIVALIKE+J). Nine biogeographic regions were defined based on major 909 river basins: GH: Guijiang and Hejiang; LJ: Liujiang; HS: Hongshuihe; NP: Nanpanjiang; BP: 910 Beipanjiang; RR: Red River; JS: Jinsha River; WJ: Wujiang; ZJ: Zuojiang. Arrows indicate 911 inferred dispersal events, with adjacent numbers denoting their estimated timing (in millions of 912 years, Ma). Current species ranges are denoted as coloured circles at each tip of the tree, 913 corresponding to their respective region. Pie charts at each node represent the relative 914 probability of ancestral area assignments, with the median node ages labeled. 915 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint 916 Figure 5. Effective migration surfaces in Sinocyclocheilus inferred by FEEMS. A) Cross-917 validation results used to select the optimal smoothing parameter (λ = 2.069) for modelling. B) 918 Effective migration surfaces. Areas of low effective migration (genetic barriers) are depicted in 919 brown, while regions of higher effective migration (gene flow corridors) are shown in blue. 920 Significant genetic barriers correspond to the Beipanjiang River and Wumeng Mountains, 921 separating populations in southwestern Guizhou and northeastern Yunnan. Additional areas of 922 low migration rates characterizing barriers to gene flow are also evident across the 923 Nanpanjiang, Hongshuihe, Youjiang, Guijiang, and Liujiang River valleys, highlighting their roles 924 as impediments to contemporary gene flow. 925 926 .CC-BY-ND 4.0 International licenseavailable under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (whichthis version posted July 15, 2025. ; https://doi.org/10.1101/2025.07.09.664015doi: bioRxiv preprint

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: oa-pdf

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-23T02:00:01.238055+00:00
License: CC-BY-ND-4.0