Full text
73,328 characters
· extracted from
preprint-html
· click to expand
Demographic history of the Malayan tapirs (Tapirus indicus) in Southeast Asia | Authorea try { document.documentElement.classList.add('js'); } catch (e) { } var _gaq = _gaq || []; _gaq.push(['_setAccount', 'G-8VDV14Y67G']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); Skip to main content Preprints Collections Wiley Open Research IET Open Research Ecological Society of Japan All Collections About About Authorea FAQs Contact Us Quick Search anywhere Search for preprint articles, keywords, etc. Search Search ADVANCED SEARCH SCROLL Ecological Research This is a preprint and has not been peer reviewed. Data may be preliminary. 14 November 2024 V1 Latest version Share on Demographic history of the Malayan tapirs (Tapirus indicus) in Southeast Asia Authors : Qi Luan Lim 0000-0002-1499-3627 , Yu Sato , Norsyamimi Rosli 0000-0002-0232-3060 , and Miho Inoue-Murayama [email protected] Authors Info & Affiliations https://doi.org/10.22541/au.173154377.78525662/v1 Published Ecological Research Version of record Peer review timeline 1023 views 320 downloads Contents Abstract Abstract Introduction Methods Conflict of Interest References Information & Authors Metrics & Citations View Options References Figures Tables Media Share Abstract Malayan tapir is the only extant Asiatic species in the family Tapiridae, is endangered and threatened by risk of inbreeding from population structure. To elucidate the demographic and evolutionary history of the tapirs in Southeast Asia (SEA), this study analysed whole genome data from 10 individuals for historical effective population size (Ne) inference using sequentially Markovian coalescence (i.e., PSMC, MSMC, MSMC2), folded site frequency spectrum, (i.e., Stairway Plot 2), and their hybrid SMC++. The results revealed that tapir Ne ranged 6,000–12,000 in the last glacial period but went down to 0.25) between the Sumatran and mainland SEA tapirs since at least 6 kya. Subsidence of Sundaland and rainforest reduction were the major drivers for Ne decline. The timing of population split corresponded well with the inundation of Straits of Malacca to present-day levels by rapidly raising sea-levels during 10–6 kya. Results of this study, as well as contemporary geographical isolation, supports the subspecies status of the Sumatran population. This will have implication to the ex-situ conservation practices that may have produced hybrids of the isolated populations. “‘latex Original articles Demographic history of the Malayan tapirs ( Tapirus indicus ) in Southeast Asia Qi Luan Lim, Yu Sato, Norsyamimi Rosli, Miho Inoue-Murayama* Q. L. Lim, Y. Sato, M. Inoue-Murayama Wildlife Research Center of Kyoto University, Kyoto 606-8203, Japan. Norsyamimi Rosli “‘latex National Wildlife Forensic Laboratory, Ex-Situ Conservation Division, Department of Wildlife and National Parks, 56100 Cheras, Kuala Lumpur, Malaysia. Current address: Q. L. Lim Faculty of Environmental Earth Science, Hokkaido University, Sapporo 606-0810, Japan. *Corresponding author. Tel.: +81-75-771-4375; Fax: +81-75-771-4394 Email: [email protected] Abstract Malayan tapir is the only extant Asiatic species in the family Tapiridae, is endangered and threatened by risk of inbreeding from population structure. To elucidate the demographic and evolutionary history of the tapirs in Southeast Asia (SEA), this study analysed whole genome data from 10 individuals for historical effective population size ( Ne ) inference using sequentially Markovian coalescence (i.e., PSMC, MSMC, MSMC2), folded site frequency spectrum, (i.e., Stairway Plot 2), and their hybrid SMC ++ . The results revealed that tapir Ne ranged 6,000–12,000 in the last glacial period but went down to 0.25) between the Sumatran and mainland SEA tapirs since at least 6 kya. Subsidence of Sundaland and rainforest reduction were the major drivers for Ne decline. The timing of population split corresponded well with the inundation of Straits of Malacca to present-day levels by rapidly raising sea-levels during 10–6 kya. Results of this study, as well as contemporary geographical isolation, supports the subspecies status of the Sumatran population. This will have implication to the ex-situ conservation practices that may have produced hybrids of the isolated populations. Keywords: Asian tapir; whole genome analysis; population divergence; conservation genetics; population dynamics Introduction The Malayan tapir ( Tapirus indicus Desmarest, 1819; alternatively, Acrocordia indica [Groves & Grubb, 2011]) is the only Asiatic tapir species (Tapiridae, order Perissodactyla). Geographically, three distinct populations are recognized in Southeast Asia (SEA)—in the western border of Thailand-Myanmar, along the Malay Peninsula, and on the Sumatran Island of Indonesia (Traeholt et al. 2016). According to the International Union for the Conservation of Nature (IUCN) assessment report, the species is “endangered (EN)” and is estimated to have not more than 2,500 mature individuals globally. Conservation genetic or genomic resources are still underutilized or unevenly applied to different populations of the Malayan tapir. Previous studies that used nuclear markers (i.e., microsatellites, major histocompatibility complex [MHC] genes) had inferred low genetic diversity in the wild and captive tapirs in Peninsular Malaysia (Lim et al. 2022; Ismail et al. 2023), in addition to detection of population genetic structure with substantial genetic differentiation (Lim et al. 2022). Preliminary assessment of the captive populations in Japan suggested a moderately high genetic diversity using a novel microsatellite panel (Chantra et al. 2021) that have not been systematically applied to other populations. Maternal lineages revealed using mitochondrial DNA (mtDNA) cytochrome B gene sequences suggested correspondence to three recognized populations in SEA (Pitra and Veits 2000; Rovie-Ryan et al. 2008; Ogata et al. 2009; Steiner and Ryder 2011; Muangkram et al. 2013). D-loop sequences revealed two maternal lineages in mainland SEA that have diverged 1.46 Mya and eventually become admixed in Peninsular Malaysia, as well as a significant genetic differentiation between the Thailand and Malaysian samples (Muangkram et al. 2017; Lim et al. 2021). Genomic sequences can be used to estimate historical effective population size ( Ne ) for both recent and ancient time to modelling species demographic histories and inferring adaptations related to paleoenvironment change to provide insights for species conservation. Demographic histories can be estimated using tools such as the hidden Markov model (HMM)-based pairwise sequentially Markovian coalescent (PSMC) (Li and Durbin 2011), which have been used in some iconic, threatened mammalian species in SEA such as the Sumatran rhinoceros (Mays et al. 2018; von Seth et al. 2021), Asian elephants (Palkopoulou et al. 2018), orangutans (Mattle-Greminger et al. 2018), and tigers (Armstrong et al. 2021). Successors such as multiple sequentially Markovian coalescent (MSMC and MSMC2) and SMC ++ each has its strengths and weaknesses in Ne estimation (Terhorst et al. 2017; Patton et al. 2019; Schiffels and Wang 2020; Sellinger et al. 2021). Both SMC ++ and site-frequency spectrum (SFS)-based Stairway Plot 2 can analyze larger sample sizes without genome phasing while being less computational exhaustive (Terhorst et al. 2017; Liu and Fu 2020); the former assumes reference alleles as the ancestral alleles, the latter do not require such information (for folded SFS) and is therefore suitable for non-model organisms (Liu and Fu 2020). Paleontological records suggested that Malayan tapirs exist in SEA at least 1 Mya and has survived through the last interglacial and glacial periods into modern days (Cranbrook and Piper 2013). These glacial cycles and the associating climatic and sea-level changes had shaped the complex biogeographical patterns in SEA through habitat connections and fragmentation. Provisionally, a subspecies T. indicus brevatianus was described based on two all-black specimens from the Sumatran Island (Kuiper 1926); yet the discovery of rare melanistic tapirs in the Malaysian forests on mainland suggested the form to be a condition of genetic abnormality (Asrulsani et al. 2017). The Sumatran and the Malay Peninsula populations of iconic large mammal species are either taxonomically classified as subspecies (e.g., tigers, Asian elephants) or not (e.g., Sumatran rhinoceros, which however has become extinct in Malay Peninsula). Nonetheless, the divergence time estimated by haploid PSMC approach between the Sumatran and Malayan tigers (10–7 kya), and between the Sumatran and Malay Peninsula populations of Sumatran rhinoceros (13–9 kya), corresponded to the separation of Sumatra from the mainland SEA (Armstrong et al. 2021; von Seth et al. 2021). For tapirs, the nuclear genome of the Sumatran population remained understudied to provide any genetic evidence for a subspecies status or its extinction risks. Therefore, to elucidate the population structure and demographic histories of tapirs in SEA, we used 11 genomes from both public database (unknown lineages) and selected captive tapirs in Japan whose lineages included Sumatran origin. Our results have provided new population insights for the paleoecology of the Malayan tapir as well as genomic evidence of a potential Sumatran subspecies, which will have conservation implications to the species captive breeding program worldwide. “‘latex Methods This section briefly described bioinformatic methods used in this study. See Supplementary Information for a detailed descriptions of methods and workflow. All custom scripts used in this study are available in the supplementary documents. Sample collection and DNA extraction A total of 11 Malayan tapir genomes were used in this study (Supplementary Table S1). Six of the genomes were newly sequenced derived from captive individuals in Japan, and five were retrieved from the sequence read archive (SRA). Whole blood samples of six of the Malayan tapirs in Japan were provided by Preservation and Research Center, City of Yokohama (n = 2), Tama Zoological Park (n = 2), Nagasaki Bio Park (n = 1), and Fukuoka Zoo (n = 1). These were selected based on the studbook information and the genetic profiles obtained with microsatellite and mitochondrial DNA (mtDNA) control region markers (unpublished data), Genomic DNA (gDNA) was extracted using the DNeasy® Blood and Tissue Kit (Qiagen, Germany) following the manufacturer’s spin protocol. Whole-genome re-sequencing (WGS) and alignment Entire steps of WGS were conducted by Macrogen Japan Corp. (Japan). Genomic libraries were prepared using the TruSeq DNA PCR-Free Library Prep Kits (Illumina). The libraries were sequenced on an Illumina Novaseq 6000 platform at 15‒20× sequence depth in the paired-end mode (2 ×150 bp). Quality of the raw reads, including the raw reads from SRA, was checked using FastQC v0.11.9 (Andrews 2010). Quality filtering and trimming was performed using fastp v0.20.1 (Chen et al. 2018). Using bwa-mem2 v2.2.1 (Md et al. 2019) at default setting, the filtered paired reads were mapped to the chromosome-length Hi-C genome assembly (simple repeat elements unmasked) of T. indicus released by the DNAZoo Consortium (https://www.dnazoo.org/assemblies/Tapirus_indicus) (Clavijo et al. 2017; Dudchenko et al. 2018). The alignments were processed using samtools v1.13 (Li et al. 2009; Bonfield et al. 2021), including the removal of duplicate and unmapped reads. “‘latex Detection of sex-linked scaffolds and sex-identification To include or exclude sex-linked scaffolds depending on analysis, and to use them to ascertain the sexes of the samples, potential sex-linked scaffolds in the reference assembly were identified through a combination of three approaches: 1) nucleotide blast search using blastn v2.13.0 (Camacho et al. 2009) with the X- and Y-chromosomes of horse and human, 2) nucleotide blast search with partial sequences of sex-linked genes in T. indicus , including the sex-determining region Y (SRY) and X- or Y-linked zinc finger (ZFX/Y) genes, and 3) Gaussian mixture model (GMM) of homogametic and heterogametic differences in depth of coverage (DP) (Nursyifa et al. 2022), conducted using the R package ClusterR (Mouselimis 2023) in RStudio 2023.6.1.524 with R version 4.3.1 (Posit team 2023; R Core Team 2023). Both per-scaffold and genome average DP was estimated using samtools depth . Sex of samples was identified or ascertained via unsupervised classification with principal component analysis (PCA) on the scaled DP of sex-linked scaffolds. See Supplementary Information for detailed procedures. Finally, only autosomal scaffolds ≥ 1 Mb were kept for downstream analyses, and aligned reads mapped to the autosomal scaffolds were extracted using samtools. “‘latex Mask preparation To mask genomic regions for different downstream analyses post-mapping, three different DNA regions in the reference genome were identified: 1) repetitive regions using RepeatModeler v2.0.2a (Flynn et al. 2020) and RepeatMasker v4.1.2.p1 (rmblastn v2.10.0+; (Smit et al.)), then converted to BED format using rmsk2bed (Neph et al. 2012), 2) unique regions (mappability value = 1) estimated using genmap v1.3.0 (Pockrandt et al. 2020), and 3) uncalled (N) region using seqtk cutN (2018). These masks were processed with bedtools v2.30.0 (Quinlan and Hall 2010), to create ‘Mask A’ for called, unique and non-repetitive regions and ‘Mask B’ that complement it, for the following analyses. Variant calling and filtering A genomic variant call format (gvcf) file including invariant sites were produced by joint-calling variants from the alignments using bcftools. Sites that are invariant sites, homozygotes-only, or carrying less than six called alleles were excluded. The variant sites of each sample were separately filtered to remove sites with DP 2-fold of the average per-scaffold DP, before merging back into a single vcf file, in which missing sites in any sample were recorded as missing genotypes. The vcf file was further filtered for biallelic single nucleotide polymorphism (SNP) sites not within 10 bp of indels. Finally, sites with missing genotypes, a minor allele frequency (MAF) < 0.09 (i.e., less than two alleles recoded in a sample size of 11), and no homozygous state genotype, were removed. “‘latex Genetic structure To examine the genetic structure, the vcf was filtered with ‘Mask A’, LD pruned using bcftools +prune to discard SNPs with r 2 bigger than 0.8 in a 50 kb window, then converted to Beagle format using the perl script vcf2Beagle.pl from GotCloud v1.17.5 (Jun et al. 2015). Individual admixture proportions were computed using NGSadmix (Skotte et al. 2013) from ANGSD v0.941-6-g67b6b3b build (Korneliussen et al. 2014), assuming number of ancestral populations, K = 2–4. PCA was computed using PCAngsd v1.2 (Meisner and Albrechtsen 2018). To compute pairwise genetic distances, the LD-pruned vcf file was first converted using PLINK v1.90b6.21 (Purcell et al. 2007) to the required input format of ngsDist v1.0.9 (Vieira et al. 2016) from ngsTools (Fumagalli et al. 2014), run with 100 bootstrap replicates and a block size of 5000. Trees for each distance matrix was inferred using FastMe v2.1.6.1 (Lefort et al. 2015) with the default BioNJ algorithm (Gascuel 1997), and branch support values were placed on the final tree using RAxML v8.2.12 (Stamatakis 2014). Inter-population genetic differentiation index ( Fst ) was estimated using realSFS (from ANGSD), using the site allele frequency likelihood (.saf) files generated by ANGSD (see below). Maternal lineages inferred from mtDNA haplotypes Additionally, to infer maternal lineages using mtDNA genome sequences of the 11 tapirs, the filtered raw reads were aligned to a Malayan tapir reference mtDNA genome (accession no. NC_023838.1; (Muangkram et al. 2016)) using the same mapping procedures above to reconstruct the mtDNA genomes. The mtDNA genome alignments were imported to Geneious Prime v2023.2.1 to extract the control region consensus sequences based on Highest Quality 50% threshold. The sequences were aligned (with slight manual editing) with 23 previously identified haplotypes in Thailand (captive) and Malaysian (wild/captive) tapir populations (Lim et al. 2021), using ClusterW algorithm (Higgins et al. 1994) in MEGA v10.1.8 (Kumar et al. 2018). Excluding missing or ambiguous data and gaps, a median-joining network (Bandelt et al. 1999) was created using PopART (http://popart.otago.ac.nz). Historical effective population size PSMC v0.6.5 (r67) was performed on selected autosomal scaffolds > 1 Mb. The procedure consisted variant calling using the samtools-bcftools pipeline and processing with vcfutils.py script (2015) to retain only sites with a DP at least one-third to two-fold of the average genomic DP. PSMC was run with 64 free atomic time intervals (4+25*2+4+6). To perform bootstrap, the long scaffolds was split to shorter segments using splitfa (chunk size = 0.5 Mb) and randomly sampled by replacement to generate 100 resampled datasets for PSMC. For Ne estimation using MSMC, its successor MSMC2, SMC ++ , and Stairway Plot 2, samples were assigned to putative geographical groups (Supplementary Table S2) based on the clusters in phylogenetic tree (see Results). Due to high computational demands, MSMC and MSMC2 was limited to groups with four or less genomes (Schiffels and Wang 2020). Input files for MSMC (v1.1.0) and MSMC2 (v2.1.1) were prepared from the variants called using bcftools and filtered using a python script bamcaller.py from msmc-tools (https://github.com/stschiff/msmc-tools), adjusting DP criteria to the same one in PSMC. Then, generate_multihetsep.py (from msmc-tools) was used to generate the “multihetsep” files of all 11 samples and the selected autosomal scaffolds. MSMC and MSMC2 were run using default patterns of fixed time segments (10*1+15*2 for MSMC and 1*2+25*1+1*2+1*3 for MSMC2). For bootstrapping, the python script multihetsep_bootstrap.py (from msmc-tools) was used to break the genomes into 5 Mb-long chunks to be resampled into a resampled genome (20 chunks per chromosome), repeated to create 20 replicates of bootstrap datasets for MSMC/MSMC2. Prior to Ne estimation with SMC ++ v1.15.4, filtered vcf file (before LD pruning; see above) was converted to compatible datasets using vcf2smc , iterating through all individuals in the group. At each iteration, one individual was used as the “distinguished” lineage and the others as the “undistinguished” lineages. This was repeated using a vcf file without MAF filtering (MAF > 0) to investigate the effect of MAF filtering on Ne estimation (Sellinger et al. 2021). Since the positions not in vcf are regarded as invariant sites, ‘Mask B’ was applied to mark the Ns regions as missing data instead of invariant sites, in addition to masking the repetitive or non-mappable regions (mappability < 1). Ne was estimated with model fitting using either piecewise or cubic functions (for cubic, include regularization-penalty = 5), and timepoints from 1 to 100,000. Bootstrap SMC ++ was run by resampling into 20 full genome-sized datasets (27 “chromosomes” each contained 20 chunks sized 5 Mb) using an online script bootstrap_smcpp.py (https://github.com/popgenmethods/smcpp/files/1182555/bootstrap_smcpp.zip). Folded-SFS estimated from minor allele count was used for the Ne estimation with Stairway Plot 2 v2.1.1. For each selected analysis group, ANGSD was used to generate the .saf files from the alignment bam files using SAMtools caller, applying ‘Mask A’ on selected autosomal scaffolds. Folded-SFS was estimated with realSFS for computing Ne with Stairway Plot 2 with 200 subsample estimations. In addition, Stairway Plot 2 was also run using SFS estimated from the two hard-filtered vcf files for SMC ++ (MAF ≥ 0.09 and MAF ≥ 0.00) with model_creator.py and vcf_to_sfs.py from Pop-Gen Pipeline Platform v0.1.13 (Webb et al. 2021), assuming single population model for all analysis groups. To investigate the effect of masking repetitive and non-unique regions on the estimations, PSMC, MSMC, MSMC2 procedures for estimating Ne , including bootstrapping, were repeated by applying ‘Mask A’. Time and Ne in all plots was scaled by mutation rate, µ = 2.34 × 10 -8 substitutions per generation, taken from the rate used in the Sumatran rhinoceros (Mays et al. 2018) and a generation time of 12 years referring IUCN assessment report (Traeholt et al. 2016). Other candidate µ were 2.64 × 10 -8 (Kumar and Subramanian 2002) and 0.466 × 10 -8 (Bergeron et al. 2023). However, µ = 2.64 × 10 -8 produced Ne trajectories only slightly time-offset to µ = 2.34 × 10 -8 , but largely atypical patterns that are difficult to interpret when µ = 0.466 × 10 -8 (using PSMC as example in Supplementary Figure S1) . “‘latex Split time between populations To estimate split time between populations, pseudo-diploid X chromosomes were constructed from the X-linked scaffolds of selected samples in a pairwise manner to represent the X chromosomes of hypothetical ancestral populations before splitting. Consensus sequences were first generated using the bcftools and vcfutils.py, applying ‘Mask A’ and removing sites with a DP two-fold of the average; ploidy number was set to 1 for male and 2 for females. The haploid sequences were merged by randomly sampling one of the alleles at the polymorphic sites (to create haploid state) using seqtk mergefa . PSMC was carried out using the same parameters described in previous section but with 60 intervals (1*6+2*4+1*3+13*2+1*3+2*4+1*6) used for X-chromosome (Li and Durbin 2011). The procedures were repeated for 100 bootstrap datasets generated from segments of scaffold split to 0.5 Mb trunk size and resampled with replacement. Ne was scaled by 0.75 to roughly adjust to the expected effective population size of a chromosome X. Inference of population split time was repeated under the SMC ++ framework for selected population pairs, for which datasets containing joint frequency spectrum was created using vcf2smc for selected population pairs. Bootstrapping was conducted on 100 resampled datasets to estimate the margin of error of the split time. “‘latex Data visualization Several R packages were used for intermediate data processing and plotting: ggplot2 (Wickham 2016), ggrepel (Slowikowski 2024), ggtree (Xu et al. 2022), ggpubr (Kassambara 2023), ggformula (Kaplan and Pruim 2023), ggridges (Wilke 2022), ggstance (Henry et al. 2022), cowplot (Wilke 2024), sp (Bivand et al. 2013), gadmSEA (Choisy 2023a), mcutils (Choisy 2023b), phytools (Revell 2024), maps (Becker et al. 2023), ape (Paradis and Schliep 2019), gridExtra (Auguie 2017), tidyr (Wickham et al. 2023b), dplyr (Wickham et al. 2023a), tibble (Müller and Wickham 2023), and scales (Wickham and Seidel 2022). Adobe Illustrator CC 2019 (Adobe Inc.) was used for graphic editing and arrangement for representation. “‘latex Reads pre-processing and variant filters Statistics of the raw and filtered reads, alignment to reference genome (mapping rate at 92.87–99.77%), RepeatMasker, and per-scaffold and per-genome DP (ranged 18.90–81.97) are summarised in Supplementary Table S3–S7. Results of sex-linked scaffolds identification are summarised in the text of Supplementary Information and Supplementary Figures S2–S3. Twenty-seven autosomal scaffolds ≥1 Mb (i.e., HiC_scaffold_1–25, HiC_scaffold_105, and HiC_scaffold_177) were identified and retained for downstream analyses, keeping 2,097,051,005 bp which covered approximately 80.43% of the reference assembly (2,607,247,779 bp). ‘Mask A’ (intersect of non-repetitive and unique regions) included 68.11% sites in the 27 autosomal scaffolds, complementing ‘Mask B’ (repetitive, uncalled, or non-unique). Initial variant calling identified nearly 2.5 M sites, including around 8 M polymorphic sites. After a series of filters but before LD pruning, 1,633,081 biallelic SNPs in the selected autosomal scaffolds remained. Genetic structure and mtDNA lineages Genetic structure analyses using LD-pruned SNPs (213,832 sites) identified three origin groups—two from mainland SEA (i.e., Malaysia and Thailand) and one from Sumatra (Figure 1); the results were used to assign samples to four main analysis groups, i.e., Thailand (TH), Malaysia (MY), Sumatra (SM), and mainland SEA (MP), including subgroups (Supplementary Table S2 and S8). However, our genomic analysis did not distinguish wild and captive tapirs. The grouping including the major subgroups within the mainland SEA cluster was supported by unrooted and midpoint-rooted BioNJ dendrograms (Figure 1b,c) at 100% bootstraps. However, SRR17072331 had an odd placement further from the Sumatran lineage while not clustered with the mainland SEA group; and support for the branch clustering SRR17072817 and the Thailand samples was also low at 49% bootstraps (Figure 1b,c). This genomic population structure with three original groups was also supported by PCA and NGSadmix analysis (Fig 1d and e). PCA manifested a greater genetic variability between the mainland SEA and Sumatran groups on PC1 than between the Malaysian and Thailand samples on PC2 (Figure 1d). Weighted Fst were 0.283 (unweighted Fst = 0.248) between mainland SEA (MP2) and Sumatran (SM1) populations, and 0.14 (unweighted Fst = 0.052) between the mainland SEA subpopulations (MY2 and TH4). Individual admixture proportions at K = 2 was the best model, followed by K = 3, based on the value of likelihood; this output revealed admixed individuals among the samples (Figure 1e). At K = 3, some of them associated with a mainland SEA lineage but mixed with 3-34% Sumatran lineage (the largest being SRR17072331 at 34%), while others showing a major Thailand lineage and 40% from both Thailand and Malaysia. On the other hand, additional genetic cluster emerged within the Thailand lineage at K = 4, but the overall pattern remained similar. Five new mtDNA control region haplotypes from both Clade I and II were suggested based on a median-joining network (Supplementary Figure S4) constructed with 34 mtDNA control region sequences (1224 bp long, with 39 mutation sites and 14 singletons). This revealed a sample ratio of Clade I (Thailand/Malaysia) to Clade II (Malaysia/Sumatra) at 8:3, but 5:3 in terms of unique haplotypes. Overall, the representativeness in term of maternal genetic diversity was fair. Ne trajectories inferred by different methods To sum up, the Malayan tapir shows an overall Ne decline throughout Pleistocene and Holocene (11.7 kya~); Ne (scaled by µ = 2.34 × 10 -8 ) was highest at 26,000–34,000 during Early Pleistocene (2.58–0.781 Mya), went down to Late Pleistocene (0.126 Mya–11.7 kya), and only around 2,000–6,000 by 2 kya (Figure 2–3). Models of Ne inference by different methods were largely similar with some discrepancies at different periods of time. Notably, PSMC and MSMC2 (Figure 2) both inferred an increase of Ne throughout late Pliocene-Early Pleistocene, eventually decreased sharply while entering Middle Pleistocene and stabilized by ~500–400 kya, but again declined at an accelerated rate by 300–200 kya; Ne stabilization was shown around the Last Interglacial Period (LIP; 126–115 kya) or upon entering Last Glacial Period (LGP; 115–11.7 kya). During LGP, a small Ne peak was observed ~50 kya (in PSMC and some outputs of MSMC2, MSMC, and SMC ++ ) but were otherwise stabilizing before decreasing towards Last Glacial Maximum (LGM; ~22 kya) (Supplementary Figure S7-S9). Ne inferred using Stairway Plot 2 from MAF-filtered datasets seem more interpretable and of higher resolution; overall, the pattern was a generally decreasing trend since mid-LGP, with an Ne plummet either before or after LGM (Supplementary Figure S10). A bottleneck was observed during middle Late Pleistocene (~40–60 kya) and only manifested in TH3 (n = 4), TH4 (n = 5), and MP2 (n = 8); the event was followed by a recovery pre- or post-LGM and a gradual decrease towards recent time (Supplementary Figure S10 and Figure 3). MSMC captured the gradual decrease in Ne starting by LGM; however, TH groups leant to decline later than MY and SM groups (Supplementary Figure S8 and Figure 2). PSMC/SMC ++ seems to suggest a larger Ne for SM group in the beginning of Holocene, distinctive when compared to MP/MY/TH groups (Figure 2) and was consistent even in bootstraps (Supplementary Figure S6 and S9). Population split time inferred using PSMC and SMC ++ Population split (i.e., Ne towards infinity) was observed between SM-TH, MY-SM, and SM-SM sample pairs at a timing around 7-9 kya using PSMC; on the other hand, splitting was not observed between TH-TH and MY-MY pairs (Figure 4 and Supplementary Table S9). Although discrepancies were found among the population pairs that show population split, (Supplementary Figures S11–S13), any population split was inferred only from sample pairs with ≥ 80% bootstrap showing maximum Ne > 5 × 10 4 , the threshold at which zero coalescent rate is assumed. With SMC ++ and using only groups with maximum sample sizes (except SM1; see Discussion), datasets filtered by minimum MAF ≥ 0.09 (that is, a parsimony approach with all singletons removed), and no regularization of Ne , split time inferred (with 95% confidence intervals [CI] in 100 bootstraps) was 8.56 (8.55–8.88) kya for MP-SM, 6.11 (5.88–6.23) kya for MY-SM, 590 (448–604) ya for MY-TH, and 9.97 (10.04–10.35) kya for SM-TH population pairs; whereas with regularization penalty and cubic spline function, split was more recent by 7.4–40.3% for MAF ≥ 0.09 datasets, and more distant by 5.0–41.3% for MAF ≥ 0.00 datasets (Supplementary Table S9 and Figure S14). Effect of variant filters on Ne inferences Applying a filtration metric of repetitive and non-unique regions (‘Mask A’) did not affect the general pattern of Ne trajectories but may reduce the overall Ne or slightly shift the historical time in PSMC (Supplementary Figure S5), very slightly reduced the inflation near recent time in MSMC2 (Supplementary Figure S7), or more distant time in MSMC (Supplementary Figure S8). For SMC ++ , including all singletons resulted in Ne inflation in more recent time compared to removing them i.e., MAF ≥ 0.09 (Supplementary Figure S9). For pseudo-diploid PSMC, ‘Mask A’ may alter the conclusion for some population pairs, i.e., MY-SM, SM-TH (Supplementary Figure S11). For SMC ++ , removing singletons reduce the split times inferred (Supplementary Figure S14) . “‘latex Discussion Population assignment based on genetic structure The captive tapir population in Japanese zoos presented a fair opportunity to conduct investigation on demographic history and genetic structure, for tapirs with ancestry to three geographical origins, i.e., Peninsular Malaysia, Thailand, and Sumatran Island of Indonesia according to studbook were present (Supplementary Table S1). The putative population origins of these coincided well with the genetic structure inferred from the 11 samples using PCA, BioNJ tree and admixture analysis (Figure 1), Therefore, it is possible to say that captive breeding program of zoos in Japan has been well managed. On the other hand, extremely admixed ancestry between mainland SEA and Sumatra through ex-situ breeding may have produced the hybrid SRR17072331 from Copenhagen Zoo (Bergeron et al. 2023), a rare if not impossible event in the light of contemporary geographical separation of the two populations. The sample (and SM2) was therefore dropped from the main discussion but is reported in the supplementary figures. The selection of tapirs, including sampling from two major maternal lineages (Supplementary Figure S4), is believed to be representative for the purpose of this study—but limited to our current knowledge of tapir genetic diversity because a comprehensive metapopulation genetic comparison is still lacking. “‘latex Demographic history and paleoenvironmental changes Demographic Ne inference suggests that ancestral tapir populations expanded in Asian continent during the Plio-Pleistocene transition phase (2–3 Mya as shown by PSMC and MSMC2; Figure 2), at which time glacio-eustatic sea-level lowerings are thought to have occurred (van den Bergh et al. 2001); by ~1.8 Mya, Sundaland was already permanently continental that even hominins had arrived at Java (Fu et al. 2013). The peaks of the population expansion (Figure 2) coincide with the Early-Middle Pleistocene transition (~1.2–0.8 Mya) of glacial-interglacial cycles into higher-amplitude cycles, in which sea level lowered to as low as 170 m below the present day, may suggest the Malayan tapir’s inclusion in the Early Pleistocene immigration into Sundaic subregion together with the Siwalik faunas, but through the “Sino-Malayan” route (Tougard 2001a; van den Bergh et al. 2001; Cranbrook and Piper 2013; Legrain et al. 2023). Early Pleistocene Sundaland was dominated by mixed to closed-canopy forests in the Indochinese subregion, and open savannah grasslands in the Sundaic subregion (Louys and Roberts 2020), therefore, tapirs would need to cross the savannah even if marginally. The earliest record (from a Javan site) for the said crossing was 800-700 kya (Cranbrook and Piper 2013), but the timing could be earlier, e.g., 1.46 Mya (95% CI: 1.15–1.78 Mya) estimated using mtDNA control region, which may be used as an upper bound timing of genetic divergence, i.e., promoted by migration into new habitats (Fu et al. 2013; Lim et al. 2021). This Early-early Middle Pleistocene population expansion and eventual fall-off may signal events of tapir range expansion (possibly also confounded by population structure (Li and Durbin 2011)), saturation of suitable Sundaland habitats, and population contraction due to paleohabitat changes or competition for resources. During Middle Pleistocene, forests of Early Pleistocene had been overtaken by savannahs, leading to the spread of grazers and the extinction of browsers (Louys and Roberts 2020). Malayan tapir, described as a rare, browser species but ecologically versatile, might have only marginally occupy and adapt to the habitats in the prevailing environment, and survived (Cranbrook and Piper 2013). Sundaland was largely continental up until 400 kya, before transgressions started to prevail; the land subsidence insurmountably reduced total inhabitable area for terrestrial fauna, leading to accelerated tapir Ne decline from 300-200 kya until LIP (126–115 kya) (Sarr et al. 2019; Louys and Roberts 2020). The suggested fauna exchange during the late Middle Pleistocene before LIP (Tougard 2001b) did not seem to have an impact on Ne , if any, could be contributing to secondary contact of isolated lineages (Lim et al. 2021) and a halt of continuous decline and even restoration. During the Late Pleistocene, the Indochinese subregion was largely dominated by closed-canopy forests, while the Sundaic subregion largely by open-canopy forest (Louys and Roberts 2020). The tapir populations saw a temporary increase or stabilized Ne up to inference of bottleneck with Stairway Plot 2. Considering the caveats of SMC- and SFS-based methods (see below), an Ne bottleneck (SFS) and peak (SMC) could both be explained by a mimetic effect of population structure. Therefore, a more probable scenario may be a stable ancestral population since entering LGP, attributable to migration and range expansion conferred by re-emerging land bridges formed between all major land masses by ~74 kya (Wurster and Bird 2016)—that is, until an unfavorable paleoenvironment limited inter-population gene flow as LGM set in as early as 30 kya (Hamilton et al. 2024), or when early humans arrived at the Malay Peninsula ~60–45 kya (Bird et al. 2005). The Toba eruption ~75 kya might have also contributed to the Ne bottleneck (Mattle-Greminger et al. 2018), however, timing of the bottleneck in TH groups did not quite align (Supplementary Figure S10). Insignificant impact of the eruption compared to the prevailing climatic changes was also suggested (Louys 2007; Singh and Srivastava 2022). Approaching LGM, tapir population declined while climate became cooler and drier and lowland rainforests transitioning into seasonally dry tropical forests (Louys and Roberts 2020; Hamilton et al. 2024). Warming climate and reforestation (Hamilton et al. 2024) then allowed some population recovery post-LGM. On Sumatra, a larger rainforest refugia and less impact of transgressions compared to mainland SEA (Visser et al. 2004; Bird et al. 2005) might have allowed a slightly persisted Ne post-LGM. While potentially with low accuracies, Ne in the most recent time (≤ 1.2 kya, 100 generations) aligned with the estimate (< 1000) using microsatellite markers in Malaysian population (Lim et al. 2022). Paleoenvironments of Sundaland changes with glacio-eustatic cycles and was largely characterized by a mosaic of open vegetation and significantly contracted rainforest (Cannon et al. 2009; Raes et al. 2014; Hamilton et al. 2024). Species with different ecological needs responded differently. For example, towards LIP when dense rainforests extended from refugia and available landmasses reduced; Malayan tapir (Figure 2) population declined, while Sumatran/Malayan tigers, Asian elephants, Sumatran orangutan and Sumatran rhinoceros thrived (Palkopoulou et al. 2018; Mattle-Greminger et al. 2018; Armstrong et al. 2021; von Seth et al. 2021). Extended tropical forests in LIP like present day favored forest species such as tigers and orangutans, but browsers such as elephants, rhinoceroses and tapirs may have been competing for resources (i.e., food and space) over overlapping ecological niches, which elephants and rhinoceros dominated. Conversely, tapirs may thrive better during glacial periods characterized by expanded and connected lands, open habitat types with a mosaic of persisted closed-canopy forests (Hamilton et al. 2024). Even in modern days, tapirs are found in a diverse habitat type from peat swamps up to lower montane forest (PERHILITAN 2012). Population divergence between tapir subpopulations Given that only 2% of the last 2 million years during the Pleistocene did the sea-level rise to the present-day level to restrict gene flows between tapir populations (Sathiamurthy and Voris 2006), the ancestral populations on mainland SEA and Sumatra might have remained mostly panmictic throughout the Pleistocene, as split was only inferred around or after LGM (Supplementary Table S9). Signal of a probable, earlier separation between the Sumatran and mainland SEA populations during LIP was seen only picked up by SMC ++ (Figure 2) ; however, the genetic differentiation could have been neutralized by post-LIP migrations and admixture events (e.g., during Middle-Late Pleistocene and Late Pleistocene-Holocene transitions), for which, southern Malay Peninsula has been speculated as an admixture ground (Lim et al. 2021). Stepwise divergence inferred from some pseudo-diploids (Figure 4) suggested that for some populations gene flow restriction began as early as pre-LGM (~50 kya) but complete split was achieved by 10–6 kya. This timing of split between the mainland SEA and Sumatran populations corresponds well with the geographical characteristics in SEA—the Straits of Malacca physically separated the Malay Peninsula and Sumatran Island. By 15‒10 kya, the sea level in the Strait of Malacca would be high enough (31 to 40 m below present level) to submerge most of the LGM-exposed landmasses, and by 10–5 kya, swiftly rose to present-day level and above (Tjia 2013). Conversely, the half-millennium divergence time inferred between mainland SEA subpopulations by SMC ++ may be due to increasingly occupied habitats by humans, e.g., at the Isthmus of Kra (Bellina et al. 2019). Inference by multiple approaches An elevated Ne in a PSMC analysis (~1 Mya and 50 kya) may be interpreted as an actual increase in population size or an indication of population structure involving separation and admixture; the signals are not easily differentiable (Li and Durbin 2011). For that matter, coalescence-based (SMC) methods such as MSMC, MSMC2, and SMC ++ (partially) may suffer the same caveat as PSMC (Sellinger et al. 2021). For example, inflated Ne during Early-Middle and Late Pleistocene can be arguably caused by population sub-structuring (Figure 2 and 3). On the other hand, SFS-based approach (i.e., Stairway Plot 2 and partially in SMC ++ ) may be outperformed by the SMC methods when inferring Ne at more ancient times, or when only a few samples were used (Li and Durbin 2011; Patton et al. 2019) . Wholly SFS-based Stairway Plot tools may be able to recover recent Ne decline and swift population recovery following a bottleneck; however, incongruency of the trajectories with SMC methods and timing offset with actual population events may hinder its usefulness (Patton et al. 2019). Similar observation was found in this study, but timings of Ne changes seemed in-sync with the paleoenvironmental changes; at larger sample sizes (typically > 4) with singletons included, a bottleneck instead of an Ne peak (as in SMC methods) was detected (Supplementary Figure S10). Population structure may mimic SFS patterns generated by bottlenecks (Gattepaille et al. 2013), which may be potentially used to distinguish the nature of Ne peak inferred by SMC methods. Lastly, Beichman et al. (Beichman et al. 2018) recommended caution against Ne inference within the last hundred generations (~1,200 years for tapirs), when < 10 genomes were used. Masking of repetitive or non-unique sites generally had no impact on the outcome but might improve population split inference with PSMC. However, removing singletons may influence Ne inference in recent time or deflate population split time, particularly for SMC ++ (Supplementary Figure S12-S13) ; we keep the latter as a conservative estimation of split time. For Stairway Plot 2, we based our interpretation on hard-filtered (quality-wise) vcf file keeping the singletons to extract more information for demographic inference. Limitations and recommendations The small number of new genomes sequenced were compensated as much by using genomic resources available in the public databases and by integrating multiple methods in the analyses. Some degree of uncertainties in the origins of the samples warned that any conclusion drawn from this study especially the genetic structure should be considered with cautions, as intermediate populations between populations and marginal habitats, may have been absent from the captive populations. On the other hand, small sample sizes (n < 10) and a lack of genome phasing information have hampered the full potential of using SFS methods for demographic inference (Liu and Fu 2020). In addition, this study did not consider the effect of selection on demographic inference (Beichman et al. 2018), although SNP in the coding region usually consists of only a small portion of SNPs detected (e.g., Lastly, only contemporary extant tapir populations were included, but extinct tapirs on Java and Borneo would probably share similar fate as the Sumatran rhinoceros (Mays et al. 2018; von Seth et al. 2021). Future studies are recommended to increase sample size (≥ 10 for each population) (Beichman et al. 2018) and apply museomics for investigating demographic history and extinction drivers of extant and extinct populations to inform any feasible reintroduction program (Piper and Cranbrook 2007). “‘latex Conservation implications This study added new Malayan tapir genomic resources especially the two genomes of Sumatran-origin tapirs for genomic analysis to infer demographic and evolutionary history of the tapirs in Southeast Asia, achieved by taking advantage of a multilineage captive population in Japan and public data. Nevertheless, the genetic status of the wild Thailand and Sumatran populations are still unknown and should be the next priority for investigation to safeguard the last populations of tapirs in Southeast Asia. Our results supported a proposal of a subspecies status for the Sumatran population, based on monophyly in the midpoint rooted BioNJ tree and high genetic differentiation ( Fst > 0.25) against the mainland SEA samples. Hybrid tapirs of Sumatran and Thailand origins have been detected in ex - situ conditions (e.g., SRR17072331) including in the Japanese zoos, but potential adverse consequences from the hybridisation have not been reported. However, higher mutation load in either Sumatran or mainland populations, if any, may further reduce the Ne of the recipient population. Demographic history inference suggested tapirs’ marginal dependency on lowland rainforests, and although somewhat survived and can thrive even in today’s habitat conditions (e.g., selectively logged forests) (Rayan et al. 2012), the species face risks of contemporary population structure with isolated small populations that may further aggravate inbreeding and local extinction (Lim et al. 2022). Habitat protection and connectivity is therefore important for maintaining genetic flow and effective population sizes. Based on the species-specific demographic history, we suggest a realistic Ne baseline of 2,000 (census size 4,347–5,128 (Lim et al. 2022)) as a conservation goal for the tapir populations in Southeast Asia. For captive breeding programs, we recommend avoiding hybrids between Sumatran and mainland SEA lineage until clarification of its genomic consequences. Acknowledgements This study was financially supported by KAKENHI No. 20H00420 for M.I.-M., the Leading Graduate Program in Primatology and Wildlife Science, Kyoto University and Japan Society for the Promotion of Science (JSPS) grant-in-aid to Q.L.L (project number: 21J15064). We are grateful to Preservation and Research Center, City of Yokohama, Tama Zoological Park, Nagasaki Bio Park, and Fukuoka Zoo for providing the Malayan tapir DNA samples. Samples obtained from Nagasaki Bio Park were given under a permit from the Department of Wildlife and National Parks (PERHILITAN), Malaysia (KeTSA600-2/2/21 JLD. 12(20); license no. A-00362-16-21). All protocols were approved by the ethical committee of Wildlife Research Center of Kyoto University (WRC-2022-010A). Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics. Q.L.L. is grateful to the Japanese Government (Monbukagakusho: MEXT) Scholarship (2019-2020) and Japanese Society for the Promotion of Science (2021-2023) for the scholarships received. Conflict of Interest The authors declare that there is no conflict of interest. “‘latex Data Availability Statement Raw reads of all six newly generated genomes passed the FastQC assessments with no ‘fail’ flag and were deposited in the BioProject (accession no. PRJNA882187) in the SRA database of the National Center for Biotechnology Information (NCBI). The other five genomes were retrieved from the sequence read archive (SRA): SRR11097180, SRR14226351, SRR14572919, SRR17072331, and SRR17072817. All metadata and data generated in this study that are not included in this manuscript are available from the corresponding author on reasonable request. “‘latex Figure Legends Figure 1. Genetic structure of the 11 Malayan tapir samples used in this study based on 213,832 filtered biallelic single nucleotide polymorphism (SNP) sites. Parts of Southeast Asia (SEA) highlighted for west Malaysia, Thailand, Myanmar, and the Sumatran Island of Indonesia (A), countries with remaining extant Malayan tapir populations. Unrooted BioNJ dendrogram (B) and Midpoint-rooted BioNJ dendrogram (C) with branch support computed based on 100 bootstraps. D) Principal component analysis (PCA) showing the relationships of the unknown samples in reference to samples originated from Thailand, Malaysia, and Sumatra (D). Individual admixture proportions (E) to two to four ancestral populations (K = 2–4). Note that individual SRR17072331 has a dubious placement in Sumatran branch (B, C) that contradicts with the individual admixture proportion (see E). Figure 2. Effective population size ( Ne ) inferred using Pairwise sequential Markovian coalescent (PSMC), multiple sequential Markovian coalescent (MSMC/MSMC2), and SMC ++ (singletons excluded) for selected analysis groups (which are, representative groups from each geographical region). Time (primary axis) is scaled by generation time (g = 12 years) and mutation rate ( µ = 2.34 × 10 -8 substitutions per generation). Secondary x-axis shows the time in coalescent unit. The five grey bars on the top are time periods (from left to right) for Holocene, Late Pleistocene, Middle Pleistocene, Early Pleistocene, and Pliocene, while the two grey bars below are Last Glacial Period (LGP) and Last Interglacial Period (LIP). LGM – Last Glacial Maximum. Figure 3. Effective population size ( Ne ) inferred using SMC++ (singletons excluded) and Stairway Plot 2 (singletons included) for selected analysis groups (which are, representative groups from each geographical region). Time (primary axis) is scaled by generation time (g = 12 years) and mutation rate ( µ = 2.34 × 10 -8 substitutions per generation). Secondary x-axis shows the time in coalescent unit. The five grey bars on the top are time periods (from left to right) for Holocene, Late Pleistocene, Middle Pleistocene, Early Pleistocene, and Pliocene, while the two grey bars below are Last Glacial Period (LGP) and Last Interglacial Period (LIP). LGM – Last Glacial Maximum. Figure 4. Estimation of population split time between putative geographical groups of the Malayan tapir. Time (primary axis) is scaled by generation time (g = 12 years) and mutation rate ( µ = 2.34 × 10 -8 substitutions per generation). Secondary x-axis shows the time in coalescent unit. LGM – Last Glacial Maximum. LGP – Last Glacial Period. LIP – Last Interglacial Period. MP – mainland SEA. SM – Sumatra. MY – Malaysia. TH – Thailand. References 1. Andrews S (2010) A quality control tool for high throughput sequence data Armstrong EE, Khan A, Taylor RW, et al (2021) Recent Evolutionary History of Tigers Highlights Contrasting Roles of Genetic Drift and Selection. Mol Biol Evol 38:2366–2379. https://doi.org/10.1093/molbev/msab032 Asrulsani J, Amri I, Rafhan H, et al (2017) Discovery of melanistic Malayan tapir ( Tapirus indicus var. brevetianus ) in Tekai Tembeling Forest Reserve. Journal of Wildlife and Parks 32:79–83 Auguie B (2017) gridExtra: Miscellaneous functions for “Grid” graphics Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48 Becker RA, Wilks AR, Brownrigg R, et al (2023) maps: Draw geographical maps Beichman AC, Huerta-Sanchez E, Lohmueller KE (2018) Using Genomic Data to Infer Historic Population Dynamics of Nonmodel Organisms. Annu Rev Ecol Evol Syst 49:433–456. https://doi.org/10.1146/annurev-ecolsys-110617-062431 Bellina B, Favereau A, Dussubieux L (2019) Southeast Asian early Maritime Silk Road trading polities’ hinterland and the sea-nomads of the Isthmus of Kra. J Anthropol Archaeol 54:102–120. https://doi.org/10.1016/j.jaa.2019.02.005 Bergeron LA, Besenbacher S, Zheng J, et al (2023) Evolution of the germline mutation rate across vertebrates. Nature 615:285–291. https://doi.org/10.1038/s41586-023-05752-y Bird MI, Taylor D, Hunt C (2005) Palaeoenvironments of insular Southeast Asia during the Last Glacial Period: a savanna corridor in Sundaland? Quat Sci Rev 24:2228–2242. https://doi.org/10.1016/j.quascirev.2005.04.004 Bivand R, Pebesma E, Gomez-Rubio V (2013) Applied spatial data analysis with R, Second edition. Springer, New York Bonfield JK, Marshall J, Danecek P, et al (2021) HTSlib: C library for reading/writing high-throughput sequencing data. Gigascience 10:. https://doi.org/10.1093/gigascience/giab007 Camacho C, Coulouris G, Avagyan V, et al (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421. https://doi.org/10.1186/1471-2105-10-421 Cannon CH, Morley RJ, Bush ABG (2009) The current refugial rainforests of Sundaland are unrepresentative of their biogeographic past and highly vulnerable to disturbance. Proceedings of the National Academy of Sciences 106:11188–11193. https://doi.org/10.1073/pnas.0809865106 Chantra R, Dai Y, Inoue-Murayama M, et al (2021) Microsatellite records for volume 13, issue 4. Conserv Genet Resour. https://doi.org/10.1007/s12686-021-01243-2 Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. https://doi.org/10.1093/bioinformatics/bty560 Choisy M (2023a) gadmSEA: Country maps of Southeast Asia Choisy M (2023b) mcutils: Complement the utils package Clavijo BJ, Garcia Accinelli G, Wright J, et al (2017) W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data. bioRxiv 110999. https://doi.org/10.1101/110999 Cranbrook E of, Piper PJ (2013) Paleontology to policy: the Quaternary history of Southeast Asian tapirs (Tapiridae) in relation to large mammal species turnover, with a proposal for conservation of Malayan tapir by reintroduction to Borneo. Integr Zool 8:95–120. https://doi.org/10.1111/j.1749-4877.2012.00319.x Dudchenko O, Shamim MS, Batra SS, et al (2018) The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv 254797. https://doi.org/10.1101/254797 Flynn JM, Hubley R, Goubert C, et al (2020) RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117:9451–9457. https://doi.org/10.1073/pnas.1921046117 Fu Q, Mittnik A, Johnson PLF, et al (2013) A Revised Timescale for Human Evolution Based on Ancient Mitochondrial Genomes. Current Biology 23:553–559. https://doi.org/10.1016/j.cub.2013.02.044 Fumagalli M, Vieira FG, Linderoth T, Nielsen R (2014) ngsTools : methods for population genetics analyses from next-generation sequencing data. Bioinformatics 30:1486–1487. https://doi.org/10.1093/bioinformatics/btu041 Gascuel O (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14:685–695. https://doi.org/10.1093/oxfordjournals.molbev.a025808 Gattepaille LM, Jakobsson M, Blum MG (2013) Inferring population size changes with sequence and SNP data: lessons from human bottlenecks. Heredity (Edinb) 110:409–419. https://doi.org/10.1038/hdy.2012.120 Groves C, Grubb P (2011) Ungulate taxonomy. The John Hopkins University Press, Baltimore Hamilton R, Amano N, Bradshaw CJA, et al (2024) Forest mosaics, not savanna corridors, dominated in Southeast Asia during the Last Glacial Maximum. Proceedings of the National Academy of Sciences 121:. https://doi.org/10.1073/pnas.2311280120 Henry L, Wickham H, Chang W (2022) ggstance: Horizontal “ggplot2” Components Higgins D, Thompson J, Gibson T, et al (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680 Ismail NA, Yong CSY, Sin SYW, Annavi G (2023) Low diversity of major histocompatibility complex (MHC) genes in endangered Malayan tapir ( Tapirus indicus ). Zool Stud 62:12 Jun G, Wing MK, Abecasis GR, Kang HM (2015) An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res 25:918–925. https://doi.org/10.1101/gr.176552.114 Kaplan D, Pruim R (2023) ggformula: Formula Interface to the Grammar of Graphics Kassambara A (2023) ggpubr: “ggplot2” Based Publication Ready Plots Koepfli K-P, Tamazian G, Wildt D, et al (2019) Whole genome sequencing and re-sequencing of the Sable Antelope ( Hippotragus niger ): A resource for monitoring diversity in ex situ and in situ populations. G3 Genes|Genomes|Genetics 9:1785–1793. https://doi.org/10.1534/g3.119.400084 Korneliussen TS, Albrechtsen A, Nielsen R (2014) ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15:356. https://doi.org/10.1186/s12859-014-0356-4 Kuiper K (1926) On a black variety of the Malay Tapir ( Tapirus indicus ). Proceedings of the Zoological Society of London 96:425–426 Kumar S, Stecher G, Li M, et al (2018) MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol 35:1547–1549. https://doi.org/10.1093/molbev/msy096 Kumar S, Subramanian S (2002) Mutation rates in mammalian genomes. Proceedings of the National Academy of Sciences 99:803–808. https://doi.org/10.1073/pnas.022629899 Lefort V, Desper R, Gascuel O (2015) FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program: Table 1. Mol Biol Evol 32:2798–2800. https://doi.org/10.1093/molbev/msv150 Legrain E, Parrenin F, Capron E (2023) A gradual change is more likely to have caused the Mid-Pleistocene Transition than an abrupt event. Commun Earth Environ 4:90. https://doi.org/10.1038/s43247-023-00754-0 Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493–496. https://doi.org/10.1038/nature10231 Li H, Handsaker B, Wysoker A, et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352 Lim QL, Yong CSY, Ng WL, et al (2022) Population genetic structure of wild Malayan tapirs ( Tapirus indicus ) in Peninsular Malaysia revealed by nine cross-species microsatellite markers. Glob Ecol Conserv 40:e02321 Lim QL, Yong CSY, Ng WL, et al (2021) Genetic diversity and phylogenetic relationships of Malayan tapir (Tapirus indicus) populations in the Malay Peninsula based on mitochondrial DNA control region. Biodivers Conserv 30:2433–2449. https://doi.org/10.1007/s10531-021-02202-x Liu X, Fu Y-X (2020) Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol 21:280. https://doi.org/10.1186/s13059-020-02196-9 Louys J (2007) Limited effect of the Quaternary’s largest super-eruption (Toba) on land mammals from Southeast Asia. Quat Sci Rev 26:3108–3117. https://doi.org/10.1016/j.quascirev.2007.09.008 Louys J, Roberts P (2020) Environmental drivers of megafauna and hominin extinction in Southeast Asia. Nature 586:402–406. https://doi.org/10.1038/s41586-020-2810-y Mattle-Greminger MP, Bilgin Sonay T, Nater A, et al (2018) Genomes reveal marked differences in the adaptive evolution between orangutan species. Genome Biol 19:193. https://doi.org/10.1186/s13059-018-1562-6 Mays HL, Hung C-M, Shaner P-J, et al (2018) Genomic Analysis of Demographic History and Ecological Niche Modeling in the Endangered Sumatran Rhinoceros Dicerorhinus sumatrensis. Current Biology 28:70-76.e4. https://doi.org/10.1016/j.cub.2017.11.021 Md V, Misra S, Li H, Aluru S (2019) Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. In: IEEE Parallel and Distributed Processing Symposium (IPDPS) Meisner J, Albrechtsen A (2018) Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data. Genetics 210:719–731. https://doi.org/10.1534/genetics.118.301336 Mouselimis L (2023) ClusterR: Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering Muangkram Y, Amano A, Wajjwalku W, et al (2017) Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir. Mitochondrial DNA A DNA Mapp Seq Anal 28:597–601. https://doi.org/10.3109/24701394.2016.1149828 Muangkram Y, Wajjwalku W, Kaolim N, et al (2016) The complete mitochondrial genome of the Asian tapirs ( Tapirus indicus ): The only extant Tapiridae species in the old world. Mitochondrial DNA 27:413–415. https://doi.org/10.3109/19401736.2014.898283 Muangkram Y, Wajjwalku W, Salakij C, et al (2013) Use of mitochondrial cytochrome b sequences to determine the origin of captive Asian tapirs Tapirus indicus : implications for conservation. Endanger Species Res 21:97–103. https://doi.org/10.3354/esr00509 Müller K, Wickham H (2023) tibble: Simple Data Frames Neph S, Kuehn MS, Reynolds AP, et al (2012) BEDOPS: high-performance genomic feature operations. Bioinformatics 28:1919–1920. https://doi.org/10.1093/bioinformatics/bts277 Nursyifa C, Brüniche‐Olsen A, Garcia‐Erill G, et al (2022) Joint identification of sex and sex‐linked scaffolds in non‐model organisms using low depth sequencing data. Mol Ecol Resour 22:458–467. https://doi.org/10.1111/1755-0998.13491 Ogata M, Watanabe S, Ogawa H (2009) Genetic variation of Asian tapir. Japanese Society of Zoo and Wildlife Medicine 14:73–76 Palkopoulou E, Lipson M, Mallick S, et al (2018) A comprehensive genomic history of extinct and living elephants. Proceedings of the National Academy of Sciences 115:. https://doi.org/10.1073/pnas.1720554115 Paradis E, Schliep K (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. https://doi.org/10.1093/bioinformatics/bty633 Patton AH, Margres MJ, Stahlke AR, et al (2019) Contemporary Demographic Reconstruction Methods Are Robust to Genome Assembly Quality: A Case Study in Tasmanian Devils. Mol Biol Evol 36:2906–2921. https://doi.org/10.1093/molbev/msz191 PERHILITAN (2012) Tapir Information Sheet. Department of Wildlife and National Parks (PERHILITAN) Piper PJ, Cranbrook E of (2007) The potential of large protected plantation areas for the secure reintroduction of Borneo’s lost “megafauna”: A case for the Malay Tapir Tapirus indicus . In: Regional Conference of Biodiversity Conservation in Tropical Planted Forests in Southeast Asia. Natural History Publications (Borneo), Kota Kinabalu, pp 182–189 Pitra C, Veits J (2000) Use of mitochondrial DNA sequences to test the Ceratomorpha (Perissodactyla: Mammalia) hypothesis. Journal of Zoological Systematics and Evolutionary Research 38:65–72. https://doi.org/10.1046/j.1439-0469.2000.382136.x Pockrandt C, Alzamel M, Iliopoulos CS, Reinert K (2020) GenMap: ultra-fast computation of genome mappability. Bioinformatics 36:3687–3692. https://doi.org/10.1093/bioinformatics/btaa222 Posit team (2023) RStudio: Integrated Development for R Purcell S, Neale B, Todd-Brown K, et al (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81:559–575. https://doi.org/10.1086/519795 Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. https://doi.org/10.1093/bioinformatics/btq033 R Core Team (2023) R: A Language and Environment for Statistical Computing Raes N, Cannon CH, Hijmans RJ, et al (2014) Historical distribution of Sundaland’s Dipterocarp rainforests at Quaternary glacial maxima. Proceedings of the National Academy of Sciences 111:16790–16795. https://doi.org/10.1073/pnas.1403053111 Rayan DM, Mohamad SW, Dorward L, et al (2012) Estimating the population density of the Asian tapir (Tapirus indicus) in a selectively logged forest in Peninsular Malaysia. Integr Zool 7:373–380. https://doi.org/10.1111/j.1749-4877.2012.00321.x Revell LJ (2024) phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ 12:e16505. https://doi.org/10.7717/peerj.16505 Rovie-Ryan JJ, Traeholt C, E M-J, et al (2008) Sequence variation in Malayan Tapir ( Tapirus indicus ) inferred using partial sequences of the cytochrome b segment of the mitochondrial DNA. Journal of Wildlife and Parks 25:16–18 Sarr A-C, Husson L, Sepulchre P, et al (2019) Subsiding Sundaland. Geology 47:119–122. https://doi.org/10.1130/G45629.1 Sathiamurthy E, Voris HK (2006) Maps of Holocene sea level transgression and submerged lakes on the Sunda shelf. The Natural History Journal of Chulalongkorn University, Supplement 2:1–43 Schiffels S, Wang K (2020) MSMC and MSMC2: The Multiple Sequentially Markovian Coalescent. pp 147–166 Sellinger TPP, Abu‐Awad D, Tellier A (2021) Limits and convergence properties of the sequentially Markovian coalescent. Mol Ecol Resour 21:2231–2248. https://doi.org/10.1111/1755-0998.13416 Singh A, Srivastava AK (2022) Had Youngest Toba Tuff (YTT, ca. 75 ka) eruption really destroyed living media explicitly in entire Southeast Asia or just a theoretical debate? An extensive review of its catastrophic event. Journal of Asian Earth Sciences: X 7:100083. https://doi.org/10.1016/j.jaesx.2022.100083 Skotte L, Korneliussen TS, Albrechtsen A (2013) Estimating Individual Admixture Proportions from Next Generation Sequencing Data. Genetics 195:693–702. https://doi.org/10.1534/genetics.113.154138 Slowikowski K (2024) ggrepel: Automatically position non-overlapping text labels with “ggplot2” Smit A, Hubley R, Green P RepeatMasker Open-4.0. http://www.repeatmasker.org. Accessed 30 Jan 2022 Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313. https://doi.org/10.1093/bioinformatics/btu033 Steiner CC, Ryder OA (2011) Molecular phylogeny and evolution of the Perissodactyla. Zool J Linn Soc 163:1289–1303. https://doi.org/10.1111/j.1096-3642.2011.00752.x Terhorst J, Kamm JA, Song YS (2017) Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49:303–309. https://doi.org/10.1038/ng.3748 Tjia HD (2013) Evidence of Holocene and historical changes of sea level in the Langkawi islands. Bulletin of the Geological Society of Malaysia 59:67–72 Tougard C (2001a) Biogeography and migration routes of large mammal faunas in South–East Asia during the Late Middle Pleistocene: focus on the fossil and extant faunas from Thailand. Palaeogeogr Palaeoclimatol Palaeoecol 168:337–358. https://doi.org/10.1016/S0031-0182(00)00243-1 Tougard C (2001b) Biogeography and migration routes of large mammal faunas in South-East Asia during the Late Middle Pleistocene: focus on the fossil and extant faunas from Thailand. Palaeogeogr Palaeoclimatol Palaeoecol 168:337–358 Traeholt C, Novarino W, bin Saaban S, et al (2016) Tapirus indicus . The IUCN Red List of Threatened Species 2016: e.T21472A45173636 van den Bergh GD, de Vos J, Sondaar PY (2001) The Late Quaternary palaeogeography of mammal evolution in the Indonesian Archipelago. Palaeogeogr Palaeoclimatol Palaeoecol 171:385–408. https://doi.org/10.1016/S0031-0182(01)00255-3 Vieira FG, Lassalle F, Korneliussen TS, Fumagalli M (2016) Improving the estimation of genetic distances from Next-Generation Sequencing data. Biological Journal of the Linnean Society 117:139–149. https://doi.org/10.1111/bij.12511 Visser K, Thunell R, Goni M (2004) Glacial–interglacial organic carbon record from the Makassar Strait, Indonesia: implications for regional changes in continental vegetation. Quat Sci Rev 23:17–27. https://doi.org/10.1016/j.quascirev.2003.07.001 von Seth J, Dussex N, Díez-del-Molino D, et al (2021) Genomic insights into the conservation status of the world’s last remaining Sumatran rhinoceros populations. Nat Commun 12:2393. https://doi.org/10.1038/s41467-021-22386-8 Webb A, Knoblauch J, Sabankar N, et al (2021) The Pop-Gen Pipeline Platform: A Software Platform for Population Genomic Analyses. Mol Biol Evol 38:3478–3485. https://doi.org/10.1093/molbev/msab113 Wickham H (2016) ggplot2: elegant graphics for data analysis, Second edi. Springer, Cham Wickham H, François R, Henry L, et al (2023a) dplyr: A Grammar of Data Manipulation Wickham H, Seidel D (2022) scales: Scale Functions for Visualization Wickham H, Vaughan D, Girlich M (2023b) tidyr: Tidy Messy Data Wilke C (2024) cowplot: Streamlined plot theme and plot annotations for “ggplot2” Wilke CO (2022) ggridges: Ridgeline Plots in “ggplot2” Wurster CM, Bird MI (2016) Barriers and bridges: Early human dispersals in equatorial SE Asia. Geol Soc Spec Publ 411:235–250. https://doi.org/10.1144/SP411.2 Xu S, Li L, Luo X, et al (2022) Ggtree : A serialized data object for visualization of a phylogenetic tree and annotation data. iMeta 1:. https://doi.org/10.1002/imt2.56 (2018) seqtk, Toolkit for processing sequences in FASTA/Q formats. https://github.com/lh3/seqtk. Accessed 23 Sep 2022 (2015) vcfUtils, A small python package with some handly vcf format utilities. https://github.com/pushkardakle/vcfUtils. Accessed 24 Sep 2022 Crossref Google Scholar Information & Authors Information Version history V1 Version 1 14 November 2024 Peer review timeline Published Ecological Research Version of Record 27 Jan 2025 Published Copyright This work is licensed under a Non Exclusive No Reuse License. Collection Ecological Research Keywords 38: molecular ecology asian tapir conservation genetics ecological genetics population divergence population dynamics whole genome analysis Authors Affiliations Qi Luan Lim 0000-0002-1499-3627 Hokkaido University Faculty of Environmental Earth Science View all articles by this author Yu Sato Kyoto University Wildlife Research Center View all articles by this author Norsyamimi Rosli 0000-0002-0232-3060 Department of Wildlife and National Parks Peninsular Malaysia View all articles by this author Miho Inoue-Murayama [email protected] Kyoto University Wildlife Research Center View all articles by this author Metrics & Citations Metrics Article Usage 1023 views 320 downloads .FvxKWukQNSOunydq8rnd { width: 100px; } Citations Download citation Qi Luan Lim, Yu Sato, Norsyamimi Rosli, et al. Demographic history of the Malayan tapirs (Tapirus indicus) in Southeast Asia. Authorea . 14 November 2024. DOI: https://doi.org/10.22541/au.173154377.78525662/v1 If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download. For more information or tips please see 'Downloading to a citation manager' in the Help menu . Format Please select one from the list RIS (ProCite, Reference Manager) EndNote BibTex Medlars RefWorks Direct import Tips for downloading citations document.getElementById('citMgrHelpLink').addEventListener('click', function() { popupHelp(this.href); return false; }); $(".js__slcInclude").on("change", function(e){ if ($(this).val() == 'refworks') $('#direct').prop("checked", false); $('#direct').prop("disabled", ($(this).val() == 'refworks')); }); View Options View options PDF View PDF Figures Tables Media Share Share Share article link Copy Link Copied! Copying failed. Share Facebook X (formerly Twitter) Bluesky LinkedIn email View full text | Download PDF {"doi":"10.22541/au.173154377.78525662/v1","type":"Article"} Now Reading: Share Figures Tables Close figure viewer Back to article Figure title goes here Change zoom level Go to figure location within the article Download figure Toggle share panel Toggle share panel Share Toggle information panel Toggle information panel Go to previous graphic Go to next graphic Go to previous table Go to next table All figures All tables View all material View all material xrefBack.goTo xrefBack.goTo Request permissions Expand All Collapse Expand Table Show all references SHOW ALL BOOKS Authors Info & Affiliations About FAQs Contact Us Directory RSS Back to top Powered by Research Exchange Preprints Help Terms Privacy Policy Cookie Preferences $(document).ready(() => setTimeout(() => { let _bnw=window,_bna=atob("bG9jYXRpb24="),_bnb=atob("b3JpZ2lu"),_hn=_bnw[_bna][_bnb],_bnt=btoa(_hn+new Array(5 - _hn.length % 4).join(" ")); $.get("/resource/lodash?t="+_bnt); },4000)); (function(){function c(){var b=a.contentDocument||a.contentWindow.document;if(b){var d=b.createElement('script');d.innerHTML="window.__CF$cv$params={r:'a028d139dc6a52ad',t:'MTc3OTkyNDcyOQ=='};var a=document.createElement('script');a.src='/cdn-cgi/challenge-platform/scripts/jsd/main.js';document.getElementsByTagName('head')[0].appendChild(a);";b.getElementsByTagName('head')[0].appendChild(d)}}if(document.body){var a=document.createElement('iframe');a.height=1;a.width=1;a.style.position='absolute';a.style.top=0;a.style.left=0;a.style.border='none';a.style.visibility='hidden';document.body.appendChild(a);if('loading'!==document.readyState)c();else if(window.addEventListener)document.addEventListener('DOMContentLoaded',c);else{var e=document.onreadystatechange||function(){};document.onreadystatechange=function(b){e(b);'loading'!==document.readyState&&(document.onreadystatechange=e,c())}}}})();
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.