ERGA-BGE Reference Genome of Gluvia dorsalis: An Endemic Sun Spider from Iberian Arid Regions

Preprint. License: CC-BY-4.0

Abstract

The reference genome of Gluvia dorsalis is the first for the order Solifugae (sun spiders). It offers insights into adaptation to arid environments and into the evolutionary history of arachnids. The genome sequence was assembled into 5 chromosomal pseudomolecules. This chromosome-level assembly spans 787 Mb and is composed of 51 contigs and 10 scaffolds (including the mitogenome), with contig and scaffold N50 values of 38 Mb and 199 Mb, respectively.

Keywords

Gluvia dorsalis, genome assembly, European Reference Genome Atlas, Biodiversity Genomics Europe, Earth Biogenome Project, Arachnida, Solifugae, Daesiidae, Araña camello ibérica, Aranya camell ibèrica

Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720. ERGA-BGE Genome Report - Gluvia dorsalis.

Introduction

Gluvia dorsalis (Latreille, 1817) is a member of the family Daesiidae within the arachnid order Solifugae. Members of this order, commonly known as sun spiders or camel spiders, inhabit arid environments, particularly warm deserts with sparse vegetation, and are rarely found in Europe. Only two species of sun spiders are known from Western Europe. The first is Gluvia dorsalis, endemic to the arid regions of Spain and Portugal. The second is G. brunnea Pertegal, Barranco, De Mas, and Moya-Laraño, 2024, recently described from a small region of southern Spain (Pertegal et al., 2024).

Gluvia dorsalis is a ground-dwelling arachnid that reaches 15 to 22 mm in length, with females larger than males. Although not venomous, it is a fast-moving nocturnal predator that usually hides under stones during the day. Its prosoma and legs are yellow, orange, or reddish, and its abdomen is dark. The two pedipalps are highly developed. At their tips they bear a membranous suctorial organ that allows the sun spider to capture prey and climb smooth surfaces. The diet of G. dorsalis consists mainly of ants and spiders (Hrušková-Martišová et al., 2010), although it can potentially consume a wider range of prey. Sun spiders possess powerful, forward-projecting, pincer-like chelicerae that allow them to capture and consume large prey.

Gluvia dorsalis can be distinguished from its relative G. brunnea mainly by its coloration: G. dorsalis has yellow areas on the palps and legs, whereas G. brunnea is dorsally entirely brown. In addition, mature individuals of G. brunnea bear a hypertrophied seta on the basal, inner part of the coxa, which is absent in G. dorsalis. These recent findings suggest that sun spiders may harbor greater genetic diversity than previously assumed, although further investigation is needed to confirm this interpretation. Developing a high-quality reference genome for G. dorsalis is crucial for two reasons.
Firstly, this information will improve our understanding of genomic adaptations to extreme environments, in particular to extremely hot and dry regions. Better knowledge of the genomics of this sun spider is also relevant to understanding the distribution patterns of this species in particular, and the global distribution of sun spiders in general.

Secondly, evolutionary relationships within arachnids remain among the most challenging phylogenetic relationships to resolve in animals (Lozano-Fernandez et al., 2019; Ballesteros et al., 2022). Three factors contribute to this difficulty: the ancient origin of the group, the rapid radiation of all its orders, and the multiple whole-genome duplication (WGD) events that some of these groups have undergone. Some of these WGD events are independent, and some are shared between orders (Leite et al., 2018).

Here, we present the first chromosome-level genome for any species of the order Solifugae, which will allow us to test whether this group of arachnids has undergone WGD. It will also help to resolve the position of this group, which is still poorly represented in genetic databases, within the arachnid tree of life.

The generation of this reference resource was coordinated by the Biodiversity Genomics Europe (BGE) project of the European Reference Genome Atlas (ERGA) initiative. It supports ERGA's aims of promoting transnational cooperation to advance the application of genomics technologies to protect and restore biodiversity (Mazzoni et al., 2023). This species falls within the regional reach of the Catalan Initiative for the Earth BioGenome Project (CBP), which is linked to ERGA (Corominas et al., 2024).

Materials & Methods

ERGA's sequencing strategy combines several data types to facilitate genome assembly and annotation:
- Oxford Nanopore Technology (ONT) and/or Pacific Biosciences (PacBio) for long-read sequencing.
- Hi-C sequencing for chromosomal architecture.
- Illumina Paired-End (PE) sequencing for polishing, recommended for ONT-only assemblies.
- RNA sequencing for transcriptomic profiling.

Sample and Sampling Information

On 1 August 2023, Attila Ibos sampled an adult individual of G. dorsalis. Its sex was undetermined, although its morphology suggested a female. Marc Domènech first identified the species morphologically, and Jesus Lozano-Fernandez (Universitat de Barcelona) confirmed the identification through COI barcoding. The specimen was collected directly from the ground into a plastic tube in Prenyanosa, Lleida (Catalonia, Spain). Sampling was conducted under permit SF/0117/23, issued by the Catalan Government. The specimen's tissues (e.g., cephalothorax, abdomen, and legs) were snap-frozen immediately after harvesting and stored in liquid nitrogen until DNA extraction.

Vouchering information

Frozen reference tissue material from the sequenced individual (Figure 1) is available at the Biobank of the Museo Nacional de Ciencias Naturales in Madrid (Spain) under voucher ID MNCN-ADN 151.722. Physical reference material from another individual of the same population has been deposited in the same museum under accession number MNCN20.02/22140.

Data Availability

Gluvia dorsalis and the related genomic study were assigned the Tree of Life identifier (ToLID) 'qqGluDors1'. All sample, sequence, and assembly information is available under the umbrella BioProject PRJEB76507. The sample information is available at the following BioSample accessions: SAMEA115728209, SAMEA114558560, and SAMEA114558561. The genome assembly is accessible from ENA under accession number GCA_964187665.1. The annotated genome is available through the Ensembl Rapid Release page (projects.ensembl.org/erga-bge). Sequencing data produced as part of this project are available from ENA at the following accessions: ERX13168338, ERX12623869, ERX13168339, and ERX12623871. Documentation related to the genome assembly and curation can be found in the ERGA Assembly Report (EAR) document available at github.com/ERGA-consortium/EARs/tree/main/Assembly_Reports/Gluvia_dorsalis/qqGluDors1. Further details and data about the project are hosted on the ERGA portal at portal.erga-biodiversity.eu/organism/SAMEA114558555.

Genetic Information

The genome size estimated from ancestral taxa is 1.1 Gb, while the estimate based on read k-mer profiling is 0.79 Gb. This is a diploid genome with a haploid number of 5 chromosomes (2n=10). Information for this species was retrieved from Genomes on a Tree (Challis et al., 2023).

DNA/RNA processing

DNA was extracted from the cephalothorax and abdomen using the Blood & Cell Culture DNA Mini Kit (Qiagen), following the manufacturer's instructions. DNA was quantified with a Qubit dsDNA BR Assay Kit (Thermo Fisher Scientific). DNA integrity was assessed on a Femto Pulse system (Genomic DNA 165 Kb Kit, Agilent). DNA was stored at 4 °C until use. RNA was extracted using an RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions.
RNA was extracted separately from two body parts of the specimen: leg and cephalothorax. RNA was quantified with the Qubit RNA BR Kit. RNA integrity was assessed on a Bioanalyzer 2100 system (Eukaryote Total RNA Pico Kit, Agilent). The RNA was pooled at a 1:5 (leg:cephalothorax) ratio before library preparation and stored at -80 °C until use.

Library Preparation and Sequencing

A long-read whole genome library was prepared using the SQK-LSK114 kit and sequenced on a PromethION P24 A Series instrument (Oxford Nanopore Technologies). For short-read whole genome sequencing (WGS), a library was constructed with the KAPA Hyper Prep Kit (Roche) for subsequent sequencing on an Illumina platform. Hi-C library preparation, using cephalothorax and leg tissue, was conducted with the ARIMA High Coverage Hi-C Kit (Arima) and further processed with the KAPA Hyper Prep Kit for Illumina sequencing (Roche). The RNA library, generated from the pooled sample, was prepared with the KAPA mRNA Hyper Prep Kit (Roche). All short-read libraries were sequenced on an Illumina NovaSeq 6000 instrument. In total, sequencing produced 116x Oxford Nanopore, 102x Illumina WGS shotgun, and 97x Hi-C data for the assembly.

Genome Assembly Methods

The genome was assembled using the CNAG CLAWS pipeline (Gomez-Garrido, 2024). The pipeline ran as follows:
1. Reads were preprocessed for quality and length using Trim Galore v0.6.7 and Filtlong v0.2.1.
2. Initial contigs were assembled using NextDenovo v2.5.0.
3. The assembled contigs were polished using HyPo v1.0.3.
4. Retained haplotigs were removed using purge_dups v1.2.6.
5. The contigs were scaffolded with YaHS v1.2a.
Finally, the assembled scaffolds were curated by manual inspection using Pretext v0.2.5 with the Rapid Curation Toolkit (gitlab.com/wtsi-grit/rapid-curation). Curation removed any false joins and placed any sequences not automatically scaffolded into their respective locations in the chromosomal pseudomolecules (or super-scaffolds).

The mitochondrial genome was assembled as a single circular contig of 14,734 bp using the FOAM pipeline v0.5 (github.com/cnag-aat/FOAM) and included in the released assembly (GCA_964187665.1). Summary analysis of the released assembly was performed using the ERGA-BGE Genome Report ASM Galaxy workflow (De Panis, 2024). This workflow incorporates tools such as BUSCO v5.5 and Merqury v1.3; see the reference for the full list of tools.

Genome Annotation Methods

A gene set was generated using the Ensembl Gene Annotation system (Aken et al., 2016), primarily by aligning publicly available short-read RNA-seq data from BioSample SAMEA115728209 to the genome. Gaps in the annotation were filled with protein-to-genome alignments of a selected set of arthropod proteins from UniProt (The UniProt Consortium, 2019) that had experimental evidence at the protein or transcript level. At each locus, the data were aggregated and consolidated, prioritising models derived from RNA-seq data. This produced a final set of gene models and associated non-redundant transcript sets.

To distinguish true isoforms from fragments, the likelihood of each open reading frame (ORF) was evaluated against known arthropod proteins. Low-quality transcript models, such as those showing evidence of fragmented ORFs, were removed. Where RNA-seq data were fragmented or absent, homology data were prioritised, favouring longer transcripts with strong intron support from short-read data. The resulting gene models were classified into three categories: protein-coding, pseudogene, and long non-coding.
Models with hits to known proteins and few structural abnormalities were classified as protein-coding. Models with hits to known proteins that displayed abnormalities were reclassified as pseudogenes. Such abnormalities include the absence of a start codon, non-canonical splicing, unusually small intron structures (<75 bp), or excessive repeat coverage. Single-exon models with a corresponding multi-exon copy elsewhere in the genome were classified as processed (retrotransposed) pseudogenes.

Some models met three conditions: they did not fit any of the previously described categories, did not overlap protein-coding genes, and were constructed from transcriptomic data. These were considered potential lncRNAs. Potential lncRNAs were further filtered to remove single-exon loci, which are unreliable.

Putative miRNAs were predicted by a BLAST search of miRBase (Kozomara et al., 2019) against the genome, followed by RNAfold analysis (Gruber et al., 2008). Other small non-coding loci were identified by scanning the genome with Rfam (Kalvari et al., 2018) and passing the results through Infernal (Nawrocki & Eddy, 2013). Summary analysis of the released annotation was performed using the ERGA-BGE Genome Report ANNOT Galaxy workflow (De Panis, 2024a). This workflow incorporates tools such as AGAT v1.2 and OMArk v0.3; see the reference for the full list of tools.
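The gene-model classification rules described above can be summarised as a small decision function. The sketch below is a hypothetical simplification for illustration only, not the Ensembl Gene Annotation system's implementation. It makes the following assumptions:
- The `GeneModel` fields are invented summaries of the evidence each rule uses.
- The repeat cutoff `MAX_REPEAT_FRACTION = 0.5` is an assumption, because the text does not quantify "excessive repeat coverage". Only the 75 bp intron threshold comes from the text.
- The text says that models with "few" abnormalities count as protein-coding, but this sketch treats any single abnormality as disqualifying.

```python
from dataclasses import dataclass
from typing import Optional

MIN_INTRON_BP = 75          # from the text: introns < 75 bp are "unusually small"
MAX_REPEAT_FRACTION = 0.5   # ASSUMPTION: "excessive repeat coverage" is not quantified in the source


@dataclass
class GeneModel:
    """Hypothetical evidence summary for one candidate gene model."""
    has_protein_hit: bool             # hit to a known protein
    has_start_codon: bool
    canonical_splicing: bool          # True if all introns use canonical splice sites
    min_intron_bp: Optional[int]      # shortest intron; None for single-exon models
    repeat_fraction: float            # fraction of the model covered by repeats
    n_exons: int
    multi_exon_copy_elsewhere: bool   # a multi-exon copy exists elsewhere in the genome
    overlaps_protein_coding: bool
    from_transcriptomic_data: bool    # model constructed from RNA-seq evidence


def classify(m: GeneModel) -> Optional[str]:
    """Assign a biotype following the rules in the text; None means the model is dropped."""
    if m.has_protein_hit:
        # Single-exon copies of multi-exon genes are treated as retrotransposed copies.
        if m.n_exons == 1 and m.multi_exon_copy_elsewhere:
            return "processed_pseudogene"
        abnormal = (
            not m.has_start_codon
            or not m.canonical_splicing
            or (m.min_intron_bp is not None and m.min_intron_bp < MIN_INTRON_BP)
            or m.repeat_fraction > MAX_REPEAT_FRACTION
        )
        return "pseudogene" if abnormal else "protein_coding"
    # Remaining models become potential lncRNAs only if they are transcript-supported,
    # do not overlap protein-coding genes, and have more than one exon.
    if m.from_transcriptomic_data and not m.overlaps_protein_coding and m.n_exons > 1:
        return "lncRNA"
    return None


if __name__ == "__main__":
    model = GeneModel(
        has_protein_hit=True, has_start_codon=True, canonical_splicing=True,
        min_intron_bp=60, repeat_fraction=0.1, n_exons=4,
        multi_exon_copy_elsewhere=False, overlaps_protein_coding=False,
        from_transcriptomic_data=True,
    )
    print(classify(model))  # -> pseudogene (shortest intron is below 75 bp)
```

The processed-pseudogene check runs before the protein-coding check so that a single-exon copy of a multi-exon gene is never accepted as protein-coding. The text does not state this ordering explicitly; it is a design choice of the sketch.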

Results

Genome Assembly

The genome assembly has a total length of 787,034,199 bp in 10 scaffolds, including the mitogenome (Figures 2 and 3), and a GC content of 39.73%. It has a contig N50 of 37,604,012 bp (L50=8) and a scaffold N50 of 198,509,873 bp (L50=2). There are 41 gaps, totaling 8,200 kb in cumulative size. The BUSCO analysis of single-copy gene content, using the Arachnida database, gave 94.7% completeness (93.3% single-copy and 1.4% duplicated). In addition, 95.6% of read k-mers were present in the assembly. Merqury calculated a base accuracy Quality Value (QV) of 48.05 for the assembly.

Genome Annotation

The genome annotation consists of 14,266 protein-coding genes with 26,432 associated transcripts (Table 1). Using the longest isoform per gene, the BUSCO analysis of single-copy gene content with the Arachnida database gave 94.8% completeness. OMArk, run against the OMAmer Arachnida database, reported 94.09% completeness and 58.06% consistency (Table 2).

Figure 1. Electronic voucher image of the sequenced individual of Gluvia dorsalis. The image, along with two others, is available in ERGA's EBI BioImage Archive dataset (www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD1012?query=ERGA) under accession ID SAMEA114558555.

Table 1. Statistics from assembled gene models
Gene type | No. genes | No. transcripts | Mean gene length (bp) | No. single-exon genes | Mean exons per transcript
Protein-coding | 14,266 | 26,439 | 16,829 | 325 | 8.2
lncRNA | 1,102 | 1,279 | 11,580 | 0 | 2.5
tRNA | 817 | 817 | 75 | 817 | 1.0

Table 2. Annotation completeness and consistency scores calculated by BUSCO run in protein mode (Arachnida) and OMArk (Arachnida)
Tool | Complete | Single | Duplicated | Fragmented | Missing
BUSCO | 2,781 (94.8%) | 2,726 (92.9%) | 55 (1.9%) | 49 (1.7%) | 104 (3.5%)
OMArk | 2,726 (94.09%) | 2,628 (90.71%) | 98 (3.38%) | - | 171 (5.90%)

Tool | Consistent | Inconsistent | Contaminants | Unknown
OMArk | 8,283 (58.06%) | 2,446 (17.15%) | 0 (0.00%) | 3,537 (24.79%)

Figure 2. Snail plot summary of assembly statistics. The main plot is divided into 1,000 size-ordered bins around the circumference. Each bin represents 0.1% of the 787,034,199 bp assembly, including the mitochondrial genome.
- Dark grey shows the distribution of sequence lengths. The plot radius is scaled to the longest sequence in the assembly (201,641,468 bp, shown in red).
- Orange and pale-orange arcs show the scaffold N50 and N90 sequence lengths (198,509,873 and 122,092,752 bp, respectively).
- The pale grey spiral shows the cumulative sequence count on a log scale, with white scale lines marking successive orders of magnitude.
- The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT, and N percentages in the same bins as the inner plot.
- The top right shows a summary of complete, fragmented, duplicated, and missing BUSCO genes found in the assembled genome, using the Arachnida database (odb10).

Figure 3. Hi-C contact map showing spatial interactions between regions of the genome. The diagonal corresponds to intra-chromosomal contacts, depicting chromosome boundaries. The frequency of contacts is shown on a logarithmic heatmap scale. Hi-C matrix bins were merged into a 200 kb bin size for plotting. Of the 10 scaffolds (including the mitogenome), only the GenBank names of the five chromosomes are shown.
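The summary statistics above can be recomputed from raw values with a few lines of code. The sketch below is illustrative only, and the helper names are hypothetical; they do not come from the ERGA-BGE Galaxy workflows.
- `nx_lx` computes Nx/Lx from a list of sequence lengths.
- `qv_to_error_rate` converts a Phred-scaled QV into an expected per-base error rate.
- `busco_percentages` rederives the BUSCO percentages in Table 2 from the raw counts.

As a check on the reported values, the two longest scaffolds (201,641,468 and 198,509,873 bp) together span 400,151,341 bp. That is more than half of the 787,034,199 bp assembly, which is consistent with a scaffold L50 of 2.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of sequence lengths.

    Sequences are sorted longest first. Nx is the length of the sequence at
    which the running total first reaches `fraction` of the total length;
    Lx is the number of sequences needed to get there.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise RuntimeError("unreachable for integer lengths")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value into an expected per-base error rate."""
    return 10 ** (-qv / 10)


def busco_percentages(single, duplicated, fragmented, missing):
    """Recompute BUSCO percentages (one decimal place) from raw counts."""
    total = single + duplicated + fragmented + missing

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "total": total,
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


if __name__ == "__main__":
    # Toy lengths for illustration; these are not data from this assembly.
    print(nx_lx([5, 4, 3, 2, 1]))         # N50/L50 -> (4, 2)
    print(nx_lx([5, 4, 3, 2, 1], 0.9))    # N90/L90 -> (2, 4)
    print(f"{qv_to_error_rate(48.05):.2e}")       # QV reported by Merqury -> 1.57e-05
    print(busco_percentages(2726, 55, 49, 104))   # BUSCO protein-mode counts from Table 2
```

A QV of 48.05 corresponds to an expected error rate of about 1.6 × 10^-5, or roughly one error every 64 kb. Merqury estimates QV from k-mers shared between the reads and the assembly rather than from aligned errors, so this conversion only shows what the Phred scale means. The four BUSCO counts in Table 2 sum to 2,934 Arachnida markers. Dividing each count by that total reproduces the listed percentages.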

Acknowledgements

We thank Alberto Narro for his willingness to help by providing samples from other populations. We acknowledge the support of the Freiburg Galaxy Team: Saim Momin and Björn Grüning, Bioinformatics, University of Freiburg (Germany). The team is funded by the German Federal Ministry of Education and Research (BMBF grant 031 A538A de.NBI-RBC) and by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) within the framework of LIBIS/de.NBI Freiburg. We would like to acknowledge the assembly reviewer, Tom Mathers, from the Wellcome Sanger Institute.

Conflict of Interest

The authors declare no conflict of interest related to this study. The funding sources had no involvement in the study design, the collection, analysis, or interpretation of data, the writing of the manuscript, or the decision to submit the article for publication. All authors have participated sufficiently in the work to take public responsibility for the content and agree to the submission of this manuscript.

Funder Information

Biodiversity Genomics Europe (Grant no. 101059492) is funded by Horizon Europe under the Biodiversity, Circular Economy and Environment call (REA.B.3). It is co-funded by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers 22.00173 and 24.00054. It is also co-funded by UK Research and Innovation (UKRI) under the Department for Business, Energy and Industrial Strategy's Horizon Europe Guarantee Scheme. This study was partially funded by the 'Ayudas para Incentivar la Consolidación Investigadora' grant (CNS2022-135805) from the AEI, with budget from the 'Ministerio de Ciencia e Innovación' and 'Next Generation EU', and by the project PID2022-137753NA-I00. The author MD was also supported by a Margarita Salas contract from the Spanish Government.
Author Contributions

- JLF coordinated the project.
- AI and MD collected the specimen.
- MD and JLF identified the species.
- JLF and MD sampled and preserved biological material and provided metadata.
- RM, TM, RO, THS, and AsB provided sampling and metadata support and management.
- LA and MG extracted DNA, prepared libraries, and performed sequencing.
- FCF, JGG, and FC performed genome assembly and curation under the supervision of TSA.
- LH, SS, and FM performed genome annotation.
- DDP generated the analysis and report.

All authors contributed to the writing, review, and editing of this genome note and read and approved the final version.

Literature Cited

Ballesteros, J. A., Santibáñez-López, C. E., Baker, C. M., Benavides, L. R., Cunha, T. J., Gainett, G., Ontano, A. Z., Setton, E. V. W., Arango, C. P., Gavish-Regev, E., Harvey, M. S., Wheeler, W. C., Hormiga, G., Giribet, G., & Sharma, P. P. (2022). Comprehensive Species Sampling and Sophisticated Algorithmic Approaches Refute the Monophyly of Arachnida. Molecular Biology and Evolution, 39(2). https://doi.org/10.1093/molbev/msac021
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life. Wellcome Open Research, 8, 24. https://doi.org/10.12688/wellcomeopenres.18658.1
Corominas, M., Marquès-Bonet, T., Arnedo, M. A., Bayés, M., Belmonte, J., Escrivà, H., Fernández, R., Gabaldón, T., Garnatje, T., Germain, J., Niell, M., Palero, F., Pons, J., Puigdomènech, P., Initiative For The Earth BioGenome Project, T. C., Catalan initiative for the Earth BioGenome Project, Arroyo, V., Cuevas-Caballé, C., Obiol, J. F., … Guigó, R. (2024). The Catalan initiative for the Earth BioGenome Project: Contributing local data to global biodiversity genomics. NAR Genomics and Bioinformatics, 6(3), lqae075. https://doi.org/10.1093/nargab/lqae075
De Panis, D. (2024). ERGA-BGE Genome Report ASM analyses (one-asm WGS Illumina PE + HiC). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1103.2
Gomez-Garrido, J. (2024). CLAWS (CNAG's long-read assembly workflow in Snakemake). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.567.2
Hrušková-Martišová, M., Pekár, S., & Cardoso, P. (2010). Natural history of the Iberian solifuge Gluvia dorsalis (Solifuges: Daesiidae). The Journal of Arachnology, 38(3), 466–474. https://doi.org/10.1636/Hi09-104.1
Leite, D. J., Baudouin-Gonzalez, L., Iwasaki-Yokozawa, S., Lozano-Fernandez, J., Turetzek, N., Akiyama-Oda, Y., Prpic, N.-M., Pisani, D., Oda, H., Sharma, P. P., & McGregor, A. P. (2018). Homeobox Gene Duplication and Divergence in Arachnids. Molecular Biology and Evolution, 35(9), 2240–2253. https://doi.org/10.1093/molbev/msy125
Lozano-Fernandez, J., Tanner, A. R., Giacomelli, M., Carton, R., Vinther, J., Edgecombe, G. D., & Pisani, D. (2019). Author Correction: Increasing species sampling in chelicerate genomic-scale datasets provides support for monophyly of Acari and Arachnida. Nature Communications, 10, 4534. https://doi.org/10.1038/s41467-019-12259-6
Mazzoni, C., Ciofi, C., & Waterhouse, R. (2023). Biodiversity: An atlas of European reference genomes. Nature, 619, 252–252. https://doi.org/10.1038/d41586-023-02229-w
Pertegal, C., Barranco, P., De Mas, E., & Moya-Laraño, J. (2024). More Than 200 Years Later: Gluvia brunnea sp. nov. (Solifugae, Daesiidae), a Second Species of Camel Spider from the Iberian Peninsula. Insects, 15(4), 284. https://doi.org/10.3390/insects15040284
