Abstract
The reference genome of Gluvia dorsalis is the first of its order Solifugae (sun spiders), offering insights
into adaptations to arid environments and the evolutionary history of arachnids. The entirety of the
genome sequence was assembled into 5 contiguous chromosomal pseudomolecules. This chromosome-
level assembly encompasses 787 Mb, composed of 51 contigs and 10 scaffolds (including the
mitogenome), with contig and scaffold N50 values of 38 Mb and 199 Mb, respectively.
Keywords
Gluvia dorsalis , genome assembly, European Reference Genome Atlas, Biodiversity Genomics Europe, Earth
Biogenome Project, Arachnida, Solifugae, Daesiidae, Araña camello ibérica, Aranya camell ibèrica
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
2
Introduction
Gluvia dorsalis (Latreille, 1817) is a member of
the Daesiidae family within the arachnid order
Solifugae. Members of this group, commonly
known as sun spiders or camel spiders, inhabit
arid environments, particularly warm deserts
with sparse vegetation, and are rarely found in
Europe. Only two species of sun sp iders are
known to be present in Western Europe: Gluvia
dorsalis, endemic to the arid regions of Spain
and Portugal, and G. brunnea Pertegal,
Barranco, De Mas, and Moya -Laraño, 2024,
recently described in a small region of southern
Spain (Pertegal et al., 2024). Gluvia dorsalis is
a ground -dwelling arachnid that can reach
between 15 and 22 mm in length, with females
being larger than males. Although not
venomous, it is a fast -moving nocturnal
predator that usually hides under stones during
daytime. It has a yellow, orange, or reddish
prosoma and legs, and a dark abdomen. The two
pedipalps are highly developed, and they bear a
membranous suctorial organ at the tips that
allows the sun spider to capture prey and climb
smooth surfaces. The diet of G. dorsalis
includes mainly ants and spiders (Hrušková-
Martišová et al., 2010) , although it can
potentially consume a wider range of prey. Sun
spiders possess powerful pincer-like chelicerae
projected forward that allow them to capture
and consume large prey. Gluvia dorsalis can be
distinguished from its relative G. brunnea
mainly by its coloration. While G. dorsalis has
yellow areas in the palps and legs, G. brunnea
is dorsally completely brown. In addition,
mature individuals of G. brunnea bear a
hypertrophied seta on the basal and internal part
of coxa, which is absent in G. dorsalis. These
recent findings indicate the potential for greater
genetic diversity in sun spiders than previously
assumed, though further investigation is needed
to confirm this interpretation.
Developing a high -quality reference genome
for G. dorsalis is crucial for two reasons.
Firstly, thi s information will help to improve
our understanding of genomic adaptations to
extreme environments, in particular to
extremely hot and dry regions. Moreover,
gaining a better knowledge of the genomics of
this sun spider is relevant to understanding the
distribution patterns of this species in particular,
as well as the global distribution of sun spiders
in general. Secondly, evolutionary relationships
within arachnids are still among the most
challenging phylogenetic relationships to
resolve within animals (Lozano-Fernandez et
al., 2019; Ballesteros et al., 2022) . This is du e
to the old origin of the group, the rapid radiation
of all their orders, and the multiple Whole
Genome Duplication (WGD) events that some
of these groups have undergone, some
independent and some shared between orders
(Leite et al., 2018) . In the present study, we
present the first genome at the chromosome
level for any species from the order Solifugae,
which will allow us to test whether this group
of arachnids has undergone WGD. Moreover, it
will greatly help to locate the position of this
group, still poorly represented in genetic
databases, within the arachnid tree of life.
The generation of this reference resource was
coordinated by the European Reference
Genome Atlas (ERGA) initiative’s Biodiversity
Genomics Europe (BGE) project, supporting
ERGA’s aims of promoting transnational
cooperation to promote advances in the
application of genomics technologies to protect
and restore biodiversity (Mazzoni et al., 2023).
This species falls within the regional reach of
the Catalan Initiative for the Earth BioGenome
Project (CBP), which is linked to ERGA
(Corominas et al., 2024).
Materials
& Methods
ERGA's sequencing strategy includes Oxford
Nanopore Technology (ONT) and/or Pacific
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
3
Biosciences (PacBio) for long-read sequencing,
along with Hi -C sequencing for chromosomal
architecture, Illumina Paired -End (PE) for
polishing (i.e. r ecommended for ONT -only
assemblies), and RNA sequencing for
transcriptomic profiling, to facilitate genome
assembly and annotation.
Sample and Sampling Information
On August 1 st, 2023, an adult individual of G.
dorsalis (sex undetermined, however,
morphological appearance suggested a female
specimen) was sampled by Attila Ibos. The
species was first identified morphologically by
Marc Domènech and confirmed through COI
barcoding by Jesus Lozano -Fernandez from
Universitat de Barcelona. The specimen was
caught directly with a plastic tube from the
ground in Prenyanosa, Lleida (Catalonia,
Spain). Sampling was conducted under the
permit SF/0117/23, issued by the Catalan
Government. The specimen's tissues (e.g.:
cephalothorax, abdomen, and legs) were snap -
frozen immediately after harvesting and stored
in liquid nitrogen until DNA extraction.
Vouchering information
Frozen reference tissue material from the
sequenced individual (Figure 1) is available at
the Biobank of the Museo Nacional de Ciencias
Naturales in Madrid (Spain) under the voucher
ID MNCN -ADN 151.722. Physical reference
Material
from another individual of the same
population has been deposited in the same
museum under the accession number
MNCN20.02/22140.
Data Availability
Gluvia dorsalis and the related genomic study
were assigned to Tree of Life identifier (ToLID)
'qqGluDors1' and all sample, sequence, and
assembly information are available under the
umbrella BioProject PRJEB76507. The sample
information is available at the following
BioSample accessions: SAMEA115728209,
SAMEA114558560, and SAMEA114558561.
The genome assembly is accessible from ENA
under accession number GCA_964187665.1
and the annotated genome is available through
the Ensembl Rapid Release page
(projects.ensembl.org/erga-bge). Sequencing data
produced as part of this project are available
from ENA at the following accessions:
ERX13168338, ERX12623869,
ERX13168339, and ERX12623871.
Documentation related to the genome assembly
and curation can be found in the ERGA
Assembly Report (EAR) document available at
github.com/ERGA-
consortium/EARs/tree/main/Assembly_Reports/Gl
uvia_dorsalis/qqGluDors1. Further details and
data about the project are hosted on the ERGA
portal at portal.erga-
biodiversity.eu/organism/SAMEA114558555.
Genetic Information
The estimated genome size, based on ancestral
taxa is 1.1 Gb, while the estimation based on
reads kmer profiling is 0.79 Gb. This is a diploid
genome with a haploid number of 5
chromosomes (2n=10). Information for this
species was retrieved from Genomes on a Tree
(Challis et al., 2023).
DNA/RNA processing
DNA was extracted from the cephalothorax and
abdomen using the Blood & Cell Culture DNA
Mini Kit (Qiagen) following the manufacturer’s
instructions. DNA quantification was
performed using a Qubit dsDNA BR Assay Kit
(Thermo Fisher Scientific), and DNA integrity
was assessed using a F emtopulse system
(Genomic DNA 165 Kb Kit, Agilent). DNA
was stored at 4ºC until use.
RNA was extracted using an RNeasy Mini Kit
(Qiagen) according to the manufacturer’s
instructions. RNA was extracted from two
different specimen body parts: leg and
cephalothorax. RNA quantification was
performed using the Qubit RNA BR Kit and
RNA integrity was assessed using a
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
4
Bioanalyzer 2100 system (Eukaryote Total
RNA Pico Kit, Agilent). RNA was pooled in a
1:5 (leg:cephalothorax) ratio before library
preparation and stored at -80ºC until use.
Library Preparation and Sequencing
A long -read whole genome library was
prepared using the SQK -LSK114 kit and
sequenced on a PromethION P24 A series
instrument (Oxford Nanopore Technologies).
For short -read whole genome sequencing
(WGS), a library was constructed with the
KAPA Hyper Prep Kit (Roche) for subsequent
sequencing on an Illumina platform. Hi -C
library preparation, using cephalothorax and leg
tissue, was conducted with the ARIMA High
Coverage Hi -C Kit (Arima) and further
processed with the KAPA Hyper Prep Kit for
Illumina sequencing (Roche). The RNA library,
generated from the pooled sample, was
prepared with the KAPA mRNA Hyper Prep
Kit (Roche). All short -read libraries were
sequenced on the Illumina NovaSeq 6000
instrument. In total, 116x Oxford Nanopore,
102x Illumina WGS shotgun, and 97x HiC data
were sequenced to generate the assembly.
Genome Assembly Methods
The genome was assembled using the CNAG
CLAWS pipeline (Gomez-Garrido, 2024) .
Briefly, reads were preprocessed for quality and
length using Trim Galore v0.6.7 and Filtlong
v0.2.1, and initial contigs were assembled using
NextDenovo v2.5.0, followed by polishing of
the assembled contigs using HyPo v1.0.3,
removal of retained haplotigs using purge-dups
v1.2.6 and scaffolding with YaHS v1.2a.
Finally, assembled scaffolds were curated via
manual inspection using Pretext v0.2.5 with the
Rapid Curation Toolkit ( gitlab.com/wtsi-
grit/rapid-curation) to remove any false joins and
incorporate any sequences not automatically
scaffolded into their respective locations in the
chromosomal pseudomolecules (or super -
scaffolds). The mitochondrial genome was
assembled as a single circular contig of 14,734
bp using the FOAM pipeline v0.5
(github.com/cnag-aat/FOAM) and included in the
released assembly (GCA_964187665.1).
Summary analysis of the released assembly was
performed using the ERGA -BGE Genome
Report ASM Galaxy workflow (De Panis,
2024), incorporating tools such as BUSCO
v5.5, Merqury v1.3, and others (see reference
for the full list of tools).
Genome Annotation Methods
A gene set was generated using the Ensembl
Gene Annotation system (Aken et al., 2016) ,
primarily by aligning publicly available short -
read RNA -seq data from BioSample:
SAMEA115728209 to the genome. Gaps in the
annotation were filled via protein -to-genome
alignments of a select set of arthropod proteins
from UniProt (The UniProt Consortium, 2019),
which had experimental evidence at the protein
or transcript level. At each locus, data were
aggregated and consolidated, prioritising
models derived from RNA-seq data, resulting in
a final set of gene models and associated non-
redundant transcript sets. To distinguish true
isoforms from fragments, the likelihood of each
open reading frame (ORF) was evaluated
against known arthropod proteins. Low-quality
transcript models, such as those showing
evidence of fragmented ORFs, we re removed.
In cases where RNA-seq data were fragmented
or absent, homology data were prioritised,
favouring longer transcripts with strong intron
support from short -read data. The resulting
gene models were classified into three
categories: protein -coding, pseudogene, and
long non -coding. Models with hits to known
proteins and few structural abnormalities were
classified as protein-coding. Models with hits to
known proteins but displaying abnormalities,
such as the absence of a start codon, non -
canonical s plicing, unusually small intron
structures (<75 bp), or excessive repeat
coverage, were reclassified as pseudogenes.
Single-exon models with a corresponding
multi-exon copy elsewhere in the genome were
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
5
classified as processed (retrotransposed)
pseudogenes. Models that did not fit any of the
previously described categories did not overlap
protein-coding genes, and were constructed
from transcriptomic data were considered
potential lncRNAs. Potential lncRNAs were
further filtered to remove single -exon loci du e
to their unreliability. Putative miRNAs were
predicted by performing a BLAST search of
miRBase (Kozomara et al., 2019) against the
genome, followed by RNAfold analysis
(Gruber et al., 2008) . Other small non -coding
loci were identified by scanning the genome
with Rfam (Kalvari et al., 2018) and passing the
Results
through I nfernal (Nawrocki & Eddy,
2013). Summary analysis of the released
annotation was performed using the ERGA -
BGE Genome Report ANNOT Galaxy
workflow (De Panis, 2024a) , incorporating
tools such as AGAT v1.2, OMArk v0.3, and
others (see reference for the full list of tools).
Results
Genome Assembly
The genome assembly had a total length of
787,034,199 bp in 10 scaffolds including the
mitogenome (Figures 2 and 3), with a GC
content of 39.73%. It featured a contig N50 of
37,604,012 bp (L50=8) and a scaffold N50 of
198,509,873 bp (L50=2). There were 41 gaps,
totaling 8,200 kb in cumulative size. The single-
copy gene content analysis using the Arachnida
database with BUSCO resulted in 94.7%
completeness (93.3% single and 1.4%
duplicated). Additionally, 95.6% of reads k -
mers were present in the assembly and the
assembly has a base accuracy Quality Value
(QV) of 48.05 as calculated by Merqury.
Genome Annotation
The genome annotation consists of 14,266
protein-coding genes with an associated 26,432
transcripts (Table 1). Using the longest isoform
per transcript, the single -copy gene content
analysis using the Arachnida database with
BUSCO resulted in 94.8% completeness. Using
the OMAmer Arachnida database for OMArk
resulted in 94.09% completeness and 58.06%
consistency (Table 2).
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
6
Figure 1. Electronic voucher image of the sequenced individual of Gluvia dorsalis. The image, along
with two others, is available in ERGA's EBI BioImageArchive dataset
(www.ebi.ac.uk/biostudies/bioimages/studies/S-BIAD1012?query=ERGA) under accession ID
SAMEA114558555.
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
7
Table 1. Statistics from assembled gene models
No. genes No.
transcripts
Mean gene
length (bp)
No. single-exon
genes
Mean exons per
transcript
Protein-coding 14,266 26,439 16,829 325 8.2
lncRNA 1,102 1,279 11,580 0 2.5
tRNA 817 817 75 817 1.0
Table 2. Annotation completeness and consistency scores calculated by BUSCO run in protein
mode (Arachnida) and OMArk (Arachnida)
Complete Singular Duplicated Fragmented Missing
BUSCO 2,781 (94.8%) 2,726 (92.9%) 55 (1.9%) 49 (1.7%) 104 (3.5%)
OMArk 2,726 (94.09%) 2,628 (90.71%) 98 (3.38%) - 171 (5.90%)
Consistent Inconsistent Contaminants Unknown
OMArk 8,283 (58.06%) 2,446 (17.15%) 0 (0.00%) 3,537 (24.79%)
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
8
Figure 2. Snail plot summary of assembly statistics. The main plot is divided into 1,000 size-ordered
bins around the circumference, with each bin representing 0.1% of the 787,034,199 bp assembly
including the mitochondrial genome. The distribution of sequence lengt hs is shown in dark grey, with
the plot radius scaled to the longest sequence present in the assembly (201,641,468 bp, shown in red).
Orange and pale -orange arcs show the scaffold N50 and N90 sequence lengths (198,509,873 and
122,092,752 bp), respectively. The pale grey spiral shows the cumulative sequence count on a log-scale,
with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the
outside of the plot shows the distribution of GC, AT, and N percentages in the s ame bins as the inner
plot. A summary of complete, fragmented, duplicated, and missing BUSCO genes found in the
assembled genome from the Arachnida database (odb10) is shown on the top right.
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
9
Figure 3. Hi-C contact map showing spatial interactions between regions of the genome. The
diagonal corresponds to intra-chromosomal contacts, depicting chromosome boundaries. The frequency
of contacts is shown on a logarithmic heatmap scale. Hi -C matrix bins were merged into a 200 kb bin
size for plotting. From the 10 Scaffolds including the mitogenome, only the GenBank names of the five
chromosomes are shown.
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
10
Acknowledgements
We thank Alberto Narro for his willingness to help by providing samples from other popul ations. We
acknowledge the support of the Freiburg Galaxy Team: Saim Momin and Björn Grüning,
Bioinformatics, University of Freiburg (Germany), funded by the German Federal Ministry of Education
and Research BMBF grant 031 A538A de.NBI-RBC and the Ministry of Science, Research and the Arts
Baden-Württemberg (MWK) within the framework of LIBIS/de.NBI Freiburg. We would like to
acknowledge the assembly reviewer, Tom Mathers, from the Wellcome Sanger Institute.
Conflict of Interest
The authors declare no conflict of interest related to this study. The funding sources had no involvement
in the study design, collection, analysis, or interpretation of data; in the writing of the manuscript; or in
the decision to submit the article for publication. All authors have participated sufficiently in the work
to take public responsibility for the content and agree to the submission of this manuscript.
Funder Information
Biodiversity Genomics Europe (Grant no.101059492) is funded by Horizon Europe under the
Biodiversity, Circular Economy and Environment call (REA.B.3); co -funded by the Swiss State
Secretariat for Education, Research and Innovation (SERI) under contract numbers 22.00173 and
24.00054; and by the UK Research and Innovation (UKRI) under the Department for Business, Energy
and Industrial Strategy’s Horizon Europe Guarantee Scheme. This study was partially funded by
‘Ayudas para Incentivar la Consolidación Investigadora’ (CNS2022 -135805) from the AEI with the
budget from ‘Ministerio de Ciencia e Innovación’ and ‘Next Generation EU', as well as the project
PID2022-137753NA-I00. The author MD was also supported by a Margarita Salas contract by the
Spanish Government.
Author Contributions
JLF coordinated the project, AI and MD collected the species, MD and JLF identified the species, JLF
and MD sampled and preserved biological material and provided metadata, RM, TM, RO, THS, and
AsB provided sampling and metadata support and management, LA and MG extracted DNA, prepared
libraries, and performed sequencing, FCF, JGG and FC performed genome assembly and curation under
the supervision of TSA. LH, SS, and FM performed genome annotation. DDP generated the analysis
and report. All authors contributed to the writing, review, and editing of this genome note and read and
approved the final version.
Literature Cited
Ballesteros, J. A., Santibáñez-López, C. E., Baker, C. M., Benavides, L. R., Cunha, T. J., Gainett, G.,
Ontano, A. Z., Setton, E. V. W., Arango, C. P., Gavish-Regev, E., Harvey, M. S., Wheeler, W.
C., Hormiga, G., Giribet, G., & Sharma, P. P. (2022). Comprehensive Species Sampling and
Sophisticated Algorithmic Approaches Refute the Monophyly of Arachnida. Molecular Biology
and Evolution, 39(2). https://doi.org/10.1093/molbev/msac021
Challis, R., Kumar, S., Sotero-Caio, C., Brown, M., & Blaxter, M. (2023). Genomes on a Tree (GoaT):
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
ERGA-BGE Genome Report - Gluvia dorsalis
11
A versatile, scalable search engine for genomic and sequencing project metadata across the
eukaryotic tree of life. Wellcome Open Research , 8, 24.
https://doi.org/10.12688/wellcomeopenres.18658.1
Corominas, M., Marquès-Bonet, T., Arnedo, M. A., Bayés, M., Belmonte, J., Escrivà, H., Fernández,
R., Gabaldón, T., Garnatje, T., Germain, J., Niell, M., Palero, F., Pons, J., Puigdomènech , P.,
Initiative For The Earth BioGenome Project, T. C., Catalan initiative for the Earth BioGenome
Project, Arroyo, V., Cuevas-Caballé, C., Obiol, J. F., … Guigó, R. (2024). The Catalan initiative
for the Earth BioGenome Project: Contributing local data to global biodiversity genomics. NAR
Genomics and Bioinformatics, 6(3), lqae075. https://doi.org/10.1093/nargab/lqae075
De Panis, D. (2024). ERGA-BGE Genome Report ASM analyses (one -asm WGS Illumina PE + HiC) .
WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.1103.2
Gomez-Garrido, J. (2024). CLAWS (CNAG’s long -read assembly workflow in Snakemake) .
WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.567.2
Hrušková-Martišová, M., Pekár, S., & Cardoso, P. (2010). Natural history of the Iberian solifuge Gluvia
dorsalis (Solifuges: Daesiidae). The Journal of Arachnology , 38(3), 466 –474.
https://doi.org/10.1636/Hi09-104.1
Leite, D. J., Baudouin -Gonzalez, L., Iwasaki -Yokozawa, S., Lozano -Fernandez, J., Turetzek, N.,
Akiyama-Oda, Y., Prpic, N.-M., Pisani, D., Oda, H., Sharma, P. P., & McGregor, A. P. (2018).
Homeobox Gene Duplication and Divergence in Arachnids. Molecular Biology and Evolution,
35(9), 2240–2253. https://doi.org/10.1093/molbev/msy125
Lozano-Fernandez, J., Tanner, A. R., Giacomelli, M., Ca rton, R., Vinther, J., Edgecombe, G. D., &
Pisani, D. (2019). Author Correction: Increasing species sampling in chelicerate genomic-scale
datasets provides support for monophyly of Acari and Arachnida. Nature Communications, 10,
4534. https://doi.org/10.1038/s41467-019-12259-6
Mazzoni, C., Ciofi, C., & Waterhouse, R. (2023). Biodiversity: An atlas of European reference genomes.
Nature, 619, 252–252. https://doi.org/10.1038/d41586-023-02229-w
Pertegal, C., Barranco, P., De Mas, E., & Moya-Laraño, J. (2024). More Than 200 Years Later: Gluvia
brunnea sp. nov. (Solifugae, Daesiidae), a Second Species of Camel Spider from the Iberian
Peninsula. Insects, 15(4), 284. https://doi.org/10.3390/insects15040284
Author-formatted document posted on 14/05/2025. DOI: https://doi.org/10.3897/arphapreprints.e158720
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.