AQUARIUM_HB: a bioinformatics pipeline for human blood circular RNA analysis

doi:10.21203/rs.3.rs-5657706/v1

2024 · doi:10.21203/rs.3.rs-5657706/v1

preprint OA: closed


Research Article

Shaoxun Yuan, Xue Bai, Linwei Li, Wanjun Gu (corresponding author)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5657706/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Accurately identifying and quantifying human blood circular RNAs (circRNAs) from RNA-seq data is a critical bioinformatics challenge in biomarker discovery for human diseases. In this study, we present AQUARIUM-HB, a comprehensive bioinformatics pipeline for identifying, quantifying, annotating, and analyzing circRNAs from human blood transcriptomes. AQUARIUM-HB includes three functional modules. First, it identifies and annotates circRNAs from rRNA-depleted RNA-seq datasets of human blood samples. Second, it performs an in-depth expression analysis of blood circRNAs.
Third, it constructs a reference set of full-length blood circRNAs. We demonstrate the application of AQUARIUM-HB using a human blood RNA-seq dataset from COVID-19 patients, showcasing its potential for improving the accuracy and depth of circRNA biomarker discovery.

Bioinformatics
Keywords: AQUARIUM-HB, bioinformatics, circRNA, RNA-seq, blood

Highlights

AQUARIUM-HB identifies and quantifies circRNAs at the transcript level from high-throughput RNA sequencing (HT RNA-seq) data in human peripheral blood samples.
AQUARIUM-HB dynamically constructs a reference set of full-length blood-derived circRNAs, thereby enhancing the accuracy of circRNA identification and quantification.
AQUARIUM-HB can be applied to discover circRNA biomarkers from blood HT RNA-seq data for human diseases.

Introduction

Liquid biopsies, which use body fluids such as blood or urine, provide a non-invasive, real-time approach for disease monitoring compared to traditional tissue biopsies [1]. Peripheral blood is particularly favored for its ease of collection, minimal invasiveness and comprehensive information content. Among the numerous biomarkers found in blood, including circulating tumor cells and extracellular vesicles, RNA-based molecular markers have gained significant attention due to their dynamic expression patterns and close association with disease states [2, 3]. Circular RNAs (circRNAs), in particular, stand out because of their higher stability [4] and specificity [5] compared to traditional linear RNAs. Recent studies have demonstrated crucial roles of blood circRNAs in intercellular communication and in the progression of severe diseases such as cancers [6], suggesting their promising application in liquid biopsies [7]. The identification and quantification of circRNAs from high-throughput RNA sequencing (HT RNA-seq) of rRNA-depleted blood samples allow for a more profound understanding of circRNA expression dynamics.
This understanding enhances their potential utility in diagnosing and prognosing complex diseases.

Although various computational tools with differing performance levels have been developed for circRNA quantification from HT RNA-seq data [8–12], several challenges remain [13]. For example, most existing tools quantify circRNA expression by normalizing read counts across back-splice junction (BSJ) sites [8, 10–12, 14]. However, this approach may introduce biased estimates of circRNA expression because of the uneven coverage of sequencing reads and the generally lower expression levels of circular transcripts. In our previous studies, we proposed a pseudo-linear transformation for circular transcripts [15] and implemented a model-based strategy to quantify circRNAs using the full-length RNA structure of these transcripts [16]. This computational framework has been shown to accurately and simultaneously quantify the expression levels of both circular and linear RNA transcripts from rRNA-depleted HT RNA-seq data [4, 6]. In particular, the full-length structure of circRNAs in peripheral blood samples can enhance the accuracy of circRNA quantification [16]. Therefore, establishing a reference set of human blood full-length circRNAs is crucial for accurate circRNA quantification from RNA-seq datasets.

Full-length circRNAs can be obtained from two sources. First, Oxford Nanopore Technology (ONT) sequencing has been successfully used to identify full-length circRNAs from tissue samples and cell lines of several model organisms, with methods including circFL-seq [17], circNick-LRS [18], IsoCirc [19] and CIRI-long [20]. For instance, Xin et al. identified 107,147 full-length circRNA isoforms across 12 human tissues and one human cell line, which are publicly accessible through the UCSC Genome Browser [19]. Additionally, two public circRNA databases, FLcircAS [21] and circAtlas [22, 23], have compiled over one million human full-length circRNA isoforms from ONT datasets.
Second, several computational tools have been developed to reconstruct the internal structure of circular transcripts from HT RNA-seq data [14]. For example, CIRI-full utilizes BSJ sites and reverse overlap features of short sequencing reads to assemble full-length circRNA sequences [14]. Other tools, such as psirc [24], circRNA-full [25], CYCLeR [26] and FEICP [27], leverage chimeric alignment information from short sequencing reads to reconstruct full-length circular transcripts. Given the wealth of publicly available HT RNA-seq datasets, this second source remains the primary means of assessing genome-wide full-length circRNA profiles.

To address the need for reliable full-length reconstruction of human blood circRNAs while simultaneously quantifying and analyzing both linear and circular transcripts from HT RNA-seq data, we developed AQUARIUM-HB. This pipeline identifies, annotates, quantifies, and performs expression analysis of human blood circRNAs from HT RNA-seq datasets (Fig. 1). Using this pipeline, the reference set of full-length blood circRNAs can be dynamically expanded by incorporating additional human blood HT RNA-seq datasets. By incorporating known full-length circRNAs from public databases and dynamically expanding its reference set with new RNA-seq datasets, AQUARIUM-HB aims to improve the identification and quantification of blood circRNAs, thus advancing their application in biomarker discovery and contributing to improved diagnostic and therapeutic strategies.

Methods

Identification of human blood circRNAs from HT RNA-seq data

In AQUARIUM [16], we incorporated the reconstructed circRNAs from CIRI-full [14] to improve the quantification accuracy of circRNA expression. Due to the short length of sequencing reads or insufficient sequencing depth of HT RNA-seq, CIRI-full may not completely characterize the full-length sequence of some circRNAs.
In this case, AQUARIUM utilizes the gene annotation of the human genome to facilitate the internal structure reconstruction of circRNAs [16]. However, this strategy does not account for the internal structures of the large number of known full-length circRNAs derived from ONT or HT RNA-seq datasets. To increase the accuracy of circRNA quantification, we integrated the full-length circRNAs from the FLcircAS [21] and IsoCirc [19] databases into the AQUARIUM-HB pipeline, together with the full-length circRNAs obtained from CIRI-full identification of HT RNA-seq datasets (Fig. 1A).

We reconstructed the full-length sequence of circRNAs identified by CIRI-full as follows. For circRNAs identified as “full” transcripts by CIRI-full, the complete sequences from the CIRI-full output were used in subsequent analysis. For circRNAs identified as “break” or “BSJ only” transcripts by CIRI-full, the internal structures of human full-length circRNAs from different sources were used to reconstruct the full-length sequence of these incomplete circRNAs. First, blood full-length circRNAs in the FLcircAS [21] or IsoCirc [19] databases, along with full-length transcripts identified from blood HT RNA-seq datasets, were prioritized for internal structure reconstruction (Priority-1). If these were insufficient, circRNAs from non-blood tissues in the FLcircAS [21] or IsoCirc [19] databases were used (Priority-2). Lastly, if required, gene annotations were employed to complete the circRNA structures, ensuring comprehensive identification (Priority-3).

Annotation of human blood circRNAs

The identified human blood circRNAs were annotated using the terms listed in Table 1 (Fig. 1B). First, each circRNA is assigned a uniform ID following the nomenclature proposed by Chen et al. [28], ensuring consistency across different databases.
For example, in circUBXN4(2,3,L4,5), the numbers 2, 3, 4 and 5 indicate exons 2, 3, 4 and 5, while the L before exon 4 indicates 5' alternative splicing of exon 4. To connect with the knowledge deposited in other circRNA databases, the alias IDs of each circRNA in several existing circRNA databases, including FLcircAS [21], TransCirc [29], circAtlas [22, 23], circBase [30], and PltDB [31], were annotated as well. Next, the reconstruction source of each circRNA is documented, indicating whether it was reconstructed by CIRI-full from RNA-seq data or complemented by full-length circRNAs in the FLcircAS and/or IsoCirc databases. Finally, circRNAs are classified by confidence level based on their reconstruction method and detection frequency.

Level-1 circRNAs meet two criteria. First, their full-length sequences are reconstructed by CIRI-full. Second, they are detected in at least five samples by CIRI-full, or they have been deposited in the FLcircAS and/or IsoCirc databases.
Level-2 circRNAs have their full-length sequences reconstructed by CIRI-full, are detected in fewer than five samples, and are not deposited in the FLcircAS and/or IsoCirc databases.
Level-3 circRNAs are incomplete circRNAs that are reconstructed using full-length circRNAs from FLcircAS and/or IsoCirc and/or HT RNA-seq blood samples.
Level-4 circRNAs are incomplete circRNAs that are reconstructed using the gene annotation of the human genome.

Expression analysis of human blood circRNAs

Following reconstruction, we quantified the expression of both linear and circular transcripts simultaneously at the transcript level using AQUARIUM [16], a model-based framework that we developed for circRNA quantification (Fig. 1C). To minimize the interference from highly abundant transcripts in blood samples, we excluded the expression of both circular and linear RNA transcripts from 13 hemoglobin-related genes and 171 ribosomal genes listed in the HGNC database [32].
Next, we kept only the transcripts from protein-coding genes and recalculated the TPM (Transcripts per Million) expression values for all circular and linear transcripts. For each circRNA, TPM values were aggregated at the isoform, BSJ, and gene levels, providing a comprehensive profile of circRNA expression. Differential expression analyses across groups were performed using DESeq2 [33], with subsequent gene set enrichment analyses in GO [34], KEGG [35], Reactome [36], and GSEA [37] to reveal underlying biological functions and pathways (Fig. 1D).

Dynamic expansion of the reference set of human blood full-length circRNAs

A high-quality reference set of human blood full-length circRNAs can improve the accuracy of circRNA identification and quantification in peripheral blood samples. AQUARIUM-HB is capable of constructing a reference set of human blood full-length circRNAs sourced from ONT and HT RNA-seq datasets (Fig. 1E). This reference set includes two components. The first is derived from full-length circRNAs in the FLcircAS [21] and IsoCirc [19] databases. The second includes the full-length circRNAs identified from public HT RNA-seq datasets of human blood samples. Notably, the second component is dynamic, allowing continuous expansion and updates as new HT RNA-seq datasets become available. This iterative enhancement supports both accuracy and comprehensiveness in circRNA identification and quantification from human blood samples, advancing our understanding of their roles in diseases.

Data

To exemplify our AQUARIUM-HB pipeline, we downloaded an HT RNA-seq dataset from the GEO database [38] (accession number: GSE172114). This dataset includes 69 whole blood RNA samples from COVID-19 patients, comprising 46 critical and 23 non-critical patients at the time of hospitalization [39].
RNA-seq libraries were prepared using the TruSeq Stranded Total RNA with Ribo-Zero Globin kit (Illumina) and sequenced on the Illumina NovaSeq 6000 platform with S4 flow cells, generating 151-base pair paired-end reads.

Results

Identification of human blood circRNAs

We first applied the AQUARIUM-HB pipeline to analyze HT RNA-seq data from whole blood samples of 69 COVID-19 patients [39]. From the FLcircAS [21] and IsoCirc [19] databases, we retrieved 275,165 (14.8%) and 31,998 (29.9%) blood-derived circRNAs, respectively (Fig. 2A). Next, the internal structures of circRNAs from the HT RNA-seq data were reconstructed by CIRI-full using the identified BSJ sites and overlapping sequences between paired-end reads. In this dataset, a total of 128,342 circRNAs were identified. Of these, the full-length sequences of 66,837 (52.1%) circRNAs were completely reconstructed, while the remaining circRNAs were partially reconstructed (47,102 circRNAs, 36.7%) or only had the BSJ site identified (14,403 circRNAs, 11.2%) (Fig. 2B). These partially reconstructed and BSJ-only circRNAs were then extended using the pipeline's strategy (Fig. 1A). Among these 61,505 circRNAs, 34,031 (55.3%) were supplemented using existing blood-derived full-length circRNAs in databases or from blood samples (Priority-1, Fig. 2C). A further 6,164 circRNAs (10.0%) were completed by full-length circRNAs from non-blood tissues in the FLcircAS [21] or IsoCirc [19] databases (Priority-2, Fig. 2C). The remaining 21,310 circRNAs (34.6%) were supplemented using the gene annotation of the human genome (Priority-3, Fig. 2C). These results indicate that the majority of partially reconstructed circRNAs could be effectively supplemented using known full-length circRNAs in blood, underscoring the importance of using blood full-length circRNA databases in the reconstruction of circRNA internal structure.
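The three-tier fallback (Priority-1 to Priority-3) used to complete partially reconstructed circRNAs can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: the function name `complete_structure`, the dictionary-based reference tables, the BSJ key format and the exon-coordinate layout are all assumptions, and the gene-annotation tier is simplified to a lookup. The last lines re-derive the reported percentages from the counts given in the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: a circRNA structure is a list of exon (start, end) pairs.
Structure = List[Tuple[int, int]]


def complete_structure(
    bsj_id: str,
    blood_refs: Dict[str, Structure],
    non_blood_refs: Dict[str, Structure],
    annotation_refs: Dict[str, Structure],
) -> Tuple[Optional[Structure], Optional[int]]:
    """Return (structure, priority) from the first tier that knows this BSJ.

    Tier 1: blood full-length circRNAs (databases plus blood RNA-seq).
    Tier 2: non-blood full-length circRNAs from the databases.
    Tier 3: structure derived from gene annotation (simplified to a lookup).
    """
    tiers = (blood_refs, non_blood_refs, annotation_refs)
    for priority, tier in enumerate(tiers, start=1):
        structure = tier.get(bsj_id)
        if structure is not None:
            return structure, priority
    return None, None


# Consistency check of the counts reported in the text.
reported = {1: 34_031, 2: 6_164, 3: 21_310}
incomplete = 47_102 + 14_403  # "break" plus "BSJ only" circRNAs
assert sum(reported.values()) == incomplete
shares = {p: round(100 * n / incomplete, 1) for p, n in reported.items()}
print(shares)
```

Checking the tiers in a fixed order means a blood-derived structure always wins over a non-blood or annotation-derived one when several sources cover the same BSJ, which mirrors the prioritization described in Methods.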
Additionally, known full-length circRNAs from non-blood tissues and genomic annotation help the pipeline fill the remaining gaps in circRNA internal structure.

Annotation of human blood circRNAs

Next, these 128,342 blood circRNAs were annotated by our AQUARIUM-HB pipeline (Fig. 1B). Among them, a majority (117,489, 91.5%) are exonic circRNAs (Fig. 2D) with five or fewer exons (Fig. 2E). In terms of transcript length, most circRNAs are shorter than 1,000 base pairs (Fig. 2F). Regarding the confidence level, most circRNAs (76,571, 85.7%) were identified in fewer than five samples (Fig. 2G), indicating potential sample specificity or low expression levels. Of the 52.1% of circRNAs that were completely reconstructed by CIRI-full, almost half (30,900, 46.2%) were identified in at least five samples or already deposited in the FLcircAS and/or IsoCirc databases (Level-1, Fig. 2H). The remaining 35,937 circRNAs completely reconstructed by CIRI-full were newly identified in human blood (Level-2, Fig. 2H). Of the 47.9% of circRNAs that were incompletely reconstructed by CIRI-full, 40,195 (65.4%) were supplemented by full-length circRNAs in the FLcircAS or IsoCirc databases or blood samples (Level-3, Fig. 2H), while the remaining 21,310 (34.6%) were completed using the gene annotation of the human genome (Level-4, Fig. 2H). These blood circRNAs were associated with 9,308 human genes, with most genes producing only a single circRNA (Fig. 2I). However, 397 genes exhibited high levels of alternative splicing of circular transcripts, with each of these genes corresponding to more than 50 transcripts. Functional enrichment analysis showed that these highly spliced genes are significantly involved in pathways such as cell cycle regulation, ubiquitin-mediated proteolysis, and chemokine signaling (Fig. 2J).

Expression analysis of circRNA profiles

CircRNAs generally exhibit lower expression levels than their linear counterparts.
However, circRNAs can exhibit varying expression levels from the same gene depending on the context [40, 41]. Understanding the regulation of dynamic circRNA expression highlights the importance of simultaneously quantifying both circular and linear RNA types. Using the AQUARIUM-HB pipeline, we generated a density plot of the expression of both circular and linear transcripts to illustrate the expression distribution of circRNAs and linear mRNAs (Fig. 3A). The overall expression of circRNAs and linear RNAs follows a normal distribution, indicating that their expression levels are well-regulated and may reflect typical biological variability across samples. The expression levels of circRNAs are much lower than those of linear mRNA transcripts in both non-critical and critical COVID-19 patients (Fig. 3B). CircRNAs accounted for 3.3% of total RNA expression in non-critical COVID-19 patients, slightly higher than in critical patients (3.1%). Furthermore, a significant positive correlation was observed between the expression changes of circRNAs and their corresponding linear RNAs at the gene level (R = 0.26, P-value < 2.2 × 10^-16) (Fig. 3C). This suggests that the expression changes of circRNAs are largely determined by the transcriptional regulation of their host genes. For example, some circRNAs are up-regulated (Fig. 3C, blue dots) due to the up-regulation of their corresponding parent genes. However, some dysregulated circRNAs are splicing-derived circRNAs, with their expression levels up-regulated (Fig. 3C, red dots) or down-regulated (Fig. 3C, green dots) independently of the transcriptional regulation of their parent genes. The functional enrichment patterns of differentially expressed circRNAs also differ from those of differentially expressed linear RNAs, indicating distinct biological roles of circRNAs in the disease severity of COVID-19 (Fig. 3D).
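Summaries such as the share of total expression contributed by circRNAs depend on the TPM re-scaling described in Methods, where transcripts of excluded genes are dropped and the remaining values are recalculated. A minimal sketch of that arithmetic is given here, assuming simple dictionaries keyed by transcript ID; the function names, the toy transcript IDs and the toy values are hypothetical and are not taken from the pipeline or the dataset.

```python
from typing import Dict, Set


def renormalize_tpm(
    tpm: Dict[str, float],
    gene_of: Dict[str, str],
    excluded_genes: Set[str],
) -> Dict[str, float]:
    """Drop transcripts of excluded genes and rescale the rest to sum to 1e6."""
    kept = {t: v for t, v in tpm.items() if gene_of[t] not in excluded_genes}
    total = sum(kept.values())
    if total == 0:
        return {t: 0.0 for t in kept}
    return {t: v * 1e6 / total for t, v in kept.items()}


def circ_share(tpm: Dict[str, float], circular: Set[str]) -> float:
    """Fraction of total expression contributed by circular transcripts."""
    total = sum(tpm.values())
    circ = sum(v for t, v in tpm.items() if t in circular)
    return circ / total if total else 0.0


# Toy input (hypothetical values): one hemoglobin transcript dominates the sample.
tpm = {"HBB-201": 600_000.0, "circA": 10_000.0, "A-201": 190_000.0, "B-201": 200_000.0}
gene_of = {"HBB-201": "HBB", "circA": "A", "A-201": "A", "B-201": "B"}

rescaled = renormalize_tpm(tpm, gene_of, excluded_genes={"HBB"})
share = circ_share(rescaled, circular={"circA"})
print(rescaled, share)
```

In this toy case the circular transcript contributes 1% of the raw total but 2.5% after the hemoglobin transcript is removed, which illustrates why the filtering step matters in blood, where a few genes can dominate the library. Restricting to protein-coding genes could be handled the same way by passing the non-coding genes in `excluded_genes`.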
Construction of a reference set of human blood full-length circRNAs

Although long-read sequencing technology can directly sequence full-length circRNAs, it still has limitations in application, such as high costs. On the other hand, genome-wide characterization of circRNAs can be achieved through HT RNA-seq technology with various RNA enrichment strategies. Given the importance of a reliable reference set, AQUARIUM-HB dynamically integrates circRNAs from both the FLcircAS [21] and IsoCirc [19] databases, along with newly identified circRNAs from HT RNA-seq datasets. Initially, 275,165 blood full-length circRNAs from the FLcircAS database [21] and 31,998 blood full-length circRNAs from the IsoCirc database [19] were integrated as the reference set of human blood full-length circRNAs (Supplementary Table S1). This initial reference set was composed of 276,179 blood-derived full-length circRNAs from existing circRNA databases in total. Our pipeline identified 66,837 full-length circRNAs from the HT RNA-seq data of 69 human blood samples of COVID-19 patients. This set of full-length circRNAs in human blood samples was then used to update the initial reference set (Fig. 1E). Among them, 11,472 full-length circRNAs were already deposited in the FLcircAS or IsoCirc databases, while the remaining 55,365 circRNAs were newly identified in human blood samples (Fig. 2K). Finally, we obtained an updated human full-length blood circRNA reference set consisting of 331,544 full-length circRNAs. This reference set provides a robust foundation for circRNA identification and quantification in blood, supporting advanced research in biomarker discovery and improving diagnostic accuracy.

Conclusion

This study introduces AQUARIUM-HB, a comprehensive pipeline capable of identifying and quantifying circRNAs at the transcript level from HT RNA-seq data of human peripheral blood samples.
AQUARIUM-HB integrates a reference set of full-length blood-derived circRNAs, combining established circRNA databases with new findings from HT RNA-seq datasets to ensure precise circRNA identification and quantification. By applying AQUARIUM-HB to a dataset of COVID-19 patients, we demonstrated its potential in uncovering the unique expression dynamics of circRNAs in response to diseases. The pipeline’s ability to capture and quantify full-length circRNA structures not only enhances the accuracy of circRNA profiling in blood but also facilitates the exploration of circRNAs as biomarkers in liquid biopsies.

Declarations

Acknowledgements

This work was supported by the National Key R&D Program of China (Nos. 2022YFC3500200, 2022YFC3500202), National Natural Science Foundation of China (Nos. 81930117, 82430122), and Jiangsu Provincial Social Development and Clinical Frontier Technology Project (BE2023790).

Contributions

W.G. conceived the research. S.Y., X.B. and L.L. contributed to data analysis. S.Y. and W.G. wrote the manuscript. All authors read, revised and approved the final version of the manuscript.

Declaration of competing interest

The authors declare no competing interests.

Code availability

AQUARIUM-HB is publicly available and can be accessed on GitHub: https://github.com/NJUCMbioinfo/AQUARIUM-HB.

References

1. De Rubis, G., Rajeev Krishnan, S. & Bebawy, M. Liquid biopsies in cancer diagnosis, monitoring, and prognosis. Trends Pharmacol. Sci. 40, 172–186 (2019).
2. Wang, Y., Liu, J., Ma, J., Sun, T. & Ming, L. Exosomal circRNAs: biogenesis, effect and application in human diseases. Mol. Cancer 18, 1–10 (2019).
3. Zaporozhchenko, I. A., Ponomaryova, A. A., Rykova, E. Y. & Laktionov, P. P. The potential of circulating cell-free RNA as a cancer biomarker: challenges and opportunities. Expert Rev. Mol. Diagn. 18, 133–145 (2018).
4. Wen, G. & Gu, W. Circular RNAs in peripheral blood mononuclear cells are more stable than linear RNAs upon sample processing delay. J. Cell. Mol. Med. 26, 5021–5032 (2022).
5. Li, X., Yang, L. & Chen, L. L. The biogenesis, functions, and challenges of circular RNAs. Mol. Cell 71, 428–442 (2018).
6. Cao, L., Huang, C., Zhou, D. C., Hu, Y. & Zhao, G. Proteogenomic characterization of pancreatic ductal adenocarcinoma. Cell 184, 5031–5052.e26 (2021).
7. Wen, G., Zhou, T. & Gu, W. The potential of using blood circular RNA as liquid biopsy biomarker for human diseases. Protein Cell 12, 911–946 (2021).
8. Szabo, L. et al. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 17, 263 (2016).
9. Vromman, M. et al. Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision. Nat. Methods 20, 1159–1169 (2023).
10. Gao, Y., Zhang, J. & Zhao, F. Circular RNA identification based on multiple seed matching. Brief. Bioinform. 19, 803–810 (2018).
11. Zhang, J., Chen, S., Yang, J. & Zhao, F. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. Nat. Commun. 11, 90 (2020).
12. Ma, X. K. et al. CIRCexplorer3: a CLEAR pipeline for direct comparison of circular and linear RNA expression. Genomics Proteomics Bioinformatics 17, 511–521 (2019).
13. Ma, X. K., Zhai, S. N. & Yang, L. Approaches and challenges in genome-wide circular RNA identification and quantification. Trends Genet. 39, 897–907 (2023).
14. Zheng, Y., Ji, P., Chen, S., Hou, L. & Zhao, F. Reconstruction of full-length circular RNAs enables isoform-level quantification. Genome Med. 11, 1–20 (2019).
15. Li, M. et al. Quantifying circular RNA expression from RNA-seq data using model-based framework. Bioinformatics 33, 2131–2139 (2017).
16. Wen, G. et al. AQUARIUM: Accurate quantification of circular isoforms using model-based strategy. Bioinformatics 37, 4879–4881 (2021).
17. Liu, Z. et al. circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing. eLife 10, e69457 (2021).
18. Rahimi, K., Venø, M. T., Dupont, D. M. & Kjems, J. Nanopore sequencing of brain-derived full-length circRNAs reveals circRNA-specific exon usage, intron retention and microexons. Nat. Commun. 12, 4825 (2021).
19. Xin, R. et al. isoCirc catalogs full-length circular RNA isoforms in human transcriptomes. Nat. Commun. 12, 266 (2021).
20. Zhang, J. et al. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. Nat. Biotechnol. 39, 836–845 (2021).
21. Chiang, T. W. et al. FL-circAS: an integrative resource and analysis for full-length sequences and alternative splicing of circular RNAs with nanopore sequencing. Nucleic Acids Res. 52, D115–D123 (2024).
22. Wu, W., Ji, P. & Zhao, F. CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol. 21, 101 (2020).
23. Wu, W., Zhao, F. & Zhang, J. circAtlas 3.0: A gateway to 3 million curated vertebrate circular RNAs based on a standardized nomenclature scheme. Nucleic Acids Res. 52, D52–D60 (2024).
24. Yu, K. H. O. et al. Quantifying full-length circular RNAs in cancer. Genome Res. 31, 2340–2353 (2021).
25. Hossain, M. T. et al. Reconstruction of full-length circRNA sequences using chimeric alignment information. Int. J. Mol. Sci. 23, 6776 (2022).
26. Stefanov, S. R. & Meyer, I. M. CYCLeR—a novel tool for the full isoform assembly and quantification of circRNAs. Nucleic Acids Res. 51, e10–e10 (2023).
27. Zhong, Y. et al. Systematic identification and characterization of exon–intron circRNAs. Genome Res. 34, 376–393 (2024).
28. Chen, L. L. et al. A guide to naming eukaryotic circular RNAs. Nat. Cell Biol. 25, 1–5 (2023).
29. Huang, W. et al. TransCirc: An interactive database for translatable circular RNAs based on multi-omics evidence. Nucleic Acids Res. 49, D236–D242 (2021).
30. Glažar, P., Papavasileiou, P. & Rajewsky, N. circBase: A database for circular RNAs. RNA 20, 1666–1670 (2014).
31. Zou, D. et al. PltDB: A blood platelets-based gene expression database for disease investigation. Bioinformatics 38, 3143–3145 (2022).
32. Povey, S. et al. The HUGO Gene Nomenclature Committee (HGNC). Hum. Genet. 109, 678–680 (2001).
33. Love, M., Anders, S. & Huber, W. Differential analysis of count data–the DESeq2 package. Genome Biol 15, 10–1186 (2014).
34. Ashburner, M. et al. Gene ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
35. Kanehisa, M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
36. Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
37. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545–15550 (2005).
38. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2012).
39. Carapito, R. et al. Identification of driver genes for critical forms of COVID-19 in a deeply phenotyped young patient cohort. Sci. Transl. Med. 14, eabj7521 (2022).
40. Rybak-Wolf, A. et al. Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol. Cell 58, 870–885 (2015).
41. Salzman, J., Chen, R. E., Olsen, M. N., Wang, P. L. & Brown, P. O. Cell-type specific features of circular RNA expression. PLoS Genet. 9, e1003777 (2013).

Table

Table 1. The terms used in the circRNA annotation module of the AQUARIUM-HB pipeline.
Annotation term | Description
BSJ ID | BSJ position of a circRNA
uniform ID | standard nomenclature of circRNA (Chen et al. [28])
alias ID | alias IDs in several circRNA databases [21–23, 29–31]
circRNA type | type of a circRNA (exonic, intronic or intergenic)
host gene | host gene ID(s) of an exonic or intronic circRNA
exon count | the number of exons in a circRNA
sequence length | the splicing length of a circRNA
reconstruction source | reconstruction source of a circRNA
confidence level | the circRNA confidence level according to the strategies used in identification and reconstruction

Supplementary Files

SupplementaryTable.docx
Supplementary Table S1. The number of full-length circRNAs from different tissues in FLcircAS 1 and IsoCirc 2 databases.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5657706","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":391151203,"identity":"b99ad6ce-8a51-4d72-80b2-ccc8aa205a2e","order_by":0,"name":"Shaoxun Yuan","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Shaoxun","middleName":"","lastName":"Yuan","suffix":""},{"id":391151505,"identity":"6fd0f00a-1c48-4706-ad02-5956a3417b09","order_by":1,"name":"Xue Bai","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Xue","middleName":"","lastName":"Bai","suffix":""},{"id":391151506,"identity":"f8ec781e-94b3-403c-9c28-964dfb0934cd","order_by":2,"name":"Linwei Li","email":"","orcid":"","institution":"","correspondingAuthor":false,"prefix":"","firstName":"Linwei","middleName":"","lastName":"Li","suffix":""},{"id":391151507,"identity":"b5124e62-d7d0-4fb2-874f-b2688eee3480","order_by":3,"name":"Wanjun Gu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA7UlEQVRIiWNgGAWjYJACAxDBz8CQAOEeIFaLZAMpWiD64CoJaTE4fvZAMe+O2sTNtxueSfzMYZDju5HA+LkAn5YzeQnGvGeOJ267cyBNsncbg7HkjQRm6Rl4tJgdyDEw5m07lrjtRkKaNOM2hsQNNxLYmHnwaTn/BqJl8wyIlnrCWm6AbalJ3CAB0ZJgQEiL/Y03BoZz2w4Yz7iRkGzZu03CcOaZh83S+LRI9ueYGbxtq5Ptn5GTeOPnNht5vuPJBz/j0wIEbMCoPAykeRKAhAQQMzbg18DAwPyAgaEOSLMfIKRyFIyCUTAKRigAABN6T/BSk4mXAAAAAElFTkSuQmCC","orcid":"","institution":"","correspondingAuthor":true,"prefix":"","firstName":"Wanjun","middleName":"","lastName":"Gu","suffix":""}],"badges":[],"createdAt":"2024-12-17 
03:03:36","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-5657706/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5657706/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":71757280,"identity":"8aa36b60-dcd9-4fc5-b5c0-9ec266bb3fe0","added_by":"auto","created_at":"2024-12-18 10:19:53","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":226318,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe pipeline of \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eAQUARIUM-HB\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e. (A) Identification of blood circRNA. \u003c/strong\u003eFor circRNAs that are identified as the “\u003cem\u003efull\u003c/em\u003e” transcripts by \u003cem\u003eCIRI-full\u003c/em\u003e\u003csup\u003e14\u003c/sup\u003e, the complete sequences from the \u003cem\u003eCIRI-full\u003c/em\u003e\u003csup\u003e14\u003c/sup\u003e output were used in the subsequent analysis. For circRNAs that were identified as the “\u003cem\u003ebreak\u003c/em\u003e” or “\u003cem\u003eBSJ only\u003c/em\u003e” transcripts by \u003cem\u003eCIRI-full\u003c/em\u003e\u003csup\u003e14\u003c/sup\u003e, blood full-length circRNAs from the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases, along with “\u003cem\u003efull\u003c/em\u003e” transcripts obtained from blood samples, were prioritized for reconstruction. 
Alternatively, full-length circRNAs from non-blood tissues in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases were utilized. Gene annotation of the human genome was least preferred for reconstruction. \u003cstrong\u003e(B) Annotation of blood circRNAs.\u003c/strong\u003e The annotation consists of two parts: \u003cem\u003eID\u003c/em\u003e and \u003cem\u003echaracter\u003c/em\u003e. The ID includes the BSJ position of a circRNA, the standardized circRNA nomenclature\u003csup\u003e28\u003c/sup\u003e, and the alias names corresponding to various circRNA databases\u003csup\u003e21,23,29,30\u003c/sup\u003e. The character includes various attributes of blood circRNAs, including the types of circRNAs (exonic, intronic, and intergenic), the corresponding host genes, exon details associated with the circRNAs, their presence in \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases, and the confidence levels regarding the identification of these circRNAs. \u003cstrong\u003e(C)\u003c/strong\u003e \u003cstrong\u003eQuantification of blood circRNAs. \u003c/strong\u003eThe expression of both linear and circular transcripts was quantified simultaneously at the transcripts per million (TPM) level using \u003cem\u003esailfish-cir\u003c/em\u003e\u003csup\u003e15\u003c/sup\u003e.\u003cstrong\u003e (D) Downstream analysis of blood circRNAs. \u003c/strong\u003eExpression analysis involves differential analysis of circRNAs across groups and comparison with the expression of linear RNAs. 
Enrichment analysis includes functional analysis of circRNAs using \u003cem\u003eGO\u003c/em\u003e\u003csup\u003e34\u003c/sup\u003e, \u003cem\u003eKEGG\u003c/em\u003e\u003csup\u003e35\u003c/sup\u003e, \u003cem\u003eReactome\u003c/em\u003e\u003csup\u003e36\u003c/sup\u003e, and \u003cem\u003eGSEA\u003c/em\u003e\u003csup\u003e37\u003c/sup\u003e. \u003cstrong\u003e(E) Reference set update of full-length blood circRNAs.\u003c/strong\u003e The reference set of full-length blood circRNAs includes two components. The first component integrates the full-length blood circRNAs deposited in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases. The second component includes the full-length circRNAs identified from public HT RNA-seq datasets of human blood samples. The second component is dynamic, allowing continuous expansion and updates as new HT RNA-seq datasets become available.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5657706/v1/e6eb36f378d760dd4b54d4e3.png"},{"id":71758446,"identity":"14e21de7-b8dc-46c4-a827-3b83f0c677b7","added_by":"auto","created_at":"2024-12-18 10:27:53","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":122296,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCharacteristics of circRNAs in blood. (A) \u003c/strong\u003eProportional distribution of blood and non-blood circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases. \u003cstrong\u003e(B) \u003c/strong\u003eDistribution of “\u003cem\u003efull\u003c/em\u003e”, “\u003cem\u003ebreak\u003c/em\u003e” and “\u003cem\u003eBSJ only\u003c/em\u003e” circRNAs in select dataset. 
\u003cstrong\u003e(C) \u003c/strong\u003ePriority distribution during the reconstruction pipeline of “\u003cem\u003ebreak\u003c/em\u003e” and “\u003cem\u003eBSJ only\u003c/em\u003e” circRNAs. \u003cem\u003ePriority-1\u003c/em\u003e indicates that incomplete circRNAs are reconstructed using blood full-length circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases, along with those full-length transcripts identified from blood RNA-seq datasets. \u003cem\u003ePriority-2\u003c/em\u003e indicates that incomplete circRNAs are reconstructed using full-length circRNAs from non-blood tissues in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases. \u003cem\u003ePriority-3\u003c/em\u003e indicates that gene annotation of the human genome is used for reconstruction. \u003cstrong\u003e(D) \u003c/strong\u003eDistribution of identified circRNA types according to the positions of their two ends on the chromosome. \u003cstrong\u003e(E)\u003c/strong\u003e Histogram of the exon count distribution for identified circRNAs. \u003cstrong\u003e(F) \u003c/strong\u003eHistogram of the isoform length distribution of identified circRNAs. \u003cstrong\u003e(G)\u003c/strong\u003e Number of samples in which identified circRNAs were detected in the selected dataset. \u003cstrong\u003e(H)\u003c/strong\u003e Proportional distribution of confidence levels for identified circRNAs. \u003cstrong\u003e(I) \u003c/strong\u003eHistogram of gene-circular isoform count distribution. 
\u003cstrong\u003e(J) \u003c/strong\u003eKEGG pathway enrichment results for genes with highly variable splicing levels (corresponding to more than 50 transcripts).\u003cstrong\u003e (K) \u003c/strong\u003eThe reference set of full-length blood circRNAs includes two parts: full-length blood circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases, and full-length circRNAs identified from the selected HT RNA-seq dataset.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5657706/v1/3078f3c2b217218351436c84.png"},{"id":71758448,"identity":"1562bdd0-2bce-4b05-b694-9826889e4040","added_by":"auto","created_at":"2024-12-18 10:27:53","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":136442,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eExpression profiling analysis of circRNA\u003c/strong\u003e \u003cstrong\u003eand linear RNA. (A) \u003c/strong\u003eDensity\u0026nbsp;plot of the expression distributions of both circular and linear transcripts. \u003cstrong\u003e(B) \u003c/strong\u003eProportional expression of circRNAs and linear RNAs in the non-critical and critical COVID-19 patients. \u003cstrong\u003e(C) \u003c/strong\u003eThe correlation of log\u003csub\u003e2\u003c/sub\u003e(fold change) of circRNAs versus log\u003csub\u003e2\u003c/sub\u003e(fold change) of corresponding linear RNA expression. Blue dots represent transcription-derived circRNAs that were up-regulated because of consistent up-regulation of their parent genes. Red and green dots represent splice-derived circRNAs that were up-regulated or down-regulated whose parent genes showed no significant expression changes. Grey dots represent circRNAs that had no differential expression. 
\u003cstrong\u003e(D) \u003c/strong\u003eThe top 5 \u003cem\u003eKEGG\u003c/em\u003e functional enrichment plots of differentially expressed circRNAs and linear RNAs.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5657706/v1/71c0612507808bdb32b67a9f.png"},{"id":71758789,"identity":"6f9afd3e-6e15-4dfa-abb8-0d92ce26d7f2","added_by":"auto","created_at":"2024-12-18 10:35:57","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1141439,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5657706/v1/ea924c25-e1f1-4819-87cd-73ba221555e1.pdf"},{"id":71757277,"identity":"62326301-876c-46b7-a413-84c8d668777b","added_by":"auto","created_at":"2024-12-18 10:19:53","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":24028,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSupplementary Files\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eSupplementary Table S1. 
\u003c/strong\u003eThe number of full-length circRNAs from different tissues in \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e1\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e databases.\u003c/p\u003e","description":"","filename":"SupplementaryTable.docx","url":"https://assets-eu.researchsquare.com/files/rs-5657706/v1/860fb3b419156ba581755295.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eAQUARIUM_HB: a bioinformatics pipeline for human blood circular RNA analysis\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Highlights","content":"\u003col\u003e\n \u003cli\u003e\u003cem\u003eAQUARIUM-HB\u003c/em\u003e identifies and quantifies circRNAs at the transcript level from high-throughput RNA sequencing (HT RNA-seq) data in human peripheral blood samples.\u003c/li\u003e\n \u003cli\u003e\u003cem\u003eAQUARIUM-HB\u003c/em\u003e dynamically constructs a reference set of full-length blood-derived circRNAs, thereby enhancing the accuracy of circRNA identification and quantification.\u003c/li\u003e\n \u003cli\u003e\u003cem\u003eAQUARIUM-HB\u0026nbsp;\u003c/em\u003ecan be applied to discover circRNA biomarkers from blood HT RNA-seq data for human diseases.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Introduction","content":"\u003cp\u003eLiquid biopsies, which use body fluids such as blood or urine, provide non-invasive, real-time approach for disease monitoring compared to traditional tissue biopsies\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. Peripheral blood is particularly favored for its ease of collection, minimal invasiveness and comprehensive information content. 
Among the numerous biomarkers found in blood, including circulating tumor cells and extracellular vesicles, RNA-based molecular markers have gained significant attention due to their dynamic expression patterns and close association with disease states\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e,\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e. Circular RNAs (CircRNAs), in particular, stand out because of their higher stability\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e and specificity\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e compared to traditional linear RNAs. Recent studies have demonstrated crucial roles of blood circRNAs in intercellular communication and disease progression of severe diseases such as cancers\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, suggesting their promising application in liquid biopsies\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe identification and quantification of circRNAs from high-throughput RNA sequencing (HT RNA-seq) of rRNA-depleted blood samples allow for a more profound understanding of circRNA expression dynamics. This understanding enhances their potential utility in diagnosing and prognosing complex diseases. 
Although various computational tools with differing performance levels have been developed for circRNA quantification from HT RNA-seq data\u003csup\u003e\u003cspan additionalcitationids=\"CR9 CR10 CR11\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e, several challenges remain\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. For example, most existing tools quantify circRNA expression by normalizing read counts across back-splice junction (BSJ) sites\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. However, this approach may introduce biased estimation of circRNA expression due to the uneven coverage of sequencing reads and the generally lower expression levels of circular transcripts. In our previous studies, we proposed a pseudo-linear transformation for circular transcripts\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e and implemented a model-based strategy to quantify circRNAs using the full-length RNA structure of these transcripts\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. 
This computational framework has been shown to accurately and simultaneously quantify the expression levels of both circular and linear RNA transcripts from rRNA-depleted HT RNA-Seq data\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e,\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. In particular, the full-length structure of circRNAs in peripheral blood samples can enhance the accuracy of circRNA quantification\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Therefore, establishing a reference set of human blood full-length circRNAs is crucial for accurate circRNA quantification from RNA-seq datasets.\u003c/p\u003e \u003cp\u003eFull-length circRNAs can be obtained from two sources. First, Oxford Nanopore Technology (ONT) has been successfully used to identify full-length circRNAs from tissue samples and cell lines of several model organisms, including \u003cem\u003ecircFL-seq\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e, \u003cem\u003ecircNick-LRS\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e, \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003eCIRI-long\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. For instance, Xin \u003cem\u003eet al.\u003c/em\u003e identified 107,147 full-length circRNA isoforms across 12 human tissues and one human cell line, which are publicly accessible through the UCSC Genome Browser\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. 
Additionally, two public circRNA databases, \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003ecircAtlas\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e, have compiled over one million human full-length circRNA isoforms from ONT datasets. Second, several computational tools have been developed to reconstruct the internal structure of circular transcripts from HT RNA-seq data\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. For example, \u003cem\u003eCIRI-full\u003c/em\u003e utilizes BSJ sites and reverse overlap features of short sequencing reads to assemble full-length circRNA sequences\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Other tools, such as \u003cem\u003epsirc\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e, \u003cem\u003ecircRNA-full\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e, \u003cem\u003eCYCLER\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003eFEICP\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, leverage chimeric alignment information from short sequencing reads to reconstruct full-length circular transcripts. 
Given the wealth of publicly available HT RNA-seq datasets, this second source remains the primary means of assessing genome-wide full-length circRNA profiles.\u003c/p\u003e \u003cp\u003eTo overcome the need for reliable full-length reconstruction of human blood circRNAs while simultaneously quantifying and analyzing both linear and circular transcripts from HT RNA-seq data, we developed \u003cem\u003eAQUARIUM-HB\u003c/em\u003e. This pipeline identifies, annotates, quantifies, and performs expression analysis of human blood circRNAs from HT RNA-seq datasets (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Using this pipeline, the reference set of full-length blood circRNAs can be dynamically expanded by incorporating additional human blood HT RNA-seq datasets. By incorporating known full-length circRNAs from public databases and dynamically expanding its reference set with new RNA-seq datasets, \u003cem\u003eAQUARIUM-HB\u003c/em\u003e aims to improve the identification and quantification of blood circRNAs, thus advancing their application in biomarker discovery and contributing to improved diagnostic and therapeutic strategies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003e\u003cem\u003eIdentification of human blood circRNAs from HT RNA-seq data\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn \u003cem\u003eAQUARIUM\u003c/em\u003e\u003csup\u003e16\u003c/sup\u003e, we incorporated the reconstructed circRNAs from \u003cem\u003eCIRI-full\u003c/em\u003e\u003csup\u003e14\u003c/sup\u003e to improve the quantification accuracy of circRNA expressions. Due to the short length of sequencing reads or insufficient sequencing depth of HT RNA-seq, \u003cem\u003eCIRI-full\u003c/em\u003e may not completely characterize the full-length sequence of some circRNAs. 
In this case, \u003cem\u003eAQUARIUM\u003c/em\u003e utilizes the gene annotation of the human genome to facilitate the internal structure reconstruction of circRNAs\u003csup\u003e16\u003c/sup\u003e. However, this strategy does not account for the internal structures of the large number of known full-length circRNAs derived from ONT or HT RNA-seq datasets. To increase the accuracy of circRNA quantification, we integrated the full-length circRNAs from the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases into the \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline, together with the full-length circRNAs obtained from \u003cem\u003eCIRI-full\u0026nbsp;\u003c/em\u003eidentification of HT RNA-seq datasets (\u003cstrong\u003eFigure 1A\u003c/strong\u003e). We reconstructed the full-length sequence of circRNAs identified by\u003cem\u003e\u0026nbsp;CIRI-full\u003c/em\u003e as follows. For circRNAs that were identified as the “\u003cem\u003efull\u003c/em\u003e” transcripts by\u003cem\u003e\u0026nbsp;CIRI-full\u003c/em\u003e, the complete sequences from the \u003cem\u003eCIRI-full\u003c/em\u003e output were used in subsequent analysis. For circRNAs that were identified as the “\u003cem\u003ebreak\u003c/em\u003e” or “\u003cem\u003eBSJ\u003c/em\u003e \u003cem\u003eonly\u003c/em\u003e” transcripts by \u003cem\u003eCIRI-full\u003c/em\u003e, the internal structures of human full-length circRNAs from different sources were used to reconstruct the full-length sequence of these incomplete circRNAs. First, blood full-length circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases, along with those full-length transcripts identified from blood HT RNA-seq datasets, were prioritized for internal structure reconstruction (\u003cem\u003ePriority-1\u003c/em\u003e). 
If this was insufficient, circRNAs from non-blood tissues in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases were utilized (\u003cem\u003ePriority-2\u003c/em\u003e). Lastly, if required, gene annotations were employed to complete circRNA structures, ensuring robust and comprehensive identification (\u003cem\u003ePriority-3\u003c/em\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eAnnotation of human blood circRNAs\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe identified human blood circRNAs were annotated using the terms listed in \u003cstrong\u003eTable 1\u0026nbsp;\u003c/strong\u003e(\u003cstrong\u003eFigure 1B\u003c/strong\u003e). First, each circRNA is assigned a \u003cstrong\u003e\u003cem\u003euniform ID\u003c/em\u003e\u003c/strong\u003e with the terminology proposed by Chen \u003cem\u003eet al\u003c/em\u003e\u003csup\u003e28\u003c/sup\u003e, ensuring its consistency across different databases. For example, in \u003cem\u003ecircUBXN4(2,3,L4,5)\u003c/em\u003e, the numbers \u003cem\u003e2,3,4,5\u003c/em\u003e indicate exons 2, 3, 4 and 5, while \u003cem\u003eL\u003c/em\u003e before exon 4 indicates 5' alternative splicing of exon 4. To connect with the knowledge deposited in other circRNA databases, the \u003cstrong\u003e\u003cem\u003ealias ID\u003c/em\u003e\u003c/strong\u003e of each circRNA in several existing circRNA databases, including \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e, \u003cem\u003eTransCirc\u003c/em\u003e\u003cem\u003e\u003csup\u003e29\u003c/sup\u003e\u003c/em\u003e, \u003cem\u003ecircAtlas\u003c/em\u003e\u003csup\u003e22,23\u003c/sup\u003e, \u003cem\u003ecircBase\u003c/em\u003e\u003csup\u003e30\u003c/sup\u003e, and \u003cem\u003ePltDB\u003c/em\u003e\u003csup\u003e31\u003c/sup\u003e, was annotated as well. 
Next, the \u003cstrong\u003e\u003cem\u003ereconstruction\u003c/em\u003e\u003c/strong\u003e \u003cstrong\u003e\u003cem\u003esource\u003c/em\u003e\u003c/strong\u003e of each circRNA is documented, indicating whether it is reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e from RNA-seq data, or it is complemented by full-length circRNAs in \u003cem\u003eFLcircAS\u0026nbsp;\u003c/em\u003eand/or \u003cem\u003eIsoCirc\u003c/em\u003e databases. Finally, circRNAs are classified by \u003cstrong\u003e\u003cem\u003econfidence level\u003c/em\u003e\u003c/strong\u003e based on their reconstruction method and detection frequency. The \u003cem\u003eLevel-1\u0026nbsp;\u003c/em\u003ecircRNAs should meet two criteria. First, they should have their full-length sequences reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e. Second, these circRNAs are detected in at least five samples by \u003cem\u003eCIRI-full\u003c/em\u003e, or they have been deposited in the \u003cem\u003eFLcircAS\u003c/em\u003e and/or \u003cem\u003eIsoCirc\u003c/em\u003e databases. The \u003cem\u003eLevel-2\u003c/em\u003e circRNAs should have their full-length sequences reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e, be detected in fewer than five samples, and not be deposited in the \u003cem\u003eFLcircAS\u003c/em\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e databases. The \u003cem\u003eLevel-3\u003c/em\u003e circRNAs are those incomplete circRNAs that are reconstructed using full-length circRNAs from \u003cem\u003eFLcircAS\u003c/em\u003e and/or \u003cem\u003eIsoCirc\u003c/em\u003e and/or HT RNA-seq blood samples. 
The \u003cem\u003eLevel-4\u003c/em\u003e circRNAs are those incomplete circRNAs that are reconstructed using gene annotation of the human genome.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eExpression analysis of human blood circRNAs\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFollowing reconstruction, we quantified the expression of both linear and circular transcripts simultaneously at the transcript level using \u003cem\u003eAQUARIUM\u003c/em\u003e\u003csup\u003e16\u003c/sup\u003e, a model-based framework that we have developed for circRNA quantification (\u003cstrong\u003eFigure 1C\u003c/strong\u003e). To minimize the interference from highly abundant transcripts in blood samples, we ignored the expression of both circular and linear RNA transcripts from 13 hemoglobin-related genes and 171 ribosomal genes in the \u003cem\u003eHGNC\u003c/em\u003e database\u003csup\u003e32\u003c/sup\u003e. Next, we kept only the transcripts from protein-coding genes and recalculated the TPM (Transcripts per Million) expression values for all circular and linear transcripts. For each circRNA, TPM values at the isoform, BSJ, and gene levels were aggregated, providing a comprehensive profile of circRNA expression. 
Differential expression analyses across groups were performed using \u003cem\u003eDESeq2\u003c/em\u003e\u003csup\u003e33\u003c/sup\u003e, with subsequent gene set enrichment analyses in \u003cem\u003eGO\u003c/em\u003e\u003csup\u003e34\u003c/sup\u003e, \u003cem\u003eKEGG\u003c/em\u003e\u003csup\u003e35\u003c/sup\u003e, \u003cem\u003eReactome\u003c/em\u003e\u003csup\u003e36\u003c/sup\u003e, and \u003cem\u003eGSEA\u003c/em\u003e\u003csup\u003e37\u003c/sup\u003e to reveal underlying biological functions and pathways (\u003cstrong\u003eFigure 1D\u003c/strong\u003e).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDynamic expansion of the reference set of human blood full-length circRNAs\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA high-quality reference set of human blood full-length circRNAs can improve the accuracy of circRNA identification and quantification in peripheral blood samples. \u003cem\u003eAQUARIUM-HB\u003c/em\u003e is capable of constructing a reference set of human blood full-length circRNAs sourced from the ONT and HT RNA-seq datasets (\u003cstrong\u003eFigure 1E\u003c/strong\u003e). This reference set includes two components. The first is derived from full-length circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e21\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e19\u003c/sup\u003e databases. The second component includes the full-length circRNAs identified from public HT RNA-seq datasets of human blood samples. Notably, the second component is dynamic, allowing continuous expansion and updates as new HT RNA-seq datasets become available. 
This iterative enhancement ensures both accuracy and comprehensiveness in circRNA identification and quantification from human blood samples, advancing our understanding of their roles in diseases.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eData\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo exemplify our \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline, we downloaded a HT RNA-seq dataset from the GEO database\u003csup\u003e38\u003c/sup\u003e (accession number: GSE172114). This dataset includes 69 whole blood RNA samples from COVID-19 patients, comprising 46 critical and 23 non-critical patients at the time of hospitalization\u003csup\u003e39\u003c/sup\u003e. RNA-seq libraries were prepared using the \u003cem\u003eTruSeq\u003c/em\u003e Stranded Total RNA with \u003cem\u003eRibo-Zero\u003c/em\u003e Globin kit (Illumina) and sequenced on the \u003cem\u003eIllumina NovaSeq\u003c/em\u003e 6000 platform with S4 flow cells, generating 151-base pair paired-end reads.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of human blood circRNAs\u003c/h2\u003e \u003cp\u003eWe first applied the \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline to analyze HT RNA-seq data from whole blood samples of 69 COVID-19 patients\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. Using databases \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e, we retrieved 275,165 and 31,998 blood-derived circRNAs, respectively. 
These blood-derived circRNAs account for 14.8% of the full-length circRNAs in \u003cem\u003eFLcircAS\u003c/em\u003e and 29.9% of those in \u003cem\u003eIsoCirc\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Next, the internal structures of circRNAs from the HT RNA-seq data were reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e using the identified BSJ sites and overlapping sequences between paired-end reads. In this dataset, a total of 128,342 circRNAs were identified. Of these, the full-length sequences of 66,837 (52.1%) circRNAs were completely reconstructed, while the remaining circRNAs were partially reconstructed (47,102 circRNAs, 36.7%) or only had the BSJ site identified (14,403 circRNAs, 11.2%) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). Then, these partially reconstructed or BSJ-only circRNAs were completed using the pipeline's prioritized strategy (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). Among them, 34,031 circRNAs (55.3%) were supplemented using existing blood-derived full-length circRNAs in databases or from blood samples (\u003cem\u003ePriority-1\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). Another 6,164 circRNAs (10.0%) were completed by full-length circRNAs from non-blood tissues in \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e databases (\u003cem\u003ePriority-2\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). 
The remaining 21,310 circRNAs (34.6%) were supplemented using the gene annotation of the human genome (\u003cem\u003ePriority-3\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). These results indicate that the majority of partially reconstructed circRNAs could be effectively supplemented using known full-length circRNAs in blood, underscoring the importance of blood full-length circRNA databases for reconstructing circRNA internal structure. Additionally, known full-length circRNAs from non-blood tissues and the genomic annotation help the pipeline fill the remaining gaps in circRNA internal structure.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eAnnotation of human blood circRNAs\u003c/h3\u003e\n\u003cp\u003eNext, these 128,342 blood circRNAs were annotated by our \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). Among them, a majority (117,489, 91.5%) are exonic circRNAs (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD) with five or fewer exons (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE). In terms of transcript length, most circRNAs are shorter than 1,000 base pairs (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eF). Regarding the confidence level, most circRNAs (76,571, 85.7%) were identified in fewer than five samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eG), indicating potential sample specificity or low expression levels. 
For the 52.1% of circRNAs that were completely reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e, almost half (30,900, 46.2%) were identified in at least five samples or already deposited in the \u003cem\u003eFLcircAS\u003c/em\u003e and/or \u003cem\u003eIsoCirc\u003c/em\u003e databases (\u003cem\u003eLevel-1\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eH). The remaining 35,937 blood circRNAs completely reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e were newly identified in human blood (\u003cem\u003eLevel-2\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eH). For the 47.9% of circRNAs that were incompletely reconstructed by \u003cem\u003eCIRI-full\u003c/em\u003e, 40,195 (65.4%) were supplemented by full-length circRNAs in the \u003cem\u003eFLcircAS\u003c/em\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e databases or blood samples (\u003cem\u003eLevel-3\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eH), while the remaining 21,310 (34.6%) were completed using the gene annotation of the human genome (\u003cem\u003eLevel-4\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eH). These blood circRNAs were associated with 9,308 human genes, with most genes producing only a single circRNA (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eI). However, 397 genes exhibited high levels of alternative splicing of circular transcripts, with each of these genes giving rise to more than 50 circular transcripts. 
Functional enrichment analysis showed that these highly spliced genes are significantly involved in pathways such as cell cycle regulation, ubiquitin-mediated proteolysis, and chemokine signaling (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eJ).\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eExpression analysis of circRNA profiles\u003c/h2\u003e \u003cp\u003eCircRNAs generally exhibit lower expression levels than their linear counterparts and can exhibit varying expression levels from the same gene depending on the context\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e,\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e. Understanding how circRNA expression is dynamically regulated therefore requires simultaneously quantifying both circular and linear RNAs. Using the \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline, we generated a density plot of the expression of both circular and linear transcripts to illustrate the expression distribution of circRNAs and linear mRNAs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). The overall expression of circRNAs and linear RNAs follows a normal distribution, indicating that their expression levels are well-regulated and may reflect typical biological variability across samples. The expression levels of circRNAs were much lower than those of linear mRNA transcripts in both non-critical and critical COVID-19 patients (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). CircRNAs accounted for 3.3% of total RNA expression in non-critical COVID-19 patients, slightly higher than in critical patients (3.1%). 
Furthermore, a significant positive correlation was observed between the expression changes of circRNAs and their corresponding linear RNAs at the gene level (\u003cem\u003eR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.26, \u003cem\u003eP-value\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;2.2\u0026times;10\u003csup\u003e\u0026minus;16\u003c/sup\u003e) (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). This suggests that the expression changes of circRNAs are largely determined by the transcriptional regulation of their host genes. For example, some circRNAs are up-regulated (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, blue dots) because their corresponding parent genes are up-regulated. However, some dysregulated circRNAs are splicing-derived circRNAs, with their expression levels up-regulated (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, red dots) or down-regulated (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, green dots) independently of the transcriptional regulation of their parent genes. The functional enrichment patterns of differentially expressed circRNAs also differ from those of differentially expressed linear RNAs, indicating distinct biological roles of circRNAs in disease severity of COVID-19 (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eConstruction of a reference set of human blood full-length circRNAs\u003c/h2\u003e \u003cp\u003eAlthough long-read sequencing technology can directly sequence full-length circRNAs, its application is still limited by factors such as high cost. 
On the other hand, genome-wide characterization of circRNAs can be achieved through HT RNA-seq with various RNA enrichment strategies. Given the importance of a reliable reference set, \u003cem\u003eAQUARIUM-HB\u003c/em\u003e dynamically integrates circRNAs from both the \u003cem\u003eFLcircAS\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e and \u003cem\u003eIsoCirc\u003c/em\u003e\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e databases, along with newly identified circRNAs from HT RNA-seq datasets. Initially, 275,165 blood full-length circRNAs from the \u003cem\u003eFLcircAS\u003c/em\u003e database\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e and 31,998 blood full-length circRNAs from the \u003cem\u003eIsoCirc\u003c/em\u003e database\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e were integrated as the reference set of human blood full-length circRNAs (\u003cb\u003eSupplementary Table S1\u003c/b\u003e). In total, this initial reference set comprised 276,179 unique blood-derived full-length circRNAs from existing circRNA databases. Our pipeline identified 66,837 full-length circRNAs from the HT RNA-seq data of 69 human blood samples from COVID-19 patients. This set of full-length circRNAs from human blood samples was then used to update the initial reference set (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE). Among them, 11,472 full-length circRNAs were already deposited in the \u003cem\u003eFLcircAS\u003c/em\u003e or \u003cem\u003eIsoCirc\u003c/em\u003e databases, while the remaining 55,365 circRNAs were newly identified in human blood samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eK). 
Finally, we obtained an updated human full-length blood circRNA reference set consisting of 331,544 full-length circRNAs. This reference set provides a robust foundation for circRNA identification and quantification in blood, supporting advanced research in biomarker discovery and improving diagnostic accuracy.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study introduces \u003cem\u003eAQUARIUM-HB\u003c/em\u003e, a comprehensive pipeline capable of identifying and quantifying circRNAs at the transcript level from HT RNA-seq data of human peripheral blood samples. \u003cem\u003eAQUARIUM-HB\u003c/em\u003e integrates a reference set of full-length blood-derived circRNAs, combining established circRNA databases with new findings from HT RNA-seq datasets to ensure precise circRNA identification and quantification. By applying \u003cem\u003eAQUARIUM-HB\u003c/em\u003e to a dataset of COVID-19 patients, we demonstrated its potential in uncovering the unique expression dynamics of circRNAs in response to diseases. The pipeline\u0026rsquo;s ability to capture and quantify full-length circRNA structures not only enhances the accuracy of circRNA profiling in blood but also facilitates the exploration of circRNAs as biomarkers in liquid biopsies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the National Key R\u0026amp;D Program of China (Nos. 2022YFC3500200, 2022YFC3500202), National Natural Science Foundation of China (Nos. 81930117, 82430122), and Jiangsu Provincial Social Development and Clinical Frontier Technology Project (BE2023790).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eW.G. conceived the research. S.Y., X.B. and L.L. contributed to data analysis. S.Y. and W.G. wrote the manuscript. 
All authors read, revised and approved the final version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of competing interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eAQUARIUM-HB\u003c/em\u003e is publicly available and can be accessed on GitHub: https://github.com/NJUCMbioinfo/AQUARIUM-HB.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eDe Rubis, G., Rajeev Krishnan, S. \u0026amp; Bebawy, M. Liquid biopsies in cancer diagnosis, monitoring, and prognosis. \u003cem\u003eTrends Pharmacol. Sci.\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e, 172\u0026ndash;186 (2019).\u003c/li\u003e\n\u003cli\u003eWang, Y., Liu, J., Ma, J., Sun, T. \u0026amp; Ming, L. Exosomal circRNAs: biogenesis, effect and application in human diseases. \u003cem\u003eMol. Cancer\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 1\u0026ndash;10 (2019).\u003c/li\u003e\n\u003cli\u003eZaporozhchenko, I. A., Ponomaryova, A. A., Rykova, E. Y. \u0026amp; Laktionov, P. P. The potential of circulating cell-free RNA as a cancer biomarker: challenges and opportunities. \u003cem\u003eExpert Rev. Mol. Diagn.\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 133\u0026ndash;145 (2018).\u003c/li\u003e\n\u003cli\u003eWen, G. \u0026amp; Gu, W. Circular RNAs in peripheral blood mononuclear cells are more stable than linear RNAs upon sample processing delay. \u003cem\u003eJ. Cell. Mol. Med.\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 5021\u0026ndash;5032 (2022).\u003c/li\u003e\n\u003cli\u003eLi, X., Yang, L. \u0026amp; Chen, L. L. The biogenesis, functions, and challenges of circular RNAs. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 428\u0026ndash;442 (2018).\u003c/li\u003e\n\u003cli\u003eCao, L., Huang, C., Zhou, D. 
C., Hu, Y. \u0026amp; Zhao, G. Proteogenomic characterization of pancreatic ductal adenocarcinoma. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e184\u003c/strong\u003e, 5031-5052.e26 (2021).\u003c/li\u003e\n\u003cli\u003eWen, G., Zhou, T. \u0026amp; Gu, W. The potential of using blood circular RNA as liquid biopsy biomarker for human diseases. \u003cem\u003eProtein Cell\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 911\u0026ndash;946 (2021).\u003c/li\u003e\n\u003cli\u003eSzabo, L. \u003cem\u003eet al.\u003c/em\u003e Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. \u003cem\u003eGenome Biol.\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 263 (2016).\u003c/li\u003e\n\u003cli\u003eVromman, M. \u003cem\u003eet al.\u003c/em\u003e Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision. \u003cem\u003eNat. Methods\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1159\u0026ndash;1169 (2023).\u003c/li\u003e\n\u003cli\u003eGao, Y., Zhang, J. \u0026amp; Zhao, F. Circular RNA identification based on multiple seed matching. \u003cem\u003eBrief. Bioinform.\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 803\u0026ndash;810 (2018).\u003c/li\u003e\n\u003cli\u003eZhang, J., Chen, S., Yang, J. \u0026amp; Zhao, F. Accurate quantification of circular RNAs identifies extensive circular isoform switching events. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 90 (2020).\u003c/li\u003e\n\u003cli\u003eMa, X. K. \u003cem\u003eet al.\u003c/em\u003e CIRCexplorer3: a CLEAR pipeline for direct comparison of circular and linear RNA expression. \u003cem\u003eGenomics Proteomics Bioinformatics\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 511\u0026ndash;521 (2019).\u003c/li\u003e\n\u003cli\u003eMa, X. K., Zhai, S. N. \u0026amp; Yang, L. 
Approaches and challenges in genome-wide circular RNA identification and quantification. \u003cem\u003eTrends Genet.\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 897\u0026ndash;907 (2023).\u003c/li\u003e\n\u003cli\u003eZheng, Y., Ji, P., Chen, S., Hou, L. \u0026amp; Zhao, F. Reconstruction of full-length circular RNAs enables isoform-level quantification. \u003cem\u003eGenome Med.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1\u0026ndash;20 (2019).\u003c/li\u003e\n\u003cli\u003eLi, M. \u003cem\u003eet al.\u003c/em\u003e Quantifying circular RNA expression from RNA-seq data using model-based framework. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 2131\u0026ndash;2139 (2017).\u003c/li\u003e\n\u003cli\u003eWen, G. \u003cem\u003eet al.\u003c/em\u003e AQUARIUM: Accurate quantification of circular isoforms using model-based strategy. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 4879\u0026ndash;4881 (2021).\u003c/li\u003e\n\u003cli\u003eLiu, Z. \u003cem\u003eet al.\u003c/em\u003e circFL-seq reveals full-length circular RNAs with rolling circular reverse transcription and nanopore sequencing. \u003cem\u003eeLife\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e69457 (2021).\u003c/li\u003e\n\u003cli\u003eRahimi, K., Ven\u0026oslash;, M. T., Dupont, D. M. \u0026amp; Kjems, J. Nanopore sequencing of brain-derived full-length circRNAs reveals circRNA-specific exon usage, intron retention and microexons. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 4825 (2021).\u003c/li\u003e\n\u003cli\u003eXin, R. \u003cem\u003eet al.\u003c/em\u003e isoCirc catalogs full-length circular RNA isoforms in human transcriptomes. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 266 (2021).\u003c/li\u003e\n\u003cli\u003eZhang, J. 
\u003cem\u003eet al.\u003c/em\u003e Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e39\u003c/strong\u003e, 836\u0026ndash;845 (2021).\u003c/li\u003e\n\u003cli\u003eChiang, T. W. \u003cem\u003eet al.\u003c/em\u003e FL-circAS: an integrative resource and analysis for full-length sequences and alternative splicing of circular RNAs with nanopore sequencing. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, D115\u0026ndash;D123 (2024).\u003c/li\u003e\n\u003cli\u003eWu, W., Ji, P. \u0026amp; Zhao, F. CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. \u003cem\u003eGenome Biol.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 101 (2020).\u003c/li\u003e\n\u003cli\u003eWu, W., Zhao, F. \u0026amp; Zhang, J. circAtlas 3.0: A gateway to 3 million curated vertebrate circular RNAs based on a standardized nomenclature scheme. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e52\u003c/strong\u003e, D52\u0026ndash;D60 (2024).\u003c/li\u003e\n\u003cli\u003eYu, K. H. O. \u003cem\u003eet al.\u003c/em\u003e Quantifying full-length circular RNAs in cancer. \u003cem\u003eGenome Res.\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 2340\u0026ndash;2353 (2021).\u003c/li\u003e\n\u003cli\u003eHossain, M. T. \u003cem\u003eet al.\u003c/em\u003e Reconstruction of full-length circRNA sequences using chimeric alignment information. \u003cem\u003eInt. J. Mol. Sci.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 6776 (2022).\u003c/li\u003e\n\u003cli\u003eStefanov, S. R. \u0026amp; Meyer, I. M. CYCLeR\u0026mdash;a novel tool for the full isoform assembly and quantification of circRNAs. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e51\u003c/strong\u003e, e10\u0026ndash;e10 (2023).\u003c/li\u003e\n\u003cli\u003eZhong, Y. 
\u003cem\u003eet al.\u003c/em\u003e Systematic identification and characterization of exon\u0026ndash;intron circRNAs. \u003cem\u003eGenome Res.\u003c/em\u003e \u003cstrong\u003e34\u003c/strong\u003e, 376\u0026ndash;393 (2024).\u003c/li\u003e\n\u003cli\u003eChen, L. L. \u003cem\u003eet al.\u003c/em\u003e A guide to naming eukaryotic circular RNAs. \u003cem\u003eNat. Cell Biol.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 1\u0026ndash;5 (2023).\u003c/li\u003e\n\u003cli\u003eHuang, W. \u003cem\u003eet al.\u003c/em\u003e TransCirc: An interactive database for translatable circular RNAs based on multi-omics evidence. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e49\u003c/strong\u003e, D236\u0026ndash;D242 (2021).\u003c/li\u003e\n\u003cli\u003eGlažar, P., Papavasileiou, P. \u0026amp; Rajewsky, N. circBase: A database for circular RNAs. \u003cem\u003eRNA\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1666\u0026ndash;1670 (2014).\u003c/li\u003e\n\u003cli\u003eZou, D. \u003cem\u003eet al.\u003c/em\u003e PltDB: A blood platelets-based gene expression database for disease investigation. \u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 3143\u0026ndash;3145 (2022).\u003c/li\u003e\n\u003cli\u003ePovey, S. \u003cem\u003eet al.\u003c/em\u003e The HUGO Gene Nomenclature Committee (HGNC). \u003cem\u003eHum. Genet.\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 678\u0026ndash;680 (2001).\u003c/li\u003e\n\u003cli\u003eLove, M., Anders, S. \u0026amp; Huber, W. Differential analysis of count data\u0026ndash;the DESeq2 package. \u003cem\u003eGenome Biol\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 10\u0026ndash;1186 (2014).\u003c/li\u003e\n\u003cli\u003eAshburner, M. \u003cem\u003eet al.\u003c/em\u003e Gene ontology: Tool for the unification of biology. \u003cem\u003eNat. 
Genet.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 25\u0026ndash;29 (2000).\u003c/li\u003e\n\u003cli\u003eKanehisa, M. KEGG: Kyoto encyclopedia of genes and genomes. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 27\u0026ndash;30 (2000).\u003c/li\u003e\n\u003cli\u003eGillespie, M. \u003cem\u003eet al.\u003c/em\u003e The reactome pathway knowledgebase 2022. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D687\u0026ndash;D692 (2022).\u003c/li\u003e\n\u003cli\u003eSubramanian, A. \u003cem\u003eet al.\u003c/em\u003e Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e \u003cstrong\u003e102\u003c/strong\u003e, 15545\u0026ndash;15550 (2005).\u003c/li\u003e\n\u003cli\u003eBarrett, T. \u003cem\u003eet al.\u003c/em\u003e NCBI GEO: archive for functional genomics data sets\u0026mdash;update. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, D991\u0026ndash;D995 (2012).\u003c/li\u003e\n\u003cli\u003eCarapito, R. \u003cem\u003eet al.\u003c/em\u003e Identification of driver genes for critical forms of COVID-19 in a deeply phenotyped young patient cohort. \u003cem\u003eSci. Transl. Med.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, eabj7521 (2022).\u003c/li\u003e\n\u003cli\u003eRybak-Wolf, A. \u003cem\u003eet al.\u003c/em\u003e Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e58\u003c/strong\u003e, 870\u0026ndash;885 (2015).\u003c/li\u003e\n\u003cli\u003eSalzman, J., Chen, R. E., Olsen, M. N., Wang, P. L. \u0026amp; Brown, P. O. Cell-type specific features of circular RNA expression. \u003cem\u003ePLoS Genet.\u003c/em\u003e\u003cstrong\u003e9\u003c/strong\u003e, e1003777 (2013). 
\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Table","content":"\u003cp\u003e\u003cstrong\u003eTable 1.\u003c/strong\u003e The terms used in circRNA annotation module of \u003cem\u003eAQUARIUM-HB\u003c/em\u003e pipeline.\u003c/p\u003e\n\u003cdiv align=\"Left\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"547\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\" style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eAnnotation\u003c/strong\u003e \u003cstrong\u003eterm\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDescription\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003eBSJ ID\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003eBSJ position of a circRNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003euniform ID\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003estandard nomenclature of circRNA (Chen \u003cem\u003eet al\u003c/em\u003e\u003csup\u003e28\u003c/sup\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003ealias ID\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ealiases ID in several circRNA databases\u003csup\u003e21\u0026ndash;23,29\u0026ndash;31\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n 
\u003cp\u003e\u003cstrong\u003e\u003cem\u003ecircRNA type\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003etype of a circRNA (exonic, intronic or intergenic)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003ehost gene\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ehost gene ID(s) of an exonic or intronic circRNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003eexon count\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ethe number of exons in a circRNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003esequence length\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ethe splicing length of a circRNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003ereconstruction source\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ereconstruction source of a circRNA\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 167px;\"\u003e\n \u003cp\u003e\u003cstrong\u003e\u003cem\u003econfidence level\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\" style=\"width: 380px;\"\u003e\n \u003cp\u003ethe circRNA confidence level according to the strategies used in 
identification and reconstruction\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Nanjing University of Chinese Medicine","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"AQUARIUM-HB, bioinformatics, circRNA, RNA-seq, blood","lastPublishedDoi":"10.21203/rs.3.rs-5657706/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5657706/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAccurately identifying and quantifying human blood circular RNAs (circRNAs) from RNA-seq data is a critical bioinformatics challenge in biomarker discovery for human diseases. In this study, we present \u003cem\u003eAQUARIUM-HB\u003c/em\u003e, a comprehensive bioinformatics pipeline for identifying, quantifying, annotating, and analyzing circRNAs from human blood transcriptomes. \u003cem\u003eAQUARIUM-HB \u003c/em\u003eincludes three functional modules. First, it identifies and annotates circRNAs from rRNA-depleted RNA-seq datasets of human blood samples. Second, it performs an in-depth expression analysis of blood circRNAs. 
Third, it constructs a reference set of full-length blood circRNAs. We demonstrate the application of \u003cem\u003eAQUARIUM-HB\u003c/em\u003e using a human blood RNA-seq dataset from COVID-19 patients, showcasing its potential for improving the accuracy and depth of circRNA biomarker discovery.\u003c/p\u003e","manuscriptTitle":"AQUARIUM_HB: a bioinformatics pipeline for human blood circular RNA analysis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-12-18 10:19:48","doi":"10.21203/rs.3.rs-5657706/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0b1b8e2d-eed2-4977-a785-369ab2b7f93f","owner":[],"postedDate":"December 18th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":41745080,"name":"Bioinformatics"}],"tags":[],"updatedAt":"2024-12-18T10:19:48+00:00","versionOfRecord":[],"versionCreatedAt":"2024-12-18 10:19:48","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5657706","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5657706","identity":"rs-5657706","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

