A Bioinformatics Approach to MicroRNA-Sequencing Analysis Based on Human Saliva Samples of Patients with Endometriosis

article · OA: gold · CC0 · 11 in-corpus citations
AI-generated summary by claude@2026-06, 2026-06-08

This study developed and validated a bioinformatics approach for analyzing microRNA expression in saliva samples to identify potential diagnostic biomarkers for endometriosis.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-19 · read from full text

This prospective ENDOmiARN study analyzed miRNAome profiles from 200 saliva samples collected from women with chronic pelvic pain suggestive of endometriosis, comparing those with confirmed endometriosis (including rASRM stage I–IV stratification) versus “discordant/complex” controls without endometriosis lesions after surgical inspection. Small RNA sequencing with standardized saliva stabilization, followed by trimming/alignment and miRNA quantification using miRDeep2 and differential expression testing with DESeq2, identified 30 differentially expressed miRNAs, but only three had ROC AUC values >0.6; overall diagnostic performance across miRNAs showed wide ranges with AUC ~39.3–69.2% and sensitivity/specificity varying substantially. A key limitation noted by the authors is the challenge of variability and potential bias in miRNA sequencing from sample handling and methodological factors, which is why saliva stability and library/analysis choices were emphasized. Relevance to endometriosis: This paper is centrally about endometriosis — developing and testing a bioinformatics pipeline for saliva microRNA-sequencing as a diagnostic biomarker approach in patients with surgically confirmed endometriosis.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Abstract

Endometriosis, defined by the presence of endometrium-like tissue outside the uterus, affects 2-10% of the female population, i.e., around 190 million women, worldwide. The aim of the prospective ENDO-miRNA study was to develop a bioinformatics approach for microRNA-sequencing analysis of 200 saliva samples for miRNAome expression and to test its diagnostic accuracy for endometriosis. Among the 200 patients, 76.5% (n = 153) had confirmed endometriosis and 23.5% (n = 47) had no endometriosis (controls). Small RNA-seq of 200 saliva samples yielded ~4642 M raw sequencing reads (from ~13.7 M to ~39.3 M reads/sample). The number of expressed miRNAs ranged from 1250 (outlier) to 2561 per sample. Some 2561 miRNAs were found to be differentially expressed in the saliva samples of patients with endometriosis compared with the control patients. Among these, 1.17% (n = 30) were up- or downregulated. Among these, the F1-score, sensitivity, specificity, and AUC ranged from 11-86.8%, 5.8-97.4%, 10.6-100%, and 39.3-69.2%, respectively. Here, we report a bioinformatic approach to saliva miRNA sequencing and analysis. We underline the advantages of using saliva over blood in terms of ease of collection, reproducibility, stability, safety, non-invasiveness. This report describes the whole saliva transcriptome to make miRNA quantification a validated, standardized, and reliable technique for routine use. The methodology could be applied to build a saliva signature of endometriosis.
Full text 18,571 characters · extracted from pmc-nxml · 6 sections

Section 2

We used data from the prospective “ENDOmiARN” study (ClinicalTrials.gov Identifier: NCT04728152 ). Data collection and analysis were carried out under Research Protocol n° ID RCB: 2020-A03297-32. We obtained signed informed consent from all participants in the study. The experimental protocol was approved by Ethics committee le comité de protection des personnes (C.P.P.) Sud-Ouest et Outre-Mer 1 (CPP 1-20-095 ID 10476). The ENDOmiARN study included 200 saliva samples obtained from patients with chronic pelvic pain suggestive of endometriosis. All the samples were collected between January 2021 and June 2021. Analysis was performed blinded to the surgical and imaging findings. The patients with endometriosis were stratified according to the revised American Society of Reproductive Medicine (rASRM) classification [ 17 ]. The main characteristics of the patients included in the ENDOmiARN study are displayed in Table 1 . The saliva samples (2 mL) were collected in an all-in-one system including a nucleic acid stabilizing solution for collection, stabilization and transportation (OME 505, DNA Genotek Inc., 2 Beaverbrook Road Ottawa, ON, Canada K2K 1L1) using an at-home kit ( https://www.dnagenotek.com/row/products/collection-microbiome/omnigene-oral/OME-505.html , accessed on 1 January 2021). Subjects were asked to refrain from eating, drinking, smoking, or chewing gum for 30 min before the saliva sample was taken. All the samples were stored at room temperature prior to shipping. RNA was isolated from each saliva sample using the miRNeasy Kit (Qiagen, Inc., Germantown, MD, USA) according to the manufacturer’s instructions [ 6 , 8 , 9 ]. In accordance with DNA Genotek process of extraction, a systematic centrifugation was performed at 13,300× g for 3 min. RNA quality was assessed using the Agilent Technologies TapeStation 2200. 
RNA-sequencing libraries were prepared using the QIAseq miRNA Library Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Samples were indexed in batches of 96, with a targeted sequencing depth of 17 million reads per sample. Sequencing was performed with 100-base single-end reads on a NovaSeq 6000 sequencer (Illumina, San Diego, CA, USA) [ 14 , 18 ]. The process follows the one summarized in the previously published work by Potla et al. [ 13 ].
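The indexing and depth figures above imply some simple planning arithmetic, sketched here in Python. Only the three constants come from the text; the assumption that batches are filled in order (so that only the last batch is partial) is illustrative and is not stated by the authors.

```python
import math

N_SAMPLES = 200       # saliva samples, from the text
BATCH_SIZE = 96       # samples indexed per batch, from the text
TARGET_READS_M = 17   # targeted depth in millions of reads per sample, from the text

# Minimum number of indexing batches needed to hold every sample.
n_batches = math.ceil(N_SAMPLES / BATCH_SIZE)

# Assumption (not from the paper): batches are filled in order,
# so only the last batch is partially filled.
batch_sizes = [
    min(BATCH_SIZE, N_SAMPLES - i * BATCH_SIZE) for i in range(n_batches)
]

# Total targeted read budget across the cohort, in millions of reads.
total_target_reads_m = N_SAMPLES * TARGET_READS_M

print(n_batches, batch_sizes, total_target_reads_m)
```

Under these assumptions the cohort needs at least three batches and a targeted budget of 3400 M reads, which can be compared with the raw read yield reported in the Results.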

Section 3

Sequencing reads were processed using the following data processing pipeline. FastQ files were trimmed to remove adapter sequences using Cutadapt v1.18 and were aligned using Bowtie v1.1.1 to the following transcriptome databases: the human reference genome available from NCBI ( https://www.ncbi.nlm.nih.gov/genome/guide/human/ , accessed on 1 January 2021), and miRBase v21 (miRNAs), using the miRDeep2 v0.1.0 package. The raw sequencing data quality was assessed using FastQC software v0.11.7 [ 14 , 15 , 19 , 20 , 21 ]. miRNA expression was quantified by miRDeep2 v0.1.0 [ 22 ]. Differential expression tests were then conducted in DESeq2 for miRNAs with read counts in ≥1 of the samples. DESeq2 v1.20 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data using shrinkage estimators for dispersion and fold change [ 23 ]. The resulting matrix was filtered for expressed miRNAs and normalized using Z-score normalization [ 24 ]. miRNAs were considered differentially expressed if the absolute value of the log2-fold change was >1.5 (upregulated) or <0.5 (downregulated) and the p value adjusted for multiple testing was <0.05 [ 23 ]. To evaluate the diagnostic accuracy of each miRNA biomarker, sensitivity and specificity were calculated, an ROC analysis was performed, and the ROC AUC was calculated [ 25 , 26 ]. Additional statistical analysis was based on the chi-squared test as appropriate for categorical variables. Values of p < 0.05 were considered to denote significant differences. Data were managed with an Excel database (Microsoft, Redmond, WA, USA) and analyzed using R 2.15 software, available online ( http://cran.r-project.org/ , accessed on 1 January 2021).
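The per-miRNA accuracy metrics named above (sensitivity, specificity, F1-score, ROC AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: their analysis was run in R, and the function names, the cut-off of 1.0, and the toy values below are all hypothetical. The AUC is computed in its rank-based (Mann-Whitney) form, which equals the area under the ROC curve.

```python
from typing import Dict, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case scores higher than a
    random control (Mann-Whitney formulation); ties count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def diagnostic_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity and F1 when 'score >= threshold' calls a case."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Toy, made-up normalized expression values (NOT data from the study).
toy_scores = [2.1, 1.8, 0.4, 1.2, 0.3, 0.9, 1.5, 0.2]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = endometriosis, 0 = control

auc = roc_auc(toy_scores, toy_labels)
metrics = diagnostic_metrics(toy_scores, toy_labels, threshold=1.0)
print(auc, metrics)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a cohort of a few hundred; a rank-based implementation would be preferable for larger data sets.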

Intro

MicroRNAs (miRNAs) are small, highly conserved non-coding RNAs with a length of about 22 nucleotides which bind to the 3′-untranslated region (3′-UTR) of target messenger RNAs (mRNAs), thus regulating gene expression post-transcriptionally through RNA degradation and/or translational inhibition [ 1 , 2 ]. Schematically, miRNA biosynthesis involves several steps: (i) they are first transcribed from genes in intronic regions of coding or non-coding transcripts, or coded from exons under the action of the RNA polymerase II, generating hundreds of duplex nucleotide-long primary miRNAs (pri-miRNA); (ii) the pri-miRNA is subsequently cleaved by a complex formed by an RNase III enzyme, Drosha, RNA binding cofactor and Pasha to generate precursor miRNA (pre-miRNA); and (iii) then, the pre-miRNAs are transported from the nucleus to the cytoplasm using exportin 5 where the duplex is cleaved by Dicer and helicase to form mature miRNAs [ 2 , 3 ]. The miRNAs are subsequently incorporated into an RNA silencing complex (RISC) that regulates post-translational modifications through binding to the 3′ untranslated region (3′ UTR) of the target messenger-RNA (mRNA). Finally, the miRNAs are released from the cells into circulation using various carriers such as Argonaute, nucleophosmin 1, high-density lipoproteins or extracellular vesicles (exosomes) with a distribution in human fluids where they can be detected [ 4 , 5 ]. Numerous studies have demonstrated the relevance of evaluating miRNA expression in cancers and benign pathologies to provide insights into the molecular mechanisms of disease onset and progression [ 3 , 6 , 7 , 8 , 9 , 10 ]. For these reasons, sequencing has been applied to biomarker discovery for a variety of diseases, such as endometriosis, but with some limitations especially for saliva samples. 
Indeed, several sources of errors can be introduced during a sequencing study such as (i) an underpowered cohort [ 3 , 11 , 12 ], (ii) sample extraction, (iii) library preparation [ 7 , 13 , 14 , 15 ], and (iv) sequencing technique. Overall, these errors can lead to over- or underestimation of the molecule expression or a one-size-fits-all approach with inadequate analysis [ 8 , 9 , 12 , 14 , 15 , 16 ]. Therefore, the goal of the prospective ENDO-miRNA study was to develop a bioinformatics approach for microRNA-sequencing analysis of 200 saliva samples for miRNAome expression and to test its diagnostic accuracy for endometriosis.

Results

Among the 200 patients, 76.5% (n = 153) had confirmed endometriosis and 23.5% (n = 47) had no endometriosis (controls). In the endometriosis group, 52% (80) of the patients had rASRM stages I–II and 48% (73) had stage III–IV. The control group consisted of various benign pathologies with 51% (24) of the women having no abnormality. These were defined as “discordant” (or complex) patients corresponding to women with symptoms suggestive of endometriosis without clinical or MRI features of endometriosis and no endometriosis lesions discovered during laparoscopic inspection ( Table 1 ). Small RNA-seq of 200 saliva samples yielded ~4 642 M raw sequencing reads (from ~13.7 M to ~39.3 M reads/sample). Pre-filtering and filtering steps retained 70% (~3205 M) of initial raw reads. The majority of filtered reads were of short read length. Quantification of filtered reads and identification of known miRNAs yielded ~190 M sequences to be mapped to 2561 known miRNAs from miRBase v21. The number of expressed miRNAs ranged from 1250 (outlier) to 2561 per sample. The distribution of expressed miRNAs in the 200 saliva samples and the overall composition of processed reads are shown in Figure 1 A,B and Figure 2 . Of the miRNAs, 2561 were found to be differentially expressed in the saliva samples of patients with endometriosis, compared with the control patients. Among these, 1.17% (n = 30) were up- or downregulated. Figure 3 shows a volcano plot of the miRNAs expressed in endometriosis. Among the 30 regulated miRNAs, only three (hsa-miR-34c-5p, hsa-miR-4677-3p, hsa-miR-655-5p) had an AUC > 0.6. The top 10 differentially expressed miRNA patterns in the endometriosis and control are reported in Figure 4 . The diagnostic metrics for endometriosis in the differentially expressed miRNAs in the saliva samples (n = 30) are reported in Table 2 . Among these, the F1-score, sensitivity, specificity, and AUC ranged from 11–86.8%, 5.8–97.4%, 10.6–100%, and 39.3–69.2%, respectively. 
For AUC criteria, 90% (n = 27) and 10% (n = 3) had a value ranging between 36.3–59% and ≥60%, respectively. For the F1-score, 80% (n = 24) and 20% (n = 6) had a value ranging between 0–79% and ≥80%, respectively. For sensitivity, 80% (n = 24) and 20% (n = 6) had a value ranging between 0–79% and ≥80%, respectively. For specificity, 70% (n = 21) and 30% (n = 9) had a value ranging between 0–79% and ≥80%, respectively. The clustering of the accuracy values is reported in Figure 5 .
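The proportions in this section follow directly from the reported counts, and a short Python check makes the arithmetic explicit. All counts are taken from the text; the helper name pct is hypothetical. For the 30 regulated miRNAs, bins of n = 21 and n = 9 correspond to 70% and 30%.

```python
def pct(part: float, whole: float, digits: int = 1) -> float:
    """Percentage of 'part' in 'whole', rounded to 'digits' decimals."""
    return round(100 * part / whole, digits)


# Cohort composition (counts from the text).
endo_pct = pct(153, 200)          # confirmed endometriosis
control_pct = pct(47, 200)        # controls
stage_1_2_pct = pct(80, 153, 0)   # rASRM stages I-II within the endometriosis group
stage_3_4_pct = pct(73, 153, 0)   # rASRM stages III-IV within the endometriosis group

# Regulated miRNAs among the 2561 known miRNAs.
regulated_pct = pct(30, 2561, 2)

# Accuracy bins among the 30 regulated miRNAs: (count below cut-off, count at or above).
bins = {"auc": (27, 3), "f1": (24, 6), "sensitivity": (24, 6), "specificity": (21, 9)}
bin_pcts = {name: (pct(lo, 30), pct(hi, 30)) for name, (lo, hi) in bins.items()}

# Read-level figures, in millions of reads.
retained_pct = pct(3205, 4642)          # share of raw reads kept after filtering
mean_reads_m = round(4642 / 200, 1)     # mean raw reads per sample

print(endo_pct, control_pct, stage_1_2_pct, stage_3_4_pct, regulated_pct)
print(bin_pcts)
print(retained_pct, mean_reads_m)
```

The retained share works out to about 69%, which the text reports as 70%, and the mean of roughly 23 M raw reads per sample agrees with the mean quoted in the Discussion.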

Discussion

To the best of our knowledge, this is the first report detailing the miRNAome of 200 saliva samples from patients with and without endometriosis included in a prospective study: the ENDOmiARN study [ 3 , 11 , 16 , 27 ]. In addition, we report a bioinformatics approach to saliva miRNA sequencing and analysis and underline the advantages of using saliva over blood in terms of ease of collection, reproducibility, stability, safety, non-invasiveness, and cost-effectiveness [ 6 , 8 , 10 , 22 , 28 , 29 , 30 , 31 ]. Preliminary results about the use of saliva RNAs as diagnostic biomarkers have previously been reported, mainly for cancer [ 6 , 8 , 32 ], systemic disease, and forensic casework [ 6 , 9 , 28 , 33 ]. However, the quality of the methodology and yield issues of these studies overall are debatable [ 34 ]. In a recent literature review of miRNAs for the non-invasive diagnosis of endometriosis, Monnaka et al. underlined that none of the 449 reports investigated miRNAs in saliva [ 11 ]. Therefore, since (i) no scientifically proven salivary biomarkers for endometriosis have been reported, and (ii) the applicability of such biomarkers has been poorly explored, the concept of extracting and identifying miRNAs from saliva samples for the reliable identification of endometriosis is challenging [ 35 , 36 ]. The main obstacle to using miRNAs is their stability and susceptibility to degradation. This has always been an issue for mRNA-based gene expression analysis and a potential source of bias for reproducibility [ 28 , 34 , 37 ]. This point was highlighted, for example, for forensic routine applications using miRNA, because biological stains from forensic casework are often altered by ambient moisture and temperature, UV light, and suboptimal environmental pH, all of which have the potential to degrade the miRNA beyond usability [ 28 ].
In this setting, Patel et al. demonstrated that Oragene•RNA solution could preserve and stabilize RNA collected from saliva to produce high yields of good quality RNA for subsequent downstream applications and/or analyses [ 34 ]. The authors reported that the RNA yield remained fairly constant between matched samples from each donor when stored for 48 h at room temperature [ 34 ]. In addition, they explored the differences in the total RNA yield from donors over a 3-day period, but also sought to examine the potential differences in expression between commonly used mRNA and miRNA endogenous controls [ 34 ]. Although the total RNA from each donor varied over the days, probably due to bacterial RNA, the abundance of the mammalian RNA normalizers (snU6 small RNA, 18S rRNA, GAPDH mRNA and let-7b miRNA) remained stable [ 34 ]. In the current study, 200 saliva samples were collected according to the manufacturer’s guidelines (Oragene) and stored at room temperature prior to shipping and analysis. Interestingly, we found that the quantification of filtered reads and identification of miRNAs yielded ~190 M sequences to be mapped to 2561 known miRNAs. The total reads ranged from 13 to 39 million with a mean of 23 million. Among these, the miRNA reads ranged from 272,322 to 6 million with a mean of 949,893 ( Figure 2 A,B). These results are in concordance with previous reports demonstrating that the salivary transcriptome is abundant and stable, consisting of thousands of mRNAs and miRNAs [ 6 , 9 , 10 , 28 , 34 , 38 ]. In this setting, Courts et al. also confirmed that miRNAs are especially relevant because they are stable and easy to collect and analyze, and validated their use in standard forensic medicine [ 28 ].
Using the Oragene•RNA kit, we demonstrated (i) the stability and consistency of miRNA reads for the 200 samples whatever the conditions of sampling and transport, (ii) the reproducibility and efficiency of such techniques since all the 200 samples were usable for sequencing, and (iii) a routine bioinformatics approach. In the present study, diagnostic accuracies according to the F1-score, sensitivity, specificity and AUC ranged from 11–86.8%, 5.8–97.4%, 10.6–100%, and 39.3–69.2%, respectively. In addition, we identified 30 up- or downregulated miRNAs with a high heterogeneity in terms of accuracy. Although the use of saliva for miRNA identification could be a potential non-invasive solution to overcome current barriers to the diagnosis of endometriosis, the critical step is the transition from expression data to candidate selection, which is always somewhat arbitrary. In this setting, salivary miRNAs have been reported to be of great interest as diagnostic biomarkers especially in cancer [ 6 , 10 , 32 ]. However, as there are no fixed rules about which criteria to apply to select a miRNA candidate, we developed a bioinformatics approach for miRNA accuracy: among the 2561 miRNAs identified, 30 were up- or downregulated, underpinning the use of new mathematical methods and artificial intelligence to overcome the limits of classic logistic regression. Indeed, in agreement with Lopez-Rincon et al., it is illusory to imagine that a few miRNAs could reflect the heterogeneity of a multifactorial disorder such as endometriosis, characterized by various phenotypes and for which the various pathways implicated in its genesis are poorly understood [ 7 , 15 , 39 ]. We thus used a new statistical tool, machine learning, to overcome the accuracy limitations and design a potential diagnostic signature [ 7 , 9 , 10 , 15 , 30 ]. In the present study, we analyzed 200 saliva samples for miRNA expression and diagnostic accuracy.
However, there are several unsolved issues that might hinder the broad acceptance of a miRNA-based signature. The miRNAome, perhaps even more than the transcriptome, is highly context dependent, and it is conceivable that certain non-physiologic or pathologic conditions might alter the expression levels of miRNAs for body-fluid identification. It will therefore be necessary to test whether the expression of candidate miRNAs for body-fluid identification is influenced by biologic processes or conditions such as the menstrual cycle phase or previous hormonal treatment [ 12 , 40 ]. In this setting, Vanhie et al. and Moustafa et al. reported no impact on miRNA expression according either to hormonal treatment or menstrual cycle phases in contrast to data obtained from endometrial biopsies [ 12 , 40 ]. This apparent discrepancy could be linked to the modalities of miRNA release into bodily fluids that could vary depending on the organ and the tumor. In the ENDOmiARN study, two different body fluids were assessed: serum and saliva. This choice was mostly driven by the need for stability in the miRNAs detected to provide a reliable diagnostic tool. Indeed, while several studies have observed differences in miRNA expression in tissues according to the menstrual phase, mainly at endometrial level [ 41 , 42 ], no such cyclic differences were observed in the plasma of healthy women [ 43 ]. One hypothesis is that changes in miRNA expression at the endometrium level regulate gene expression locally but are insufficient to cause detectable systemic changes [ 3 ]. The other reason to opt for saliva was its easy availability, including in a home self-sampling setting and including virgin patients not examined during gynecological appointments. Another issue is the variation in miRNA expression analysis according to the next-generation sequencing (NGS) technique used.
Indeed, several different methods and devices for miRNA extraction, reverse transcription and quantification, from NGS to microarray analysis, have been advocated, leading to differences in results [ 13 , 31 , 34 ]. However, as underlined by Agrawal et al., we believe that the standardized NGS procedure we describe here is optimal for endometriosis since it is currently the gold standard approach for profiling nucleic acid, including miRNAs [ 3 ]. In addition, miRNAs are just one of several classes of small ncRNAs with regulatory functions, and there is no reason to exclude these RNAs from endometriosis analyses. Therefore, miRNA analysis may represent an interim strategy until more is known about other small RNAs, and once a comprehensive small-RNA analysis is available, it is likely to replace miRNA-only analysis [ 44 ]. Finally, our results require external validation, including temporal and geographic validation of miRNA quantification and sequencing reproducibility; that is the goal of an ongoing study [ 45 ].

Conclusions

Endometriosis affects about 190 million women worldwide, representing a healthcare burden equivalent to diabetes [ 46 ]. Endometriosis is a representative example of a multifactorial, incompletely understood chronic disease. To understand the various signaling pathways involved in this complex disease, analysis of the entire miRNAome currently available is mandatory. This report describes the whole saliva transcriptome to make miRNA quantification a validated, standardized, and reliable technique for routine use. The methodology could be applied to build a saliva signature of endometriosis and to solve other issues of this debilitating disorder (various clinical phenotypes, infertility-associated endometriosis), as well as to evaluate the potential theragnostic value of miRNA expression. Finally, beyond endometriosis, our methodology can be applied to other chronic diseases with the goal of developing a noninvasive, quick and reliable tool to improve diagnosis and management and to select patients according to therapeutic medical and/or surgical response.

Condition tags

endometriosis

MeSH descriptors

Endometriosis, MicroRNAs

References (47)

Cited by (11)

Source provenance

europepmc
last seen: 2026-06-22T06:15:23.361955+00:00
openalex
last seen: 2026-06-10T17:14:06.276822+00:00
pubmed
last seen: 2026-06-22T06:15:13.277551+00:00
License: CC0 · commercial use OK