Methods
This study was approved by the Central Health and Disability Ethics Committee (HDEC; 2022 EXP 12616), received endorsement from the Research Advisory Group—Māori (RAG‐M) at Te Whatu Ora—Capital and Coast (#937) and local approval from the Te Whatu Ora—Capital and Coast research and audit committee.
Participants were recruited between June 2022 and December 2024. Eligible participants were undergoing surgery for suspected endometriosis at Te Whatu Ora—Capital and Coast (Wellington and Kenepuru Hospitals) and were aged between 16 and 45 years. People with known current cervical abnormalities or who were menstruating were excluded.
Low vaginal swabs were taken by the operating surgeon after the administration of anaesthesia and before surgical preparation. Vaginal swabs were collected first, by inserting a FLOQ swab (Copan Diagnostics, #5E089N) into the vagina and rotating it for approximately 30 s, then placing it into a tube containing 1 mL of phosphate-buffered saline (PBS; Thermo Fisher, #70011044).
Samples were visually assessed, and samples with any blood contamination were excluded. This included any samples with blood visible on the collection device, or if the sample fluid was orange or red. The FLOQ swab was removed, and the sample was briefly vortexed and aliquoted in 200 µL volumes before storing at −80°C. Batched cervicovaginal fluid samples were transported on ice to the Centre for Protein Research at the University of Otago, Dunedin, New Zealand.
To reflect the real-world heterogeneity of the clinical setting in which a diagnostic biomarker would be applied, no exclusion criteria relating to menstrual cycle phase were applied. Menstrual cycle phase was estimated from participant-reported data collected via a survey on the day of surgery. Participants were assigned to one of the following groups: combined oral contraceptive pill (COP), progestin medication, gonadotropin-releasing hormone (GnRH) analogue, proliferative phase, secretory phase, irregular periods, or unknown. Irregular periods were defined as periods varying by ±20 days from the expected date. Participants expecting their period in less than 14 days were assigned to the secretory phase, and those expecting their period in 14 days or longer were assigned to the proliferative phase. Participants who did not report the date of their last menstrual period, or who reported no periods in the last three months and were not taking hormonal medications, were assigned to the unknown group. If endometrial curettings were taken during surgery, the phase stated in the histological report was used instead of the estimated phase.
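The phase-assignment rules above can be expressed as a small decision function. This is a minimal sketch rather than the study's code: the argument names, the order in which the rules are checked (hormonal medication first, then histology, then the self-reported estimate) and the reading of "±20 days" as a reported variation of 20 days or more are all assumptions.

```python
from typing import Optional

# Medication groups named in the text; the string labels are hypothetical keys.
HORMONAL_GROUPS = {"COP", "progestin", "GnRH analogue"}


def assign_phase(
    hormonal_medication: Optional[str],
    days_to_expected_period: Optional[int],
    cycle_variation_days: Optional[int],
    no_periods_last_3_months: bool,
    histology_phase: Optional[str] = None,
) -> str:
    """Assign one menstrual-phase group from survey answers (illustrative only)."""
    # Hormonal medication defines the group regardless of reported dates.
    if hormonal_medication in HORMONAL_GROUPS:
        return hormonal_medication
    # Histological dating of endometrial curettings replaces the estimate.
    if histology_phase is not None:
        return histology_phase
    # No reported last menstrual period, or no periods without hormonal treatment.
    if days_to_expected_period is None or no_periods_last_3_months:
        return "unknown"
    # Assumption: a reported variation of 20 days or more counts as irregular.
    if cycle_variation_days is not None and cycle_variation_days >= 20:
        return "irregular"
    # Period expected in < 14 days -> secretory; 14 days or longer -> proliferative.
    return "secretory" if days_to_expected_period < 14 else "proliferative"
```

For example, a participant not taking hormonal medication who expects their period in 7 days is assigned "secretory", while one expecting it in 14 days is assigned "proliferative".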
For participants with endometriosis at laparoscopy, disease extent was assessed by the revised American Society for Reproductive Medicine (rASRM) scoring system [ 17 ]. Two (2/20) endometriosis samples were not staged at surgery and were retrospectively staged by SS. Participants were classified as not having endometriosis and allocated to the control group when all biopsies were histologically negative for endometriosis, or when there was no visible evidence of endometriosis and no biopsy.
A total of 39 samples (20 from people with endometriosis and 19 from people without endometriosis) were prepared for SWATH-MS analysis. To enable assessment of the reproducibility of dysregulated protein signatures, sample preparation and SWATH-MS were performed in two separate experimental runs (experiment one: endometriosis n = 11, no endometriosis n = 9; experiment two: endometriosis n = 9, no endometriosis n = 10).
Proteins were extracted and processed using S-Trap micro spin columns (Protifi, #C02-micro) according to the manufacturer's instructions. Each 200 µL aliquot of vaginal swab cervicovaginal fluid was lysed by sonication in 200 µL of lysis buffer (100 mM triethylammonium bicarbonate [TEAB] and 10% sodium dodecyl sulphate [SDS] in water). Magnesium chloride (MgCl2) was added to a final concentration of 2 mM, followed by 200 units of benzonase. Samples were then incubated at 37°C for 10 min and centrifuged for 15 min at 30,000 × g, and the supernatant was retained.
To estimate the total protein in each sample, a BCA assay was performed according to the manufacturer's instructions (Thermo Fisher Scientific, #23225).
Samples were reduced and alkylated by adding tris(2-carboxyethyl)phosphine (TCEP) and iodoacetamide to final concentrations of 5 and 10 mM, respectively, and incubating for 10 min at room temperature in the dark. A sample volume containing 100 µg of protein was then processed using the S-Trap micro protocol for tryptic digestion. The digested protein samples were further purified by solid phase extraction on centrifugal peptide desalting columns (Thermo Scientific) following the manufacturer's instructions and dried in a centrifugal vacuum concentrator. For mass spectrometry analysis, samples were reconstituted in 5% acetonitrile (ACN), 0.05% formic acid (FA) in water at a concentration corresponding to 1.5 µg of digested protein per µL of sample. For each experiment, an aliquot from each sample was pooled into a reference sample in order to (i) generate quality control (QC) samples for monitoring instrument performance over the experiments and (ii) pre-fractionate a representative pooled sample for generating a high-quality spectral library by data-dependent acquisition (DDA) mass spectrometry. The pooled peptide samples from each experiment were pre-fractionated into 11 fractions using the high pH reversed-phase peptide fractionation kit (Thermo Scientific) according to the manufacturer's instructions.
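The two volume calculations implied above (the lysate volume holding 100 µg of protein, and the solvent volume giving 1.5 µg of digested protein per µL) are simple ratios. The sketch below uses a hypothetical BCA concentration and assumes complete peptide recovery through digestion and desalting, which the text does not state.

```python
def volume_for_protein_ul(target_ug: float, bca_ug_per_ul: float) -> float:
    """Volume of lysate (µL) containing target_ug of protein at the BCA concentration."""
    return target_ug / bca_ug_per_ul


def reconstitution_volume_ul(peptide_ug: float, target_ug_per_ul: float = 1.5) -> float:
    """Volume of 5% ACN / 0.05% FA (µL) giving target_ug_per_ul of digested protein."""
    return peptide_ug / target_ug_per_ul


# Hypothetical lysate at 2.5 µg/µL: 40 µL holds 100 µg of protein.
lysate_ul = volume_for_protein_ul(100, 2.5)
# Assuming no peptide loss, 100 µg reconstituted at 1.5 µg/µL needs about 66.7 µL.
solvent_ul = reconstitution_volume_ul(100)
```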
Samples were analysed on a 5600+ triple time-of-flight (TOF) mass spectrometer coupled to an Ekspert NanoLC 415 system (Eksigent, AB Sciex). A 5 µL volume of peptide sample was separated on a 75 µm ID, 20 cm long silica emitter tip column packed in-house with Luna C18 bead material (Phenomenex; 3.2 µm, 100 Å) using an ACN gradient from 5% ACN, 0.05% FA in water to 90% ACN, 0.05% FA in water over a total gradient length of 115 min.
For protein identification and spectral library generation, each of the high pH reversed-phase fractions was analysed by data-dependent acquisition, with one MS survey scan over the mass range 400–1300 m/z followed by 25 data-dependent high-energy collision-induced MS/MS spectra per cycle. Each fraction was analysed in two technical replicates. For protein quantification, individual samples were analysed in technical triplicate by SWATH-MS, with the mass spectrometer operated in data-independent acquisition (DIA) mode. Each DIA cycle consisted of one MS survey scan followed by HCD MS/MS scans of 34 mass windows over the mass range 400–1250 m/z.
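The text gives the number of SWATH windows and the precursor range but not the window widths. Assuming, purely for illustration, equal-width windows (many SWATH methods instead use variable widths and a small overlap), the isolation scheme for one DIA cycle can be generated as follows; the function and its `overlap_mz` parameter are hypothetical.

```python
def swath_windows(start_mz=400.0, end_mz=1250.0, n_windows=34, overlap_mz=0.0):
    """Return (low, high) m/z bounds for equal-width precursor isolation windows.

    overlap_mz widens each window's upper edge (except the last) so that
    neighbouring windows share a margin; 0.0 gives contiguous windows.
    """
    width = (end_mz - start_mz) / n_windows
    windows = []
    for i in range(n_windows):
        low = start_mz + i * width
        high = start_mz + (i + 1) * width
        if i < n_windows - 1:
            high += overlap_mz
        windows.append((low, high))
    return windows


windows = swath_windows()  # 34 contiguous windows of 25 m/z each
```

Under this assumption each window spans (1250 − 400) / 34 = 25 m/z.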
Protein identification was performed by searching the DDA raw data against the complete UniProt human amino acid sequence database (204,294 sequence entries; downloaded 20‐11‐2023) using ProteinPilot software version 4.5 (AB Sciex). The spectral library was built using the SWATH application (version 2.0) embedded in PeakView software (version 2.2, AB Sciex). The SWATH-MS raw data of individual samples were aligned to the spectral library at a false discovery rate of 1%, with confidence ≥99% for peak identification in at least one sample. Aligned peak areas of at least six fragment ions per peptide signal were exported to MarkerView software (version 1.2, AB Sciex) for further data processing and basic statistics, including unsupervised multivariate analysis by principal component analysis (PCA). Before peak area normalisation, high-abundance plasma and structural proteins (including keratins and haemoglobins) were removed to improve normalisation and quantification of lower-abundance proteins of potential biological relevance; the removed proteins are listed in Table S1. The resulting data were then scaled using the median peak intensity of all samples in the experiment.
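MarkerView performs this normalisation internally. The sketch below shows one plausible reading of the two steps: drop the excluded high-abundance proteins, then rescale every sample so that its median peak area equals the median of all sample medians in the experiment. The data layout and the exact scaling rule are assumptions, not MarkerView's documented algorithm.

```python
from statistics import median


def median_scale(peak_areas, excluded):
    """peak_areas: {sample: {protein: area}}; excluded: proteins removed before scaling."""
    kept = {
        sample: {p: a for p, a in proteins.items() if p not in excluded}
        for sample, proteins in peak_areas.items()
    }
    sample_medians = {s: median(v.values()) for s, v in kept.items()}
    # Common target: the median of the per-sample medians across the experiment.
    target = median(sample_medians.values())
    return {
        s: {p: a * target / sample_medians[s] for p, a in v.items()}
        for s, v in kept.items()
    }


raw = {
    "A": {"P1": 1.0, "P2": 2.0, "P3": 3.0, "HBB": 100.0},
    "B": {"P1": 2.0, "P2": 4.0, "P3": 6.0, "HBB": 50.0},
}
scaled = median_scale(raw, excluded={"HBB"})
# Once the haemoglobin entry is removed, samples A and B scale to identical values.
```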
The median peak value of each sample's technical triplicate was used for analysis. Intra-assay coefficients of variation (CV%) were calculated for each protein across the technical replicates of the QC samples. For each protein, the per-sample median values were averaged within each study group, and the fold change between the endometriosis and no endometriosis groups was calculated. Two-tailed unpaired t-tests were performed on peak values. Fold changes and p values were log-transformed. As this was an exploratory study, a low-stringency threshold was applied without correction for multiple testing. Proteins were considered differentially regulated when the endometriosis group differed from the no endometriosis group by at least 1.5-fold (log2(fold change) ≥0.58 or ≤−0.58), with p < 0.05 (−log10(p) >1.3). The lists of differentially regulated proteins from the two experiments were analysed separately in all subsequent analyses.
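The per-protein decision rule above can be sketched as follows. The p value is taken as an input, assumed to come from a two-tailed unpaired t-test run elsewhere (for example `scipy.stats.ttest_ind`); the function names are hypothetical, and the fold-change cut-off uses log2(1.5) ≈ 0.585, which the text rounds to 0.58.

```python
import math
from statistics import mean, median, stdev


def cv_percent(replicates):
    """Intra-assay CV% of one protein across QC technical replicates."""
    return 100 * stdev(replicates) / mean(replicates)


def call_protein(endo_triplicates, control_triplicates, p_value,
                 min_fold=1.5, alpha=0.05):
    """Return (log2 fold change, -log10 p, call) for one protein.

    Each sample contributes the median of its technical triplicate; each group
    value is the mean of those per-sample medians.
    """
    endo = mean(median(t) for t in endo_triplicates)
    ctrl = mean(median(t) for t in control_triplicates)
    log2_fc = math.log2(endo / ctrl)
    neg_log10_p = -math.log10(p_value)
    # |log2 FC| >= log2(1.5) and p < 0.05 (equivalently -log10 p > 1.30).
    if p_value < alpha and abs(log2_fc) >= math.log2(min_fold):
        call = "up" if log2_fc > 0 else "down"
    else:
        call = "not significant"
    return log2_fc, neg_log10_p, call
```

A protein at 1.5-fold higher abundance with p = 0.01 is called "up", whereas the same fold change with p = 0.2 is not significant.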
The STRING database tool (Version 12.0; [18]) was used to provide insight into the functional associations of the differentially regulated proteins. The minimum required interaction score was set at 0.4 (medium confidence). Markov clustering (MCL) with an inflation parameter of 1.5 was performed to identify groups of interacting proteins. Functional enrichments were considered significant at a false discovery rate (FDR) <0.01 with a minimum of two proteins in the network.
Data preprocessing and visualisation of the supervised multivariate analysis were performed in R (version 4.4.1) and RStudio (version 2024.04.2). Data were log-transformed and scaled. Orthogonal partial least squares discriminant analysis (OPLS-DA) was performed using the R package metabom8 (version 1.0.0, available from github.com/tkimhofer/metabom8, accessed July 11, 2024). The variables extracted and plotted were the predictive weights (variable influence on projection [VIP]), the predictive component (t_pred_cv) and the orthogonal component (t_orth_cv). A VIP score of ≥1.0 was considered significant [19].
Ingenuity pathway analysis (IPA, version 111725566) was performed to identify upstream molecules that may be responsible for the observed differences in protein abundance in each dataset. For differentially regulated proteins, the log fold change (log2(fold change)) and log p-value (−log10(p)) were input into the IPA core analysis tool. The core analyses from both datasets were input into the comparison analysis tool to identify results common to both datasets. A z-score of >1 or <−1 and a p value of <0.05 were required for significance. Where applicable, p values and z-scores adjusted for multiple comparisons or bias were used.
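STRING reports an FDR for each enrichment term, and the IPA function results in this study use Benjamini–Hochberg corrected p values. To illustrate how such FDR values arise from raw p values, the Benjamini–Hochberg step-up adjustment is sketched below; whether STRING applies exactly this procedure is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, keeping q values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [qi < 0.01 for qi in q]  # FDR < 0.01 criterion from the text
```

Here all four terms have q values of 0.02 or 0.04, so none would pass the FDR < 0.01 criterion even though two raw p values are at or below 0.01.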
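metabom8 handles this preprocessing itself; the sketch below only illustrates what "log-transformed and scaled" and the VIP cut-off could look like. It assumes log2 transformation, unit-variance (z-score) scaling per protein and a VIP criterion of ≥1.0; the text does not specify the log base or scaling method, and the function names and VIP values are hypothetical.

```python
import math
from statistics import mean, stdev


def log_autoscale(matrix):
    """matrix: rows = samples, columns = proteins (positive intensities).

    Applies log2, then centres each protein to mean 0 and scales it to unit
    (sample) standard deviation.
    """
    logged = [[math.log2(v) for v in row] for row in matrix]
    n_cols = len(logged[0])
    columns = [[row[j] for row in logged] for j in range(n_cols)]
    mus = [mean(col) for col in columns]
    sds = [stdev(col) or 1.0 for col in columns]
    return [[(row[j] - mus[j]) / sds[j] for j in range(n_cols)] for row in logged]


def select_by_vip(vip_scores, threshold=1.0):
    """Proteins with VIP >= threshold, highest VIP first."""
    kept = [p for p, v in vip_scores.items() if v >= threshold]
    return sorted(kept, key=lambda p: vip_scores[p], reverse=True)


scaled = log_autoscale([[1, 2], [2, 8], [4, 32]])
top = select_by_vip({"LGMN": 1.8, "P2": 0.9, "P3": 1.2})  # hypothetical VIP values
```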
Enzyme‐linked immunosorbent assays (ELISAs) were performed on cervicovaginal fluid samples to investigate the protein levels of human LGMN (Invitrogen, #EH299RB), NAMPT/Visfatin (Invitrogen, #EH482RB), OLFM4 (Abcam, #ab267805) and SPARCL1 (Abcam, #ab272478). LGMN, SPARCL1, NAMPT, and OLFM4 were chosen based on abundance, interaction with other proteins and predicted molecular functions, in addition to availability of commercial ELISA kits.
Five of the 28 endometriosis samples included in the ELISA experiments had an rASRM stage reported by the operating surgeon, and the remaining 23 were retrospectively staged by SS. Cervicovaginal fluid sample aliquots were thawed on ice, pooled, and centrifuged at 2,000 × g for 10 min at 4°C. The supernatant was retained and the pellet discarded.
To estimate the total protein in each sample, a BCA assay was performed as previously described. ELISAs were then performed according to manufacturer's instructions. All incubations were performed at room temperature with shaking.
LGMN and NAMPT were detected by solid-phase sandwich ELISA. LGMN standards ranged from 28.67 to 7,000 pg/mL and NAMPT standards from 1.229 to 300 ng/mL. Cervicovaginal fluid swab supernatants were diluted 1:1 in assay diluent, plated in duplicate and incubated for 2.5 h before the biotin conjugate, streptavidin-HRP and TMB steps. OLFM4 and SPARCL1 were detected by quantitative sandwich ELISA. OLFM4 standards ranged from 1,172 to 75,000 pg/mL and SPARCL1 standards from 39.06 to 2,500 pg/mL. To each well, 50 µL of cervicovaginal fluid supernatant, standard or assay diluent blank was plated in duplicate, followed by the kit antibody cocktail and TMB solution.
All ELISA absorbances were read at 450 nm on a Multiskan microplate spectrophotometer (Thermo Scientific, #51119000). Four-parameter logistic (4PL) curves were used to interpolate the unknown sample concentrations. When the absorbance exceeded the standard curve, the concentration of the first (highest) standard was assigned. When the absorbance was below the standard curve, the lower limit of detection of the assay was assigned. The interpolated concentrations for NAMPT and LGMN were then adjusted for the dilution factor. The amount of LGMN, NAMPT, OLFM4 and SPARCL1 per µg of total protein in each sample was calculated using the total protein concentration obtained from the BCA assay.
The thresholds for a replacement test were set at 0.94 sensitivity and 0.79 specificity [9]. A triage test ruling out endometriosis (SnOUT) required 0.95 sensitivity and 0.50 specificity, while a rule-in triage test (SpIN) required the inverse thresholds (0.50 sensitivity and 0.95 specificity).
First, univariate analysis was performed to evaluate the predictive potential of each individual protein. The pROC package in R was used to generate receiver operating characteristic (ROC) curves for the concentrations and normalised concentrations of LGMN, NAMPT, OLFM4 and SPARCL1. The optimal cut-off was identified using the Youden index, and the sensitivity and specificity at that cut-off were extracted from the ROC curve. Post-hoc analysis compared normalised LGMN across menstrual cycle phases using a Kruskal–Wallis test.
To explore multivariate analysis with a simplified approach to interpreting biomarker patterns, a biomarker scoring system was developed [20]. Normalised LGMN concentrations and the concentrations of NAMPT, OLFM4 and SPARCL1 were each split into quartiles. For NAMPT and SPARCL1, scores ranged from 0 for the lowest quartile to 3 for the highest quartile. As LGMN and OLFM4 showed the opposite direction of change in abundance, their scores were reversed, ranging from 3 for the lowest quartile to 0 for the highest quartile. Scores were summed to give the biomarker score, with a theoretical minimum of 0 and maximum of 12. Biomarker scores were compared between the endometriosis and no endometriosis groups using Fisher's exact test.
A multivariate logistic regression model was fitted as a generalised linear model (GLM) in R to provide a more nuanced analysis of the predictive ability of the combined markers. Two models were produced: one using the concentrations of LGMN, NAMPT, OLFM4 and SPARCL1, and the other using the concentrations of the same four proteins relative to total protein. AUC, sensitivity and specificity were obtained as in the univariate analysis.
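The interpolation, clamping and normalisation steps can be sketched as follows. The four 4PL parameters are assumed to have been fitted already (for example by curve-fitting software); the parameter values, cut-off absorbances and limit of detection below are hypothetical, and "diluted 1:1" is assumed to mean equal volumes of sample and diluent (dilution factor 2; undiluted OLFM4 and SPARCL1 samples would use a factor of 1).

```python
def four_pl(x, a, b, c, d):
    """4PL response: a = response at zero dose, d = response at infinite dose,
    c = inflection point, b = slope factor."""
    return d + (a - d) / (1 + (x / c) ** b)


def inverse_four_pl(y, a, b, c, d):
    """Concentration that gives response y on the fitted 4PL curve."""
    return c * ((a - d) / (y - d) - 1) ** (1 / b)


def interpolate(absorbance, params, top_abs, top_conc, bottom_abs, lod):
    """Interpolate within the standard curve; clamp readings outside it."""
    if absorbance > top_abs:  # above the highest standard
        return top_conc
    if absorbance < bottom_abs:  # below the lowest standard
        return lod
    return inverse_four_pl(absorbance, *params)


def per_ug_protein(conc_pg_per_ml, dilution_factor, total_protein_ug_per_ml):
    """Dilution-corrected concentration expressed as pg per µg of total protein."""
    return conc_pg_per_ml * dilution_factor / total_protein_ug_per_ml


# Hypothetical fitted parameters (a, b, c, d) for an increasing sandwich-ELISA curve.
params = (0.05, 1.0, 1000.0, 3.0)
absorbance = four_pl(500.0, *params)  # reading for a 500 pg/mL well, about 1.033
conc = interpolate(absorbance, params, top_abs=2.5, top_conc=7000.0,
                   bottom_abs=0.08, lod=28.67)
normalised = per_ug_protein(conc, dilution_factor=2, total_protein_ug_per_ml=100.0)
```

With these illustrative numbers, the well interpolates back to about 500 pg/mL, which becomes about 10 pg per µg of total protein after dilution correction.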
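The three pre-specified accuracy targets can be checked mechanically, treating each threshold as a minimum. In this sketch the role labels are hypothetical dictionary keys; the values are those stated above.

```python
# Minimum (sensitivity, specificity) for each test role, as stated in the text.
THRESHOLDS = {
    "replacement": (0.94, 0.79),
    "SnOUT": (0.95, 0.50),  # triage test that rules endometriosis out
    "SpIN": (0.50, 0.95),   # triage test that rules endometriosis in
}


def qualifying_roles(sensitivity: float, specificity: float) -> list:
    """Return every test role whose sensitivity and specificity minimums are both met."""
    return [
        role
        for role, (min_sens, min_spec) in THRESHOLDS.items()
        if sensitivity >= min_sens and specificity >= min_spec
    ]


print(qualifying_roles(0.960, 0.727))  # ['SnOUT']
```

With the normalised LGMN performance reported in the Results (sensitivity 0.960, specificity 0.727), only the SnOUT criteria are met.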
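pROC computes these quantities in R; the stdlib Python sketch below is a stand-in, not pROC's API. It computes the empirical AUC as the Mann–Whitney probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), and picks the Youden cut-off (maximising sensitivity + specificity − 1) over the observed values. It assumes that higher values indicate endometriosis. By default, pROC chooses the direction automatically and typically places thresholds between adjacent observed values, so its cut-offs may differ slightly.

```python
def auc(cases, controls):
    """Empirical ROC AUC: P(case > control), with ties counted as 0.5."""
    wins = 0.0
    for x in cases:
        for y in controls:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(cases) * len(controls))


def youden_cutoff(cases, controls):
    """Return (cut-off, sensitivity, specificity) maximising Youden's J.

    A value >= cut-off is classified as endometriosis; ties in J keep the
    lowest cut-off.
    """
    best = None
    for t in sorted(set(cases) | set(controls)):
        sens = sum(x >= t for x in cases) / len(cases)
        spec = sum(y < t for y in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (t, sens, spec, j)
    return best[:3]


cases, controls = [5, 7, 9, 11], [1, 3, 6, 8]  # toy values, not study data
```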
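The quartile scoring can be sketched as below. Assumptions not stated in the text: quartile cut-points are computed over all samples pooled (using the default exclusive method of Python's `statistics.quantiles`), a value equal to a cut-point falls into the upper quartile, and the marker keys (such as "LGMN_norm") are hypothetical names. The Fisher's exact test comparison is not shown.

```python
from bisect import bisect_right
from statistics import quantiles

# Markers scored in reverse (lowest quartile = 3); the keys are hypothetical.
REVERSED = {"LGMN_norm", "OLFM4"}


def biomarker_scores(samples):
    """samples: list of {marker: value}; returns one summed score per sample."""
    markers = list(samples[0])
    # Three cut-points (Q1, median, Q3) per marker, pooled over all samples.
    cuts = {m: quantiles([s[m] for s in samples], n=4) for m in markers}
    scores = []
    for s in samples:
        total = 0
        for m in markers:
            quartile = bisect_right(cuts[m], s[m])  # 0 (lowest) to 3 (highest)
            total += 3 - quartile if m in REVERSED else quartile
        scores.append(total)
    return scores


# Toy data with two markers moving in opposite directions.
toy = [{"NAMPT": v, "OLFM4": 9 - v} for v in range(1, 9)]
print(biomarker_scores(toy))  # [0, 0, 2, 2, 4, 4, 6, 6]
```

With all four markers present, each sample's score falls between 0 and 12, matching the theoretical range given above.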
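R's `glm()` with a binomial family fits logistic regression by iteratively reweighted least squares. The stdlib-only sketch below substitutes plain batch gradient descent on standardised predictors, which approaches the same maximum-likelihood fit for well-behaved (non-separable) data but is slower; it is not the study's code, and the learning rate, iteration count and toy data are arbitrary choices. The fitted probabilities could then be scored with the same AUC and Youden procedure as the univariate analysis.

```python
import math
from statistics import mean, pstdev


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit P(endometriosis) = sigmoid(b + w.z) on z-scored columns of X; return a predictor."""
    n, k = len(X), len(X[0])
    mus = [mean(row[j] for row in X) for j in range(k)]
    sds = [pstdev([row[j] for row in X]) or 1.0 for j in range(k)]

    def standardise(row):
        return [(row[j] - mus[j]) / sds[j] for j in range(k)]

    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    Z = [standardise(row) for row in X]
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for z, target in zip(Z, y):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, z))) - target
            grad_b += err
            for j in range(k):
                grad_w[j] += err * z[j]
        # Step down the gradient of the mean negative log-likelihood.
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]

    def predict(row):
        return sigmoid(b + sum(wj * zj for wj, zj in zip(w, standardise(row))))

    return predict


# Toy single-marker data (not study data); rows would hold four markers in practice.
predict = fit_logistic([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]], [0, 0, 1, 0, 1, 1])
```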
Results
SWATH‐MS was performed on 39 samples over two separate experiments, due to instrument availability. Experiment one comprised 11 endometriosis samples and 9 no endometriosis samples. Experiment two comprised 9 endometriosis and 10 no endometriosis samples. Of those who did not have endometriosis, almost half (9/19) had negative histological results, and the remaining ten participants had no areas suspicious of endometriosis (no histology).
No significant differences in clinical characteristics were found between the endometriosis and no endometriosis groups, whether analysed together as a whole (Table 1 ) or when the two experimental runs were compared separately (Table S2 ).
Characteristics of the participant cohort in the SWATH-MS analysis.
Abbreviations: GnRH, gonadotropin-releasing hormone; MELAA, Middle Eastern, Latin American, African; rASRM, revised American Society for Reproductive Medicine.
* Unpaired t‐test.
† Fisher's exact test.
‡ Chi‐square.
§ Multiple endometriosis phenotypes per participant possible, sum does not equal n = 20.
In experiments one and two, 3,740 and 3,655 proteins were identified, of which 2,154 and 2,400, respectively, were quantified. A total of 1,741 proteins were quantified in both experiments.
The intra‐assay CV% for the top 500 most abundant proteins was 21.9% for experiment one and 18.5% for experiment two, indicating acceptable technical reproducibility. The raw intensity data and CV% over the range of protein abundances are in Supplementary Figure 1 .
There were 29 proteins in experiment one and 47 proteins in experiment two that were identified as differentially regulated with a minimum fold change of 1.5 and p < 0.05. In experiment one, 12 proteins were upregulated and 17 were downregulated, while in experiment two 36 proteins had increased abundance and 11 had decreased abundance. There were no differentially regulated proteins that appeared in both protein lists. When the significance threshold was increased to p 1.5 with the same directionality. The top ten proteins according to p ‐value from each experiment are presented in Table 2 , with the full list of differentially regulated proteins in Table S3 .
Top ten differentially regulated vaginal swab proteins identified in experiments one and two, ranked by p-value.
PCA was performed to evaluate the distribution of variance and the underlying structure of the datasets (Figure 1). In experiment one, PC1 accounted for 8.8% of the total variance and PC2 for an additional 6.6%. Similarly, in experiment two, PC1 and PC2 explained 9.9% and 5.4% of the variance, respectively. The cumulative variance captured by the first two PCs was nearly identical across experiments (15.4% in experiment one and 15.3% in experiment two). All QC samples clustered in the centre of the sample distribution, indicating consistent measurements throughout each experiment.
Principal component analysis of datasets from SWATH‐MS experiments one and two. The score plots show three technical replicates of each sample and the pooled quality control (QC) samples. In experiment one, principal component 1 (PC1) and PC2 explained 8.8% and 6.6% of the variance, respectively, with a cumulative variance of 15.4%. In experiment two, PC1 and PC2 explained 9.9% and 5.4%, respectively, with a cumulative variance of 15.3%. The plots show sample distribution based on these two principal components. QC—quality control, SWATH‐MS—sequential window acquisition of all theoretical mass spectra.
STRING functional protein association networks were created from the lists of differentially regulated proteins from each experiment. Neither network had significantly more interactions than expected (experiment one: 7 observed, 9 expected, p = 0.77; experiment two: 18 observed, 12 expected, p = 0.055).
MCL analysis explored potential functional modules within the protein networks (Figure 2a,b ). In the network from experiment one, there were three clusters containing two or more proteins. These were related to the ribosome and ribonucleoprotein complex KEGG pathway and GO cellular components. There were six clusters identified in experiment two, with enriched GO and KEGG pathways for these clusters relating to the innate immune system, the complement cascade, and the Cul3‐RING ubiquitin ligase complex. The remaining nodes in experiment one and two were not enriched for any GO or KEGG pathways. Table 4 contains a tabulated summary of MCL analyses.
Bioinformatic analysis of differentially regulated cervicovaginal fluid proteins identified by SWATH-MS. (a)–(b) STRING functional protein association networks of differentially regulated proteins for experiment one (a) and experiment two (b). Proteins with at least a 1.5-fold change and p < 0.05 were included, with the entire list of proteins identified in the respective experiment used as the statistical background. Coloured nodes show groups of interacting proteins identified through Markov clustering (MCL) with an inflation parameter of 1.5. Neither network contained significantly more interactions than expected (p > 0.05). (c) Orthogonal partial least squares discriminant analysis (OPLS-DA) model of experiment one, showing the predictive and orthogonal components of the model. (d) The ten proteins with the highest variable influence on projection (VIP) scores, indicating the proteins with the most predictive weight in the model. (e)–(f) Predicted molecular and biological functions activated (e) and inhibited (f) by differentially regulated cervicovaginal fluid proteins. Pathways with a Benjamini–Hochberg corrected p value <0.05 and a z-score >1 (for activated pathways) or <−1 (for inhibited pathways) were considered significant.
A discriminant model was produced for experiment one, with a cross-validated area under the receiver operating characteristic curve (CV_AUROC) of 0.79, suggesting that the model discriminated well between the endometriosis and no endometriosis groups (Figure 2c). The model had an R² of 0.05, indicating that 95% of the variance is explained by factors outside the model. The individual proteins that contributed most to the separation of the groups are presented in Figure 2d. No significant components were identified in the experiment two data (CV_AUROC = 0.59).
Canonical pathway analysis predicts the pathways most likely to be affected by the changes in protein abundance. S100 signalling was predicted to be inhibited in experiment one (p = 0.013, z-score = −1.00), and ferroptosis signalling (p = 1.2 × 10⁻⁴, z-score = 2.00) and the complement cascade (p = 1.4 × 10⁻⁴, z-score = 2.00) were predicted to be activated in experiment two. No canonical pathway was significant in both experiments.
Upstream regulator analysis identified molecules that may be responsible for the observed changes in protein abundance. Predicted activated upstream regulators were methylprednisolone in experiment one (adj p = 0.047, corrected Z ‐score = 1.22), and hepatocyte nuclear factor‐1 alpha (HNF1A) in experiment two (adj p = 0.021, corrected Z‐score = 1.07). Interleukin‐1 beta (IL1B) was predicted to be inhibited in experiment two (adj p = 0.021, corrected Z ‐score = ‐1.85).
Causal network analysis goes beyond identifying single upstream regulators, aiming instead to identify regulatory networks that may explain the observed changes in protein abundance. Master regulator analysis identifies key molecules that influence large sections of the regulatory network. CD55 was one of the top five master regulators in experiment one, and comparison analysis also identified CD55 as a significant master regulator in experiment two, although not among the top five (network bias-corrected p-value = 0.043, z-score = 1.00). Comparison analysis also identified claudin 7 (CLDN7) and density regulated re-initiation and release factor (DENR) as significant regulators with the same directionality in both experiments (CLDN7 experiment one: p = 0.039, z = −1.34; experiment two: p = 0.023, z = −1.13. DENR experiment one: p = 0.0001, z = −1.41; experiment two: p = 0.025, z = −1.00). Table S5 presents the top five master regulators in each experiment by z-score.
Biological and molecular function analysis identifies the processes likely to be affected by the differentially regulated proteins. No functions overlapped between experiments. Figure 2e,f presents the top five activated and inhibited functions identified in experiments one and two. In experiment two, only three activated biological functions met the Benjamini–Hochberg corrected p value and bias-corrected z-score criteria. P values ranged from 0.020 to 0.047 for activated pathways and from 0.012 to 0.029 for inhibited pathways.
Concentrations of LGMN, NAMPT, SPARCL1 and OLFM4 in cervicovaginal fluid were measured by ELISA. The two groups were unequal in size, comprising all remaining samples from individuals with no endometriosis (n = 11) and a randomly selected subset of samples from the endometriosis cohort (n = 28). The groups were similar for all reported parameters (Table S6). One participant in the no endometriosis cohort was assigned to the unknown menstrual phase group despite taking hormonal medication, because the medication (testosterone) was unrelated to endometriosis treatment.
Three samples had insufficient volume remaining for the BCA assay to quantify total protein and were therefore excluded from the normalised protein analysis. LGMN relative to total protein was significantly elevated in people with endometriosis, with a median concentration of 27.22 (IQR 15.93–111.41) pg/µg protein compared with 6.61 (IQR 2.43–19.18) pg/µg protein in the no endometriosis group (adjusted p = 0.012; Figure 3a). NAMPT, OLFM4 and SPARCL1 relative to total protein did not differ between groups (Figure 3a). There were no significant differences in the absolute concentrations of LGMN, SPARCL1, NAMPT or OLFM4 between groups (Figure 2).
Cervicovaginal fluid proteins measured by ELISA. (a) Cervicovaginal fluid LGMN, NAMPT, OLFM4 and SPARCL1 concentrations normalised to total protein (endometriosis n = 25, no endometriosis n = 11). (b) Receiver operating characteristic (ROC) curves of normalised cervicovaginal fluid protein concentrations. (c) A biomarker scoring system was developed by dividing the normalised concentration of LGMN and the concentrations of NAMPT, OLFM4 and SPARCL1 into quartiles, then assigning a score of 0–3. Scores for individual proteins were summed, with a theoretical maximum of 12. (d) ROC curves of multivariate logistic regression models of the concentrations and normalised concentrations of LGMN, NAMPT, OLFM4 and SPARCL1. Reference line (random classification) in grey. Data are presented as median and interquartile range. * Adjusted p < 0.05, assessed by Mann–Whitney tests with Holm–Šídák correction for multiple comparisons (a) or Fisher's exact test (c). ELISA, enzyme-linked immunosorbent assay; LGMN, legumain; NAMPT, nicotinamide phosphoribosyltransferase; OLFM4, olfactomedin 4; SPARCL1, SPARC-like protein 1.
LGMN relative to total protein had the highest AUC, 0.826 (95% CI 0.640–1.000, p = 0.0021; Figure 3b). At a cut-off of 11.47 pg/µg total protein, sensitivity was 0.960 and specificity was 0.727. Post-hoc analysis showed that normalised LGMN concentrations did not differ by approximate menstrual phase (Figure S3). The AUC of each protein, with the sensitivity and specificity at the cut-off determined by Youden's index, is reported in Table S7.
To assess whether combining markers provided greater discriminatory power than individual proteins, multivariate analysis was performed. First, a biomarker score was derived from the quartile concentrations of the four ELISA proteins. Biomarker scores were significantly lower in the endometriosis group than in controls (p = 0.048; Figure 3c); however, the AUC was only 0.699 (Table S7). Next, logistic regression models were produced to determine whether a more complex and nuanced model improved discriminatory power. The models performed similarly, with AUCs of 0.735 and 0.731 for concentration and concentration relative to total protein, respectively (Figure 3d). Neither multivariate model met the predefined sensitivity and specificity thresholds.
Conclusion
This study performed two proteomics experiments on cervicovaginal fluid samples and found that dysregulated cervicovaginal fluid proteins did not show a consistent signature in endometriosis. Bioinformatic analysis predicted disruption of pathways previously linked to endometriosis, suggesting that the systemic inflammation and altered immune function observed in endometriosis may extend to the cervicovaginal environment, which warrants further research. Normalised cervicovaginal fluid LGMN concentrations discriminated between endometriosis and controls with accuracy suitable for a clinical triage test, but a much larger cohort would be required to confirm the promise of cervicovaginal fluid LGMN as a candidate biomarker of endometriosis. Overall, these findings suggest that cervicovaginal fluid proteins sampled via vaginal swab may have limited biomarker potential for endometriosis, while highlighting areas for further investigation.
Discussion
In this study, data-independent proteomic analysis was performed across two experiments on cervicovaginal fluid sampled with vaginal swabs. There was no common signature of protein dysregulation across the two experiments, and bioinformatic analysis gave similarly discordant results. Cervicovaginal fluid proteins were then measured by ELISA, which identified normalised LGMN concentration as significantly elevated in the endometriosis group.
The two separate proteomics runs were intended to provide a form of validation for each experiment's findings. However, each experiment identified a distinct set of differentially abundant proteins, and at p < 0.05 none were common to both experiments. Unsupervised (PCA) and supervised (OPLS-DA) analyses revealed that the data had a complex structure rather than one clear source of variation, with individual proteins contributing minimally to group separation. These findings align with the view that individual markers are unlikely to discriminate effectively between endometriosis and non-endometriosis cases [21, 22]. Furthermore, validation of biomarkers in external cohorts remains elusive.
Predicted dysregulated pathways in endometriosis also showed little consistency between the two experiments. Importantly, the lack of reproducibility for both individual proteins and pathways may suggest that cervicovaginal fluid does not exhibit consistent changes associated with endometriosis. Furthermore, none of the proteins identified in this study overlap with the results of the only other study to have performed proteomic analysis of cervicovaginal fluid for endometriosis biomarkers [15]. However, there were notable methodological differences between that study and the current study, such as additional inclusion criteria for the endometriosis group, use of an asymptomatic control group, different approaches to sampling cervicovaginal fluid, and data-dependent proteomic analysis. It remains unclear whether this inconsistency stems from the heterogeneity of endometriosis coupled with small cohort sizes, from dysregulation patterns being masked by menstrual cyclicity, or from genuinely limited endometriosis-associated changes detectable in cervicovaginal fluid.
As the SWATH-MS data displayed a complex structure, multivariate analysis was performed to assess whether coordinated changes across several proteins were more predictive than single markers. While the biomarker score differed significantly between the two groups, neither this score nor the multivariate logistic regression models outperformed LGMN as a single marker.
LGMN showed strong individual biomarker performance in the ELISA validation. LGMN, identified in the OPLS-DA model as one of the top proteins contributing to group separation, was significantly increased relative to total protein in individuals with endometriosis, with sensitivity (0.960) and specificity (0.727) meeting the thresholds for a clinically useful rule-out (SnOUT) triage test [9]. Diagnostic accuracy might be improved by incorporating cervicovaginal fluid markers beyond those investigated in this study. Interestingly, elevated legumain pseudogene 1 (LGMNP1) in serum extracellular vesicles has also been reported as a candidate marker for predicting endometriosis recurrence [24]; LGMNP1 protects LGMN expression by reducing miR-495-3p-mediated silencing [23]. Given the frequent lack of validation in endometriosis biomarker studies, including the current study, a larger and more diverse cohort would be required to confirm the promise of cervicovaginal fluid LGMN as a candidate biomarker of endometriosis. Investigating cervicovaginal LGMN in the context of recurrence may also be worthwhile.
LGMN is an asparaginyl‐specific cysteine endopeptidase located within lysosomes and expressed primarily by macrophages, particularly M2‐like macrophages [ 25 , 26 ]. LGMN secreted by M2‐like macrophages contributes to extracellular matrix degradation through the activation of MMP2 and MMP9 [ 25 ], and in vitro studies suggest that LGMN promotes angiogenesis [ 26 ]. Macrophages are the dominant innate immune cell in the lower genital tract [ 27 ] and are generally associated with an M2‐like phenotype in endometriosis [ 28 , 29 ]. Together, these observations provide a plausible explanation for the observed increase in cervicovaginal fluid LGMN. However, this explanation remains speculative, as limited research has explored LGMN in the context of endometriosis and macrophage polarisation in the vagina and cervix is poorly characterised [ 30 ]. Beyond its potential utility as a biomarker, increased LGMN in endometriosis may reflect altered macrophage phenotypes in cervicovaginal fluid, warranting further investigation.
Bioinformatic analysis predicted three pathways most likely to be affected by the observed differences in protein abundance, which may warrant further investigation in the context of endometriosis pathophysiology. Ferroptosis, a regulated form of iron‐dependent cell death, was predicted to be activated based on protein abundance changes, in contrast to previous reports of its inhibition in endometriotic tissue [ 31 , 32 ]. Iron‐induced ferroptosis may contribute to fibrosis and inflammation by increasing the proportion of myofibroblasts [ 31 ], consistent with the high iron levels observed in the peritoneal cavity [ 32 ]. Endometriosis‐associated dysregulation of S100 signalling pathways and complement proteins has previously been reported in both menstrual blood [ 33 ] and cervical mucus [ 15 ]. Our findings are consistent with these reports, suggesting that the systemic inflammation and immune dysregulation in endometriosis may extend to the cervicovaginal environment. Dysregulated complement activity has been linked with adverse reproductive outcomes [ 34 ], suggesting a potential mechanistic link to endometriosis‐associated subfertility. This study therefore identifies candidate pathways of endometriosis pathophysiology for further research, including investigation of targeted therapies.
This study had several limitations. P values were not adjusted for multiple comparisons in the SWATH‐MS analysis, increasing the likelihood of false positives. However, pathway enrichment in the bioinformatic analysis requires sets of dysregulated proteins, so entire false‐positive pathways are unlikely. The two‐step study design allowed trends identified under these lower‐stringency thresholds to be compared across both experimental datasets, with the expectation that modest yet pathophysiologically relevant changes would emerge in both. However, few conserved alterations were observed, suggesting limited consistent proteomic changes in cervicovaginal fluid in relation to endometriosis. High‐abundance keratins, haemoglobin and plasma proteins were excluded to improve assessment of lower‐abundance proteins. We opted to remove these proteins during data analysis because their presence was not consistent between samples. In addition, depletion procedures targeting high‐abundance plasma proteins during sample preparation may have altered the protein complement of the original samples unpredictably. Consequently, any informative signal from these proteins would have been missed.
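The Benjamini‐Hochberg false discovery rate (FDR) procedure is one standard way to adjust p values for multiple comparisons, and it illustrates the limitation above. The minimal sketch below was not used in this study, and the p values are hypothetical. In the example, a protein whose unadjusted p value is below 0.05 (0.045) no longer meets a 5% FDR threshold after adjustment.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values (q values) in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity and capping at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# HYPOTHETICAL unadjusted p values for five proteins (not data from this study).
raw = [0.001, 0.008, 0.020, 0.045, 0.30]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p={p:.3f}  BH-adjusted={q:.4f}  significant at 5% FDR: {q < 0.05}")
```

In a proteomics screen with hundreds of quantified proteins, the number of tests `m` is large, so the adjustment becomes correspondingly more stringent.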
Additional limitations include the lack of screening for vaginal infections or recent sexual activity, and the absence of control for menstrual cycle phase. While no significant differences in LGMN levels were observed across the cycle, confidence in this finding is constrained by small subgroup sizes and reliance on self‐reported information for menstrual phase estimation. Because this study aimed to evaluate biomarkers under conditions reflective of their intended clinical use, our menstrual cycle estimates were a pragmatic balance between accuracy and clinical feasibility. Finally, the ELISA kits used were not validated by the manufacturers for cervicovaginal fluid, which may have affected assay performance and reproducibility.
To maintain the translatability of this pre‐clinical research, we prioritised a participant cohort representative of the populations for whom the test is clinically intended. We therefore applied limited exclusion criteria and included control participants who were pre‐operatively suspected to have endometriosis. Consequently, the cohorts were highly heterogeneous with respect to menstrual cycle phase, hormonal medication use, concurrent pathologies and disease phenotype, which may have reduced our power to detect differences between groups. We believe this approach ultimately avoids wasting resources on attempts to validate biomarkers that do not perform accurately outside highly selected participant cohorts.
The lack of concordance between the two proteomics runs was a notable result. These findings suggest that cervicovaginal fluid proteins sampled via vaginal swab may have limited biomarker potential for endometriosis. Given the lack of reproducible and validated markers despite thousands of reported candidates [ 10 ], this result may be unsurprising; nevertheless, reporting such negative findings is important to prevent redundant work in this space. Exploring subtype‐specific diagnostic markers or incorporating symptom‐based phenotyping in studies with larger cohorts presents an opportunity to reduce heterogeneity, as biomarker panels have shown improved performance at the extremes of the endometriosis disease spectrum [ 35 ]. However, non‐invasive imaging techniques already perform well for more severe endometriosis subtypes (endometrioma and DIE) [ 4 ]. Thus, diagnostic endometriosis biomarkers that perform poorly in diagnosing superficial endometriosis may have limited clinical utility. In future, patient stratification enabled by a deeper understanding of endometriosis molecular biology may enhance biomarker discovery efforts.
Introduction
Endometriosis is an inflammatory condition defined by the presence of endometrial‐like tissue outside the uterus, and is commonly cited to affect one in ten people assigned female at birth [ 1 , 2 ]. Endometriosis is known for an extensive diagnostic delay, ranging internationally from 3.3 years to over 10 years [ 3 ]. Transvaginal ultrasound and magnetic resonance imaging (MRI) can detect endometriomas and deep infiltrating endometriosis (DIE) with high sensitivity and specificity when performed by experienced operators [ 4 ]. However, accuracy is highly operator dependent, creating inconsistencies in diagnostic reliability. Consequently, although laparoscopy is no longer required for an endometriosis diagnosis, surgery remains the most common method of diagnosing endometriosis globally [ 5 ]. Negative laparoscopy, where no evidence of endometriosis or alternative pelvic pathology is found, is reported in up to 42% of cases with a preoperative diagnosis of suspected endometriosis [ 6 ].
An effective biomarker could transform the endometriosis diagnostic landscape, reducing delays in accessing specialist care and eliminating reliance on expensive and invasive procedures to confirm endometriosis. Furthermore, a diagnostic biomarker may substantially reduce the number of unnecessary surgeries, minimising patient harm and optimising resource allocation. Endometriosis biomarkers have been investigated extensively [ 7 , 8 , 9 , 10 ] and remain a research priority for both clinicians and patients [ 11 , 12 ]. Yet no single biomarker or combination of biomarkers has been validated for clinical use, although several are commercially available [ 13 ].
Most biomarker research to date has investigated either accessible biofluids such as peripheral blood, or more biologically relevant sources such as endometrial tissue or peritoneal fluid. A recent systematic review by Brulport et al. demonstrated inconsistent performance of candidate markers across studies [ 10 ]. Other accessible biofluids close to the uterus, such as cervical mucus and cervicovaginal fluid, remain relatively understudied [ 10 ]. Cervicovaginal fluid, composed of cervical, vaginal and uterine secretions [ 14 ], offers notable advantages as a non‐invasive and easily accessed biofluid for biomarker research. Its proximity to the uterus suggests that cervicovaginal fluid may provide valuable insights into localised inflammatory and molecular processes associated with endometriosis. Previously, Grande et al. used cervical mucus aspirated via catheter to identify 15 dysregulated proteins in people with endometriosis compared with fertile women, demonstrating the potential of cervicovaginal fluid biomarkers [ 15 ]. The catheter‐based approach may be more suitable for identifying proteins specific to cervical mucus; however, the sampling procedure can cause discomfort, reducing accessibility [ 16 ]. A clinically standard sampling procedure, such as vaginal swabs, which can be self‐collected, would be ideal to ensure the accessibility and clinical translation of candidate biomarkers.
In this study, we aimed to characterise protein differences in the cervicovaginal fluid sampled by vaginal swab between people with and without endometriosis in a pilot case‐control study.
Conflict of Interest Statement
The authors declare no conflicts of interest.
Supplementary Material
Supporting File: prca70044‐sup‐0001‐SuppMat.docx.