Results
To identify key genes influencing OCCC, we used GWAS summary statistics for OCCC (outcome ID: ieu-a-1124). The extract_instruments and extract_outcome_data functions were used in turn to retrieve the exposure and outcome data. MR analysis then identified 295 eQTL-related genes with significant causal associations with OCCC (Fig. 2 , IVW p-value < 0.05). Of these, 154 genes, including F5, NUDT2, CRISPLD2, CCR3, DCAF4, MEG3, MPHOSPH6, TMED6, and ARAP3, were associated with an increased risk of OCCC, whereas 141 genes, including ASH2LP1, GNAL, CD2BP2, HEY2, PARP3, CD38, HCST, DNAH2, PPP1R14A, and PTGS2, were associated with a decreased risk. To assess the robustness of these 295 causal associations, we performed a sensitivity analysis. Excluding any single SNP had minimal impact on the overall estimates and their confidence intervals, indicating that the 295 causal associations were robust.
Fig. 2 Volcano plot of genes
To investigate the causal relationships between pQTLs and the positive outcomes, we conducted MR analysis followed by co-localization analysis. The MR analysis used genetic variants as instrumental variables to assess the causal effects of pQTLs on the outcomes of interest, and co-localization analysis then determined whether the pQTLs and the outcomes shared the same causal genetic variant within the same genomic region. MR analysis of the three genes corresponding to the pQTL-positive outcomes (Fig. 3 , IVW p-value < 0.05) showed that PPP1R14A (OR = 0.717; 95% CI = 0.518–0.991; P = 0.044) and PTGS2 (OR = 0.520; 95% CI = 0.304–0.890; P = 0.017) were associated with a reduced risk of OCCC, whereas CBR3 (OR = 1.223; 95% CI = 1.005–1.489; P = 0.045) was associated with an increased risk. In the sensitivity analysis, excluding any single SNP had minimal impact on the overall estimates and their confidence intervals, indicating that the three causal associations were robust (Fig. 4 ). We additionally performed coloc analysis for these three genes at the eQTL-GWAS level. PPP1R14A and PTGS2 showed posterior probabilities exceeding 0.95 (Fig. 5 ), providing strong evidence of shared causal variants, and were therefore selected as key candidates for subsequent functional and mechanistic analyses.
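As a quick consistency check on the estimates reported above, the standard error of each log-OR can be back-calculated from the width of its 95% CI and converted into a two-sided Wald P value. This minimal sketch assumes the CIs were computed as exp(β ± 1.96·SE) on the log-OR scale; the helper name `p_from_or_ci` is hypothetical and not part of the study's pipeline.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def p_from_or_ci(odds_ratio: float, lower: float, upper: float) -> tuple[float, float]:
    """Hypothetical helper: return (SE of log-OR, two-sided Wald P) from an OR and 95% CI.

    Assumes the CI is exp(beta +/- 1.96 * SE), i.e. symmetric on the log-OR scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, p


# Values as reported in the text above
reported = {
    "PPP1R14A": (0.717, 0.518, 0.991),
    "PTGS2": (0.520, 0.304, 0.890),
    "CBR3": (1.223, 1.005, 1.489),
}
for gene, (or_, lo, hi) in reported.items():
    se, p = p_from_or_ci(or_, lo, hi)
    print(f"{gene}: SE(log OR) = {se:.3f}, P = {p:.3f}")
```

Under that assumption, the recovered P values agree with the reported ones to within rounding, which is what Wald-type intervals would produce.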
Fig. 3 pQTL Mendelian randomization
Fig. 4 Sensitivity analyses of Mendelian randomization results for pQTL
Fig. 5 Co-localization analysis of eQTL and GWAS signals for key genes
The co-localization analysis was performed to assess whether the eQTL and GWAS signals for the genes PPP1R14A, PTGS2, and CBR3 shared the same causal genetic variant within the same genomic region. The posterior probabilities (PP) for co-localization are displayed, with PPP1R14A and PTGS2 showing PP > 0.95, indicating strong evidence of shared causal variants.
Subsequently, we performed pathway analysis of the identified key genes. GO enrichment analysis showed enrichment mainly in positive regulation of synaptic plasticity (biological process), nuclear outer membrane (cellular component), and protein phosphatase inhibitor activity (molecular function) (Fig. 6 A). KEGG pathway analysis indicated prominent enrichment in pathways such as ovarian steroidogenesis, the VEGF signaling pathway, and the IL-17 signaling pathway (Fig. 6 B). These findings suggest that the key genes may be biologically relevant to these pathways and processes.
Fig. 6 Key gene function enrichment analysis by GO and KEGG
The single-cell RNA sequencing dataset ( GSE224334 ) was retrieved from the NCBI GEO repository. Following rigorous quality control, cells expressing fewer than 200 genes were excluded, retaining 48,777 high-quality cells for subsequent analysis. Quality assessment was visualized through violin plots and two-dimensional embeddings (Supplementary Fig. 1AB). Data preprocessing involved normalization, variance stabilization, and principal component analysis (PCA). Batch effects were mitigated using Harmony integration, followed by non-linear dimensionality reduction via UMAP (Supplementary Fig. 1C-F, Fig. 7 A). Cellular clusters were systematically annotated into four major lineages: epithelial cancer cells, cancer-associated fibroblasts (CAFs), immune cells, and endothelial cells, as evidenced by typical marker expression patterns (Fig. 7 B–D).
Fig. 7 Single cell data annotation. A Cell clustering after UMAP. B Four cell annotations. C Bubble chart of 4 classic cell markers. D Cell ratio bar graph corresponding to the sample
Differential expression analysis revealed cell-type-specific enrichment patterns: PPP1R14A was predominantly expressed in epithelial cancer cells (adjusted p < 0.001), whereas PTGS2 was significantly upregulated in immune cell populations (Fig. 8 A, B). Co-expression mapping further related these key genes to tumor progression markers (ATM, BRCA1/2, MET, TP53) across cellular compartments (Supplementary Fig. 2–3).
Pathway enrichment analysis using AUCell scoring linked PPP1R14A to the xenobiotic metabolism and late estrogen response pathways, with differential activity patterns shown in comparative pathway bubble charts (Fig. 8 C).
Fig. 8 Expression of key genes in different cell types and enrichment of functional pathways
Integrating the two key genes into transcriptional network analysis revealed shared regulatory machinery involving nine evolutionarily conserved transcription factors (TFs). Enrichment of these transcriptional regulators was demonstrated by cumulative distribution function (CDF) analysis. Motif-TF annotation identified cisbp__M5759 as the motif with the highest normalized enrichment score (NES = 8.81). All enriched cis-regulatory motifs and their cognate transcription factors were mapped (Fig. 9 A, B).
We used DGIdb to identify potential drug interactions with the two key genes. DGIdb identified 110 drugs that interact with PTGS2, which may facilitate the development of novel therapeutic strategies. The interaction network was visualized using Cytoscape (Fig. 9 C). Furthermore, the three-dimensional protein structures corresponding to the key genes were retrieved from the RCSB PDB database ( https://www.rcsb.org/ ). The selected protein-ligand pair comprised PTGS2 (PDB ID: 5F19) and Cimicoxib. Molecular docking analysis revealed a binding energy of -4.15 kcal/mol for the PTGS2 (5F19)-Cimicoxib complex (Fig. 9 D).
Fig. 9 Key genes-related transcriptional regulatory networks, drug screening, and molecular docking of potential compounds
Materials
Expression quantitative trait loci (eQTL) data were acquired from Phase II of the eQTLGen Consortium ( https://www.eqtlgen.org ), the largest cis-eQTL meta-analysis resource for peripheral blood transcriptomes [ 28 ]. This ongoing initiative integrates genotype-expression profiles from 31,684 individuals across 37 cohort studies, employing standardized RNA sequencing protocols. The druggable gene screening framework was adapted from a prior study that connected complex disease- and biomarker-associated loci from genome-wide association studies to an updated set of genes encoding druggable human proteins, to agents with bioactivity against these targets, and, where there were licensed drugs, to clinical indications [ 29 ]. In this study, 4463 druggable genes were included in the analysis. All contributing studies obtained ethics approval from their respective institutional review boards.
Protein quantitative trait loci (pQTL) data were obtained from the deCODE database ( https://www.decode.com/summarydata/ ). We used the 2021 release of the deCODE pQTL data [ 30 ], a GWAS of plasma protein levels measured with 4907 aptamers in 35,559 Europeans.
The genetic outcome association data used in this study were derived from the latest release of the GWAS Meta-Analysis Database, which contains 1673 independent genome-wide association studies encompassing approximately 11 billion single nucleotide polymorphism (SNP)-trait associations. For the OCCC analysis, we extracted summary statistics from the Ovarian Cancer Association Consortium (OCAC) dataset (accession ID: ieu-a-1124), comprising 1366 histologically confirmed OCCC cases and 40,941 population-matched controls of European ancestry [ 31 ]. Restricting the analysis to a single ancestry minimizes confounding from ancestral genetic heterogeneity (population stratification). All participating studies in the consortium obtained appropriate institutional review board approval and informed consent from participants.
Single-cell data were obtained from the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/info/datasets.html ), maintained by the National Center for Biotechnology Information (NCBI). We downloaded the single-cell dataset GSE224334 and used the 10 samples with complete single-cell expression profiles for single-cell analysis.
GWAS summary statistics for OCCC were curated from the Integrative Epidemiology Unit (IEU) GWAS database ( https://gwas.mrcieu.ac.uk/ ) and harmonized with the eQTL data using the TwoSampleMR package (v0.5.7). SNPs associated with the expression of each gene at P < 1e-5 were selected as candidate instrumental variables (IVs). These SNPs were clumped for linkage disequilibrium (r² < 0.001; clumping window = 10,000 kb), and only SNPs with p2 < 5e-5 were retained. Causal relationships were evaluated with four statistical methods in turn (when a causal relationship involved only one SNP, only the Wald ratio was used): inverse variance weighted (IVW), which combines the Wald estimates of the individual SNPs by meta-analysis; MR-Egger, which relies on the Instrument Strength Independent of Direct Effect (InSIDE) assumption; weighted median, which yields a consistent causal estimate when up to 50% of the weight in the analysis comes from invalid IVs; and weighted mode, which has greater power to detect causal effects, less bias, and a lower type I error rate than MR-Egger regression. Together, these methods provide an overall estimate of the effect of all cis-acting and some cross-region gene expression in whole blood on OCCC.
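For readers less familiar with these estimators, the sketch below computes the Wald ratio and a fixed-effect IVW estimate from harmonized per-SNP summary statistics. It is a Python illustration under simplifying assumptions (first-order weights that ignore uncertainty in the SNP-exposure effects), not the TwoSampleMR implementation: TwoSampleMR's `mr_ivw` fits a weighted regression through the origin, whose point estimate matches this formula but whose standard error may be scaled for residual heterogeneity. The function names and input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exp: float, beta_out: float, se_out: float) -> Tuple[float, float]:
    """Single-SNP causal estimate (beta_out / beta_exp) with first-order SE."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw_fixed(beta_exp: Sequence[float], beta_out: Sequence[float],
              se_out: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted estimate across SNPs.

    Each SNP's Wald ratio is weighted by beta_exp**2 / se_out**2, the inverse of
    its first-order variance; the weighted mean simplifies to num / den below.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


def mr_estimate(beta_exp: Sequence[float], beta_out: Sequence[float],
                se_out: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical dispatcher: Wald ratio for a single SNP, otherwise IVW."""
    if len(beta_exp) == 1:
        return wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    return ivw_fixed(beta_exp, beta_out, se_out)


# Hypothetical harmonized SNP-exposure and SNP-outcome effects for three SNPs
b, se = mr_estimate([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.02, 0.02, 0.02])
print(f"log-OR = {b:.3f}, OR = {math.exp(b):.3f}, SE = {se:.4f}")
```

For a binary outcome such as OCCC the estimate is on the log-odds scale, so exp(beta) gives the OR per unit change in the exposure.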
We used leave-one-out sensitivity analysis to evaluate the influence of individual genetic variants on the estimated risk of OCCC. This approach excludes each SNP in turn and recalculates the combined effect size from the remaining SNPs, with the aim of identifying variants that exert a disproportionately large influence on the overall estimate. Each exclusion yields a new point estimate and 95% confidence interval (CI), which reflects that SNP's contribution to the result. Comparing these estimates with the overall estimate from the full set of SNPs shows how much removing any individual SNP changes the results, and thus how robust the analysis is.
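The leave-one-out procedure can be sketched as follows, assuming the same fixed-effect IVW estimator as the pooled analysis. The function names and the toy data, in which one SNP has an outlying outcome association, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Z_975 = 1.959964  # multiplier for a 95% confidence interval


def ivw_fixed(bx: Sequence[float], by: Sequence[float], se: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and SE (first-order weights bx**2 / se**2)."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se))
    return num / den, math.sqrt(1.0 / den)


def leave_one_out(bx: Sequence[float], by: Sequence[float],
                  se: Sequence[float]) -> List[Tuple[int, float, float, float]]:
    """Re-estimate the IVW effect with each SNP removed in turn.

    Returns (index_of_dropped_snp, estimate, ci_lower, ci_upper) for every SNP.
    """
    results = []
    for drop in range(len(bx)):
        keep = [i for i in range(len(bx)) if i != drop]
        b, s = ivw_fixed([bx[i] for i in keep], [by[i] for i in keep], [se[i] for i in keep])
        results.append((drop, b, b - Z_975 * s, b + Z_975 * s))
    return results


# Hypothetical data: SNP 3 has an outlying SNP-outcome association
bx = [0.10, 0.20, 0.30, 0.20]
by = [0.05, 0.10, 0.15, 0.40]
se = [0.02, 0.02, 0.02, 0.02]
full, _ = ivw_fixed(bx, by, se)
for drop, b, lo, hi in leave_one_out(bx, by, se):
    print(f"without SNP {drop}: {b:.3f} ({lo:.3f} to {hi:.3f}); all SNPs: {full:.3f}")
```

In the toy example, removing the outlying SNP 3 gives an estimate of 0.5, matching the ratio shared by the other three SNPs, whereas each of the other exclusions leaves an estimate above 0.5.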
We performed colocalization analysis using the Bayesian colocalization (coloc) framework, integrating eQTL summary statistics with the OCCC GWAS data. Posterior probabilities were calculated over the 100-kb region around the index SNP. In the coloc results, H3 denotes the hypothesis that the two traits (gene expression and OCCC risk) are associated but have different causal variants, and H4 the hypothesis that the two traits are associated and share a single causal variant; their posterior probabilities are reported as PP.H3 and PP.H4. A stringent threshold of SNP-specific posterior probability for hypothesis H4 (SNP.PP.H4) > 0.95 was applied to establish robust colocalization evidence.
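The posterior probabilities can be illustrated with a stdlib-only sketch of the approximate Bayes factor (ABF) approach on which coloc's `coloc.abf` is based. It assumes coloc's default priors (p1 = p2 = 1e-4, p12 = 1e-5) and prior effect SDs of 0.15 for a standardized quantitative trait (the eQTL) and 0.2 for the case-control GWAS, and it takes per-SNP betas and standard errors for both traits over the same, aligned SNPs in the region. It does not reproduce coloc's MAF- and sample-size-based variance estimation and is not the package's code; the function names and toy betas are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Union


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield's approximate Bayes factor (log scale) for association at one SNP."""
    v = se ** 2        # sampling variance of the effect estimate
    w = prior_sd ** 2  # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a: float, b: float) -> float:
    """Numerically stable log(exp(a) - exp(b)); -inf when the difference vanishes."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def coloc_abf(beta1: Sequence[float], se1: Sequence[float],
              beta2: Sequence[float], se2: Sequence[float],
              prior_sd1: float = 0.15, prior_sd2: float = 0.2,
              p1: float = 1e-4, p2: float = 1e-4, p12: float = 1e-5
              ) -> Dict[str, Union[float, List[float]]]:
    """Hypothetical helper: posterior probabilities of H0-H4 for two traits.

    Trait 1 is assumed to be an eQTL (standardized quantitative trait) and trait 2
    a case-control GWAS; both lists must cover the same SNPs in the same order.
    """
    l1 = [log_abf(b, s, prior_sd1) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, prior_sd2) for b, s in zip(beta2, se2)]
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l12)
    log_h = [
        0.0,                                                  # H0: neither trait
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: two distinct variants
        math.log(p12) + s12,                                  # H4: one shared variant
    ]
    total = logsumexp(log_h)
    result: Dict[str, Union[float, List[float]]] = {
        f"PP.H{i}": math.exp(x - total) for i, x in enumerate(log_h)
    }
    # Per-SNP posterior of being the shared causal variant, given H4
    result["SNP.PP.H4"] = [math.exp(x - s12) for x in l12]
    return result


# Hypothetical three-SNP region (betas = z-score * SE, SE = 0.02 throughout)
se = [0.02, 0.02, 0.02]
shared = coloc_abf([0.16, 0.01, 0.006], se, [0.14, 0.004, 0.002], se)
print({k: round(v, 3) for k, v in shared.items() if k != "SNP.PP.H4"})
```

PP.H4 summarizes the evidence for the whole region, whereas SNP.PP.H4 distributes that evidence across individual SNPs.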
To elucidate the biological functions and signaling pathways associated with key genes, we employed the R package “ClusterProfiler” to conduct functional annotation analysis. This approach enabled a comprehensive exploration of the functional relevance of these genes. Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) were utilized to evaluate relevant functional categories. GO and KEGG enriched pathways with p-values and q-values less than 0.05 were considered statistically significant.
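clusterProfiler's over-representation analysis is based on a hypergeometric test; a stdlib-only sketch of that test with Benjamini-Hochberg (BH) adjustment follows. The gene counts are hypothetical, the function names are illustrative, and clusterProfiler additionally reports Storey q-values (via the qvalue package), which this sketch does not compute.

```python
from math import comb
from typing import List, Sequence


def hypergeom_sf(k: int, n_universe: int, n_term: int, n_query: int) -> float:
    """P(X >= k): chance of at least k term genes when n_query genes are drawn
    from n_universe background genes, n_term of which belong to the GO/KEGG term."""
    upper = min(n_term, n_query)
    hits = sum(comb(n_term, i) * comb(n_universe - n_term, n_query - i)
               for i in range(k, upper + 1))
    return hits / comb(n_universe, n_query)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: 2 query genes tested against two terms in an 18,000-gene background
p_values = [hypergeom_sf(2, 18_000, 50, 2), hypergeom_sf(1, 18_000, 300, 2)]
print(p_values, bh_adjust(p_values))
```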
First, the expression profile was processed with the Seurat package. We applied stringent quality control, filtering cells on the total number of UMIs (unique molecular identifiers) per cell, the number of genes expressed, and the mitochondrial and ribosomal gene expression ratios. These ratios were calculated as the percentage of total expression contributed by mitochondrial or ribosomal genes, respectively. High mitochondrial or ribosomal gene expression ratios typically indicate low-quality cells with low RNA content that may be undergoing programmed cell death. Quality control was further refined using the median absolute deviation (MAD), with values more than 3 MADs from the median classified as outliers.
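The per-cell QC metrics and the 3-MAD outlier rule can be sketched as below. The sketch assumes human gene symbols, with mitochondrial genes prefixed "MT-" and ribosomal genes prefixed "RPS"/"RPL" (a common Seurat convention, assumed here), and it uses the raw MAD as the text describes; note that R's `mad()` scales by 1.4826 by default, which is not applied here. Function names and counts are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def percent_features(counts: Dict[str, int], prefixes: Sequence[str]) -> float:
    """Percentage of a cell's counts from genes whose symbol starts with any prefix."""
    total = sum(counts.values())
    selected = sum(c for gene, c in counts.items() if gene.startswith(tuple(prefixes)))
    return 100.0 * selected / total if total else 0.0


def mad_outliers(values: Sequence[float], n_mads: float = 3.0) -> List[bool]:
    """Flag values lying more than n_mads (unscaled) MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * mad for v in values]


# Hypothetical counts for one cell, and hypothetical per-cell mitochondrial percentages
cell = {"MT-CO1": 30, "ACTB": 50, "RPL13": 20}
print(percent_features(cell, ["MT-"]), percent_features(cell, ["RPS", "RPL"]))
print(mad_outliers([10, 11, 12, 13, 100]))
```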
We used the LogNormalize method for global normalization: the counts in each cell were divided by the cell's total counts, multiplied by a scale factor of 10,000, and log-transformed. Cell cycle scores were calculated with the CellCycleScoring function, and highly variable genes were identified with FindVariableFeatures. The ScaleData function was used to regress out expression variation arising from mitochondrial gene expression proportions, ribosomal gene expression proportions, and the cell cycle. Linear dimensionality reduction was performed on the expression matrix with RunPCA, and principal components were selected for subsequent analysis. Batch effects were corrected with the Harmony algorithm, and nonlinear dimensionality reduction was performed with RunUMAP (uniform manifold approximation and projection). For cell annotation, we queried the CellMarker and PanglaoDB databases, supplemented by a literature review, and validated the results with SingleR for automated annotation. This approach identified the cell types and their corresponding marker genes within the tissues.
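LogNormalize divides each gene's count by the cell's total count, multiplies by the scale factor (10,000 here), and takes the natural log after adding 1. A per-cell sketch follows; the function name is hypothetical, and Seurat itself applies this operation to sparse matrices rather than Python lists.

```python
import math
from typing import List, Sequence


def log_normalize(counts: Sequence[float], scale_factor: float = 10_000) -> List[float]:
    """Seurat-style LogNormalize for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical raw UMI counts for four genes in one cell
print(log_normalize([1, 3, 0, 6]))
```

Because every cell is rescaled to the same total before the log transform, differences in sequencing depth between cells do not dominate the normalized values.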
We used the R package "RcisTarget" to predict transcription factors. All RcisTarget computations are motif-based, and the normalized enrichment score (NES) of a motif depends on the total number of motifs in the database. In addition to the motifs annotated in the source data, we inferred further annotations based on motif similarity and gene sequence. To assess the overrepresentation of each motif in the gene set, we first calculated the area under the curve (AUC) for each motif-gene set pair from the recovery curve of the gene set along the motif's gene ranking. The NES of each motif was then determined from the AUC distribution of all motifs for the gene set.
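The recovery-curve AUC and the NES can be illustrated with the sketch below, where NES is the motif's AUC minus the mean AUC over all motifs, divided by the standard deviation of those AUCs. The rank cutoff `max_rank` is a free parameter here (RcisTarget derives its cutoff from the ranking database, and its AUC computation may differ in detail); because every AUC is divided by the same constant, that normalization does not change the NES. Function names and the toy rankings are hypothetical.

```python
import statistics
from typing import Dict, List, Set


def recovery_auc(ranking: List[str], gene_set: Set[str], max_rank: int) -> float:
    """Area under the gene-set recovery curve within the top max_rank genes.

    `ranking` lists genes ordered by the motif's score (best first); the area is
    divided by its maximum (max_rank * |gene_set|) so it lies in [0, 1].
    """
    hits = 0
    area = 0
    for gene in ranking[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    return area / (max_rank * len(gene_set))


def motif_nes(rankings: Dict[str, List[str]], gene_set: Set[str],
              max_rank: int) -> Dict[str, float]:
    """NES = (AUC - mean AUC over all motifs) / SD of AUCs over all motifs."""
    aucs = {motif: recovery_auc(r, gene_set, max_rank) for motif, r in rankings.items()}
    values = list(aucs.values())
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return {motif: (auc - mean) / sd for motif, auc in aucs.items()}


# Hypothetical gene rankings for three motifs and a two-gene query set
rankings = {"m1": ["A", "B", "C", "D"], "m2": ["C", "D", "A", "B"], "m3": ["D", "C", "B", "A"]}
print(motif_nes(rankings, {"A", "B"}, max_rank=2))
```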
The DGIdb (Drug-Gene Interaction database) is a comprehensive resource that provides information on the associations between genes and their known or potential therapeutic drugs. The database encompasses over 14,000 drug-gene interactions, involving 2600 genes and 6300 drugs targeting these genes, as well as an additional 6,700 genes with potential interactions. In this study, we identified potential drugs associated with key genes using the DGIdb. The resulting drug-gene interaction network was visualized using Cytoscape software, enabling a comprehensive exploration of the relationships between genes and their corresponding therapeutic drugs.
For the identified key genes, the corresponding 3D protein structures were obtained from the RCSB Protein Data Bank (PDB) ( https://www.rcsb.org/ ), and drug prediction for the key genes was performed in DGIdb ( https://old.dgidb.org/ ) to identify candidate compounds. The compound structures were then obtained from the PubChem database ( https://pubchem.ncbi.nlm.nih.gov/ ). Molecular docking was performed with AutoDock using the genetic algorithm. A total of 50 docking runs were conducted, and the pose with the lowest binding energy was selected for display. The docking results were imported into PyMOL to visualize the binding sites between the small molecules and the proteins.
Reliable Mendelian randomization (MR) analysis rests on three fundamental assumptions: (1) the relevance (correlation) assumption, which states that the IV must be strongly associated with the exposure; (2) the independence assumption, which requires that the IV be independent of confounding factors; and (3) the exclusivity (exclusion restriction) assumption, which posits that the IV influences the outcome only through the exposure and is not directly associated with the outcome. If the IV affects the outcome through alternative pathways, genetic pleiotropy is present. All analyses were performed in R (version 4.3.0). All statistical tests were two-sided, and p < 0.05 was considered statistically significant. The study flowchart is shown in Fig. 1 .
Fig. 1 Overview of the study design
Conclusion
Mendelian randomization (MR) has shown important application value in drug target discovery. Based on large-scale genetic data, this study used MR analysis to identify two potential immunotherapeutic targets for OCCC: PPP1R14A and PTGS2.
Discussion
OCCC is a lethal gynecological malignancy, and patients with advanced-stage disease have a significantly poorer prognosis than those with other epithelial ovarian cancer subtypes. Peritoneal dissemination and malignant ascites frequently accompany disease progression. Although cytoreductive surgery combined with hyperthermic intraperitoneal chemotherapy (HIPEC) has been used to control tumor spread, the intrinsic resistance of OCCC to platinum-based drugs severely limits therapeutic efficacy in advanced cases [ 32 ]. Together with a high thrombotic risk that substantially compromises quality of life, this makes the need for novel systemic therapies urgent.
Our Mendelian randomization analysis, supported by colocalization, identified PPP1R14A and PTGS2 as candidate causal genes. Single-cell transcriptomics revealed that PPP1R14A is predominantly expressed in malignant epithelial cells with elevated metabolic pathway activity, whereas PTGS2 showed immune compartment-specific expression, suggesting modulation of the tumor microenvironment (TME). PPP1R14A has been implicated in immunotherapy resistance in head and neck carcinomas [ 33 ] and shows ovarian cancer-specific dysregulation [ 34 ]. However, research evidence on PPP1R14A in OCCC is still lacking, and its role in tumor immunity remains unclear. Because summary data on immunotherapy in OCCC are currently unavailable, the relationship between PPP1R14A and immunotherapy resistance needs to be explored further in experimental models. COX-2, the protein encoded by PTGS2, is thought to be mainly involved in the inflammatory response [ 35 ] and is expressed in many types of cancer [ 36 ]. In lung cancer, COX-2-mediated remodeling of the immune microenvironment has been shown to affect the efficacy of immunotherapy [ 12 ]. The mechanism by which PTGS2 acts in OCCC is still unclear. Our single-cell RNA sequencing analysis showed that PTGS2 is mainly expressed in tumor-associated immune cells, suggesting an association with the immune microenvironment of OCCC. We speculate that, as in lung cancer, PTGS2 may help remodel the immune microenvironment in OCCC.
No targeted therapies have been approved specifically for OCCC. Current clinical trials primarily evaluate immune checkpoint inhibitors [ 37 ] and tyrosine kinase inhibitors (TKIs) [ 38 ] developed for all epithelial ovarian carcinomas. Limited OCCC-specific trials include metabolic pathway-targeting agents and epigenetic modulators [ 39 ]. Metabolic interventions comprise CB-839 (Telaglenastat, NCT03875313 ), a glutaminase inhibitor addressing OCCC’s metabolic dependency, and HIF-1α inhibitors [ 40 ] targeting aberrant hypoxia pathways. Epigenetic approaches feature EZH2 inhibitors like Tazemetostat ( NCT03348631 ), developed through synthetic lethal strategies based on ARID1A mutations [ 41 ]. While pivotal endpoint data remain undisclosed, drug development efforts persist in identifying novel OCCC targets.
Our study nominates PPP1R14A and PTGS2 as potential therapeutic targets. Given OCCC's resistance to existing immunotherapies, our single-cell sequencing and pathway enrichment analyses suggest that these genes may modulate treatment efficacy through immune mechanisms. DGIdb screening identified COX-2 inhibitors, most of which are classified as NSAIDs, among the 110 PTGS2-interacting compounds; these compounds warrant biological validation.
Although restricting our dataset to individuals of European ancestry mitigated population stratification bias, important epidemiological disparities must be acknowledged. OCCC shows a marked ethnic predilection, with higher incidence in Asian populations, particularly Japanese women [ 42 ], yet pan-Asian genomic repositories remain critically lacking. This ethnic divergence may limit the generalizability of our therapeutic findings. Notably, many OCCC cases arise from malignant transformation of endometriosis [ 43 , 44 ], a pathogenic context that our analysis did not incorporate. Future studies should prioritize molecular profiling of endometriosis-associated OCCC subtypes to characterize their distinct oncogenic trajectories. While early-stage OCCC has a favorable prognosis, advanced disease carries poor outcomes. Our unstratified cohort design may obscure stage-specific therapeutic targets because of the genomic heterogeneity that arises during disease progression. Rigorous stage-stratified analyses in multi-ethnic cohorts could refine precision oncology frameworks. In addition, Mendelian randomization identifies drug targets through genetic variation in the target, which does not equate to the effectiveness of the corresponding drugs. Moreover, genetic variation may act throughout the body, whereas tumors have clear tissue specificity. In general, owing to these limitations of Mendelian randomization, its results do not equal clinical effectiveness. Finally, PPP1R14A and PTGS2 as therapeutic targets require systematic validation through organoid-based high-throughput screening, efficacy assessments in patient-derived xenograft (PDX) models, and Phase 0/I biomarker-driven clinical trials.
Through integrated eQTL/pQTL Mendelian randomization and colocalization analyses, we identified two novel candidate targets for OCCC. Single-cell transcriptomic profiling, transcriptional network enrichment, and compound screening further suggested their potential mechanistic roles. In conclusion, we have identified two potential immunotherapy targets for OCCC. While these findings represent meaningful progress, they are only an initial step toward OCCC treatment, and preclinical validation and clinical translation are essential to advance therapeutic development.
Introduction
Ovarian cancer (OC), the most lethal malignancy of the female reproductive system, accounted for 324,398 new cases and 206,839 deaths globally in 2022 [ 1 – 3 ]. The insidious onset of OC frequently leads to delayed diagnosis at advanced stages. Among its histological subtypes, ovarian clear cell cancer (OCCC) represents approximately 10% of epithelial ovarian cancers, surpassed only by high-grade serous cancer in prevalence [ 4 , 5 ].
While early-stage OCCC shows favorable prognosis, advanced disease exhibits poorer chemosensitivity and clinical outcomes compared to other epithelial ovarian cancer subtypes [ 6 , 7 ]. Notably, OCCC patients show high susceptibility to paraneoplastic syndromes, with frequent complications including deep venous thrombosis and pulmonary embolism that significantly decrease quality of life [ 7 ].
Current therapeutic strategies primarily involve cytoreductive surgery combined with platinum-based chemotherapy [ 8 ]. However, chemoresistance typically develops rapidly in OCCC [ 9 ]. This therapeutic challenge is compounded by the limited efficacy of PARP inhibitors, owing to the low BRCA mutation rate in OCCC [ 10 , 11 ]. Although the molecular biological characteristics of OCCC suggest a potential benefit from immunotherapy, immune checkpoint blockers have not achieved satisfactory therapeutic effects in clinical trials, and OCCC is regarded as an intrinsically immunotherapy-resistant tumor [ 6 ]. This suggests that OCCC may harbor mechanisms that regulate its response to immunotherapy. These clinical realities highlight the pressing need for new therapies that target the specific mechanisms of OCCC.
Transcriptome sequencing has significantly transformed the landscape of tumor research, enabling the precise identification of molecular markers, patient stratification, and prognosis prediction [ 12 – 16 ]. Single-cell RNA sequencing further advances this by capturing gene expression at the resolution of individual cells, thereby revealing cellular heterogeneity and providing more refined insights. However, current single-cell immune profiling of OCCC remains limited, with only a few studies having investigated its tumor immune microenvironment at high resolution [ 17 , 18 ]. Available data suggest that OCCC displays an immune-cold phenotype, but a comprehensive understanding of immune cell heterogeneity and functional states is still lacking.
Translational research in OCCC faces three principal challenges: (1) low incidence and tumor heterogeneity, (2) insufficient sample sizes for robust statistical power, and (3) limited funding for dedicated therapeutic exploration. Together, these make large-scale randomized controlled trials (RCTs) in this population difficult to conduct.
Mendelian randomization (MR), an epidemiological method that uses genetic variants as instrumental variables (IVs), provides an alternative approach to causal inference between modifiable exposures and clinical outcomes. Because single nucleotide polymorphisms (SNPs) are inherited according to Mendelian principles, MR uses them to mimic the randomization process within observational studies [ 19 – 21 ]. MR analysis is particularly useful for identifying novel therapeutic targets through drug target prioritization based on genomic evidence [ 22 ]. Although MR has been successfully applied to drug target discovery in ovarian cancer as a whole, subtype-specific investigations for OCCC are still absent [ 23 – 26 ]. Because tumor heterogeneity across histological subtypes may fundamentally alter therapeutic response patterns [ 27 ], subtype-specific targets are needed.
This study employs a three-stage analytical workflow to identify and validate therapeutic targets for OCCC. First, we conducted MR analysis using eQTL and pQTL data to prioritize candidate genes with causal associations, and performed colocalization analysis to confirm shared causal variants between gene expression and disease risk. Next, the identified targets were characterized using single-cell RNA sequencing (scRNA-seq) data to delineate cell type-specific expression patterns, and pathway enrichment analysis was used to elucidate potential mechanisms. Finally, candidate drugs were identified with DGIdb and evaluated by molecular docking simulations to assess ligand-receptor binding affinity.
Supplementary Material
Below is the link to the electronic supplementary material.
Supplementary Material 1