Comprehensive analysis of the deleterious nonsynonymous, non-coding SNPs and cancer variants of human aldo-keto reductase type 1 (AKR1C1) protein and their probable association with disease risk and progression: a computational study.

OA: gold publisher-OA-unknown
Full text 100,870 characters · extracted from pmc-nxml · 9 sections · click to expand

Credit

Nurun Nahar Nila: Data curation, Formal analysis, Methodology, Software, Validation, Writing – original draft. Israt Dilruba Mishu: Data curation, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing. Zimam Mahmud: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Sonia Tamanna: Formal analysis, Investigation, Methodology, Validation, Writing – review & editing. Kaniz Fahima: Data curation, Formal analysis, Methodology, Software, Writing – review & editing. Adittya Sabhayasachi Khan: Formal analysis, Methodology, Software. Farzana Ansari: Formal analysis, Methodology, Software, Writing – review & editing. Mahmud Hossain: Formal analysis, Methodology, Resources, Software, Visualization, Writing – review & editing. Md. Zakir Hossain Howlader: Formal analysis, Investigation, Methodology, Software, Validation, Writing – review & editing.

Ethical

Not applicable.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Methods

Information of missense SNPs, intronic SNPs, SNPs at 5′-UTR and 3′-UTR of canonical transcript ENST00000380872.9 were retrieved from Ensembl and NCBI database. AKR1C1 protein information and fasta format sequence of this protein were collected from Uniprot database. The AKR1C1 protein's 3D structure was downloaded from Protein Data Bank (PDB) in PDB format. Mutations in AKR1C1 gene in cancer patients were obtained from cBioPortal database. Disease associated SNPs of AKR1C1 gene were collected from GWAS catalog database. All variants retrieved from multiple databases were cross-checked, and duplicate entries were merged. Variants were uniformly mapped to the canonical transcript (ENST00000380872.9), and synonymous or noncoding SNPs were excluded from downstream analysis. Gene prioritization within the AKR1C family was performed using the VarElect Next-Generation Sequencing Phenotyper to identify genes associated with cancer-related phenotypes. Gene symbols and cancer/tumorigenesis terms were provided as input, and VarElect returned prioritization scores along with −log10( p -values), where higher scores and larger −log10( p -values) indicate stronger gene–disease associations. Association of AKR1C1 gene with diseases was assessed from several databases. GWAS catalog database, Comparative Toxicogenomics Database (CTD) annotates AKR1C1 gene associated diseases with some other information regarding the disease and the references where they are mentioned. GWAS gives association count and CTD gives reference count. Association count refers to the number of independent associations significant at p < 0.00001 in all stages of the analysis (eg. discovery and replication or combined). Functional effects of missense SNPs in AKR1C1 were evaluated using a multi-algorithm in silico approach, as commonly applied in structural–functional SNP studies [ 15 ]. Meta -SNP, PolyPhen-2, PhD-SNP, SNPs&GO, SNAP2, PredictSNP, PANTHER, Mutation Assessor, and SuSPect were run in batch mode to distinguish deleterious variants from neutral substitutions.Meta-SNP integrates predictions from PANTHER, PhD-SNP, SIFT, and SNAP and reports a Reliability Index (RI) [ 16 , 17 ]. PolyPhen-2 assesses structural and functional impacts based on sequence conservation and phylogenetic features [ 18 ]. PhD-SNP and SNPs&GO apply SVM-based and GO-annotation–based classifiers, respectively, to infer disease association [ 19 ]([ 20 ]). SNAP2 uses a neural network to predict functional effects and provides confidence scores [ 21 ]. PredictSNP is a consensus classifier combining outputs from MAPP, PhD-SNP, PolyPhen-1/2, SIFT, SNAP, nsSNPAnalyzer, and PANTHER, providing normalized confidence scores and highlighting neutral (green) or deleterious (red) mutations [ 22 ]. PANTHER and Mutation Assessor evaluate functional impairment using evolutionary conservation [ 23 ]([ 24 ]). SuSPect integrates sequence, structural, and systems biology features to score variant pathogenicity [ 25 ]. The scoring and prediction of the SNP analysis tools is provided in Supplementary Tables S1–S4 . A binomial test was applied to assess the functional significance of missense variants by comparing the number of deleterious predictions to the total number of valid calls across 12 in silico tools. Predictions were binarized and summarized per variant, and statistical significance was evaluated against a background probability of p 0 = 0.5 . To control for multiple testing, p-values were adjusted using the Benjamini–Hochberg false discovery rate (FDR) method, with variants showing FDR-adjusted p < 0.05 considered significant. Here adjusted p -values were calculated as FDR i = min ⁡ ( p i × m i , 1 ) , with pᵢ denoting the ordered raw p -value, i its rank, and m the total number of tests. Conservation of AK1C1 protein residues of missense SNPs were assessed using the ConSurf web server. Based on the phylogenetic relationships between homologous sequences, the ConSurf predicts conservation score of amino acids or nucleic acid locations of a protein, DNA, or RNA molecule. ConSurf conservation scores [ [1] , [2] , [3] , [4] , [5] , [6] , [7] , [8] , [9] ] were obtained for all residues, and residues scoring 7, 8, or 9—indicating high to very high conservation—were considered for domain conservation analysis. For PyMol, UCSF-ChimeraX, and Jmol, visualization scripts for the protein colored with ConSurf scores are provided [ 26 ]. On the other hand, Conservation of Domains, active sites and catalytic sites in AK1C1 protein were analyzed and identified by NCBI Conserved Domains Analysis Tool [ 27 ]. Post-translational Modifications sites in AK1C1 protein were retrieved from DBPTM, ADPriboDB 2.0, BioGRID and CPLM databases and NetPhos, and PhosphoSitePlus web-based tools were used to predict PTMs sites. Drug binding, disease and cancer associated PTMs were also obtained from DBPTM and CPLM database. DBPTM provides experimentally supported PTMs and disease-associated nsSNP-linked modifications [ 28 ]. ADPriboDB catalogs ADP-ribosylated proteins [ 29 ]. The Biological General Repository for Interaction Datasets (BioGRID) provides curated PTMs and interaction datasets [ 30 ]. Compendium of Protein Lysine Modifications 4.0 (CPLM 4.0) compiles lysine modification data with extensive protein annotations [ 31 ]. NetPhos2.0 and 3.1 predicts generic and kinase-specific phosphorylation sites [ 32 , 33 ], while PhosphoSitePlus (PSP) provides functional and structural information for experimentally validated PTMs [ 34 ]. Protein Variant Instability analysis tools-DynaMut, DUET, ENCoM, INPS, mCSM, I-Mutant, and SDM were utilized to filter out the missense SNPs that increases stability of protein. DynaMut provides ΔΔG values and integrates predictions from DUET, ENCoM, SDM, and mCSM [ 35 ]. INPS predicts ΔΔG from sequence or structure using SVM regression. Protein sequence or 3D structure of the protein, mutation list, and protein chain are required as input. It gives output in table format with the ΔΔG value in kcal/mol unit [ 36 ]. mCSM uses graph-based signatures for stability estimation. Protein structure in PDB file format, list of mutations in batch mode are provided as input. It gives the ΔΔG value in kcal/mol unit for each of the mutations in tabular format [ 37 ]. I-Mutant predicts stability changes from sequence or structure caused by single nucleotide polymorphism. [ 38 ]. SDM (Site Directed Mutator) evaluates mutational effects using environment-specific amino acid substitution probabilities and provide DDG value as result [ 39 ]. For the purpose of examining the spatial pattern of amino acid substitutions on protein structures, mutation3D is a functional prediction and visualization tool. In order to find functional hotspots and support further hypotheses, it is designed to be used to find clusters of amino acid substitutions resulting from somatic cancer mutations in a large number of patients. It can also be used to swiftly analyze the relative positions of amino acids in proteins or to cluster various types of mutational data [ 40 ]. The secondary structure of AKR1C1 was predicted using NetSurfP 3.0 and SOPMA. NetSurfP 3.0 estimates secondary structure, solvent accessibility, intrinsic disorder, and backbone dihedral angles using a deep-learning model based on the ESM-1b language framework, enabling accurate predictions without reliance on sequence alignments [ 41 ]. SOPMA (Self-Optimized Prediction Method with Alignment) predicts protein secondary structure by incorporating multiple sequence alignments from homologous proteins, achieving improved accuracy over single-sequence methods [ 42 ]. Wild type residue and mutant residue containing protein 3D structure were modeled using SWISS-MODEL web-based tool. SWISS-MODEL is a completely automated protein structure homology-modeling system which is available through the Expasy web server or the program DeepView (Swiss Pdb-Viewer). This server's goal is to make protein modeling available to all life science researchers across the world [ 43 ]. SWISS-MODEL Structure Assessment tool helps to analyze the quality of predicted 3D structure of a protein. The user can acquire a quick, basic overview of a homology model on the Model Results page of SWISS-MODEL by seeing ligands, global and local quality, and target-template alignments. One may also upload one's own structures for structural evaluation. The Structure Assessment page aims to display the most significant Molprobity scores and make it simple for users to locate low-quality residues in their model or structure [ 44 ]. The structural effects of a point mutation in a protein sequence are evaluated through the user-friendly web service HOPE. Protein sequence and the mutation are required as input, and HOPE will then gather and combine the information from many web services and databases to provide a report with the results, figures, and animations [ 45 ]. TM-align is good tool for comparing protein structures. The TM-score ranges from 0 to 1, where 1 denotes that the two structures are a perfect match. Scores below 0.2 indicate unrelated proteins. On the other hand, scores above 0.5 represents that the proteins have typically the same fold in SCOP/CATH [ 46 ]. Superposed wild-type and mutant models were generated to visualize local residue displacement. Although TM-align and RMSD indicate global similarity, examination of the local environment shows side-chain clashes and altered packing. Per-residue energy deviations (DynaMut vibrational entropy) were used as a proxy for RMSF, highlighting increased local flexibility around the mutation site. The wild-type AKR1C1 structure was predicted using AlphaFold version 3 and subsequently superimposed with the four mutant variants using the Matchmaker tool in UCSF ChimeraX to visualize conformational changes. The input generator option of CHARMMGUI [ 47 ] was applied to the protein structure of aldo-keto reductase type 1 (AKR1C1), PDB ID: 8JP2 for GROMACS (GROningen Machine for Chemical Simulation) 2025.3 packages [ 48 ] with the following criteria: pH: 7.4, ion placing: Monte Carlo, ions: potassium chloride and sodium chloride, force field: Charmm36 m, and temperature: 310.15k. The mutation option in CHARMMGUI was selected to make G20R and G62R mutant files of AKR1C1. Finally, the files were run in GROMACS v25.3 [ 48 ] for 200ns. Ensembl Variant Effect Predictor or VEP predicts the impact of input variants on regulatory areas, genes, transcripts, and protein sequence. It provides information on variants’ location, impact of variants on protein sequence (e.g., stop codon gained, missense, stop codon lost, frameshift mutation etc), minor allele frequencies, SIFT and PolyPhen-2 scores. Additionally, VEP gives information on Transcription Factor and its binding sites at that variant position [ 49 ]. Population allele frequencies (gnomAD v4), clinical significance (ClinVar), and variant identifiers (dbSNP) were assessed using the MANE Select transcript ENST00000380872.9 ( NM_001353.6 ). Gene-level constraint metrics (Z-score, o/e ratio, pLI) were obtained from the gnomAD gene constraint dataset. Comparative Toxicogenomics Database (CTD) is a publicly available comprehensive database and attempts to help studying on how environmental factors affect the health conditions of human. It offers information about chemical–gene/protein interactions i.e, increase/decrease in expression/activity/binding, chemical–disease and gene–disease relationships which are manually curated. To help in generating hypotheses about the mechanisms underlying environmentally influenced diseases, this interaction information is combined with the functional and pathway data [ 50 ]. Gene-Gene and Protein-Protein interaction of AKR1C1 gene was obtained from GeneMANIA web-based tool and BioGRID, STRING, and dbPTM Databases. GeneMANIA predicted interacting genes based on co-expression, co-localozation, pathway, protein and genetic interaction datasets [ 51 ]. STRING provided experimentally validated and predicted protein–protein interactions [ 52 ]. 3′ UTR SNPs of AKR1C1 gene which are in miRNA-binding regions were analyzed using PolymiRTS, which catalogs polymorphisms in miRNA seed regions and target sites [ 53 ]. RegulomeDB was used to annotate non-coding SNPs with regulatory features including promoter regions, TF-binding sites, and DNase hypersensitivity regions in H. sapiens genome . Also used for the analysis of regulatory function of 3′ UTR, 5′ UTR, and intronic SNPs and SNPs which were obtained from GWAS Catalog being associated with disease [ 54 ]. Regulatory SNPs were annotated using RegulomeDB v2.2, obtaining genomic coordinates, TF binding sites, histone marks, chromatin accessibility, DNase footprints, PWM matches, and QTL associations. To validate the functional impact of the identified GWAS SNPs and address the distinction between tag variants and causal variants, expression Quantitative Trait Loci (eQTL) analysis was performed. We utilized the Genotype-Tissue Expression (GTEx) database (v8; https://gtexportal.org/ ) to examine the association between SNP genotypes and the expression levels of the AKR1C1 gene across various human tissues. Significant eQTL associations were identified based on the P-value and the Normalized Effect Size (NES). A significant eQTL association suggests that the variant is a credible functional candidate that modulates gene expression rather than a mere genomic marker. Genetic regulation of AKR1C1 alternative splicing was investigated using sQTL data from the GTEx portal (v8). We analyzed the association between GWAS-identified SNPs and intron excision clusters across all available human tissues, quantifying effects using Normalized Effect Size (NES) and statistical significance (P-values). cBioPortal datasets were queried to extract AKR1C1 somatic mutations, co-occurrence patterns, and clinical annotation [ 55 , 56 ]. In addition to retrieving mutation data, we examined the mutation frequency across multiple cancer types to identify tumor-specific patterns. We retrieved KM plot for visualization and comparison of the survival of AKR1C1 gene mutated patients and other patients. Moreover, distribution of missense mutations across the domain and exons along with the PTM sites were also visualized and retrieved from cBioPortal. Functional Analysis of missense mutations from cancer patients were done using Mutation Assessor, SIFT and Polyphen-2. They gave their prediction as deleterious/probably damaging or neutral/benign and score of probability. CGI classified identified mutations as drivers or passengers using algorithms such as BoostDM and OncodriveMut [ 57 ]. Driver predictions were further reviewed to identify variants with potential roles in altering AKR1C1 function or cancer progression.

Results

A comprehensive workflow describing coding SNP analysis, noncoding SNP analysis, and cancer mutation analysis is given in Fig. 1 . Fig. 1 Workflow of the structured study pipeline used for variant analysis, integrating SNP mining, functional prediction, conservation and stability assessment, 3D modelling, and regulatory/disease association analysis. This multi-layered workflow ensures reproducibility and minimizes tool-specific bias. Fig. 1 Workflow of the structured study pipeline used for variant analysis, integrating SNP mining, functional prediction, conservation and stability assessment, 3D modelling, and regulatory/disease association analysis. This multi-layered workflow ensures reproducibility and minimizes tool-specific bias. From Ensembl Database 301 missense SNP variants, 11226 intronic SNPs, 1384 of 3′ UTR SNPs and 16 5′ UTR SNPs were retrieved which were mapped to the canonical transcript ENST00000380872.9. From cBioPortal, 57 missense mutations were obtained before duplicate removal. From GWAS Catalog disease associated 7 SNPs were obtained where 5 of them are intronic, one is at 3′ UTR and another one is intergenic (∼200 bp downstream of AKR1C1 gene). After merging duplicate variants across databases and mapping all entries to the canonical transcript ENST00000380872.9, 14 duplicated records and 7 synonymous/noncoding variants were removed. A final set of 43 unique cancer-associated missense SNPs remained and was used for downstream pathogenicity, stability, and structural analyses. Using the VarElect NGS Phenotyper, we evaluated the AKR1C gene family to identify members most closely associated with cancer-related phenotypes ( Table 1 ). Among the four genes analyzed, AKR1C3 received the highest prioritization score 19.25, followed by AKR1C1 got 13.92. AKR1C2 and AKR1C4 showed comparatively lower scores was 9.26 and 6.84, respectively ( Table 1 ). Although AKR1C1 obtained a moderate prioritization score, its accompanying −log10(p) value (1.14, corresponding to p-value ≈ 0.072) was not statistically significant. Therefore, the prioritization result was interpreted as supportive but not conclusive. AKR1C1 was still selected for downstream in silico analyses for several reasons. First, the prioritization step was exploratory in nature, and near-significant signals were considered in conjunction with biological relevance. Second, AKR1C1 plays a well-established role in hormone metabolism, cancer progression, and chemoresistance [ [58] , [59] , [60] ], which are directly relevant to the study context. Finally, the lack of significance may reflect limited statistical power rather than absence of biological effect. Collectively, these considerations justify its inclusion as a candidate gene for further analysis. Table 1 Prioritization of genes that are associated with diseases, cancer or tumor as predicted by VarElect NGS Phenotyper. Table 1 Gene Symbol Matched Phenotypes Score -Log10(p) AKR1C3 Cancer; Tumor 19.25 1.31 AKR1C1 Cancer; Tumor 13.92 1.14 AKR1C2 Cancer; Tumor 9.26 0.96 AKR1C4 Cancer; Tumor 6.84 0.84 Prioritization of genes that are associated with diseases, cancer or tumor as predicted by VarElect NGS Phenotyper. Diseases that are directly associated with AKR1C1 gene were retrieved from GWAS catalog and CTD database [ 61 ]. Table 2 shows name of the diseases that were curated from literature i.e., neoplasms, liver cirrhosis, osteoarthritis, disorders of sex development with their reference count and the source of information. All detailed results covering disease terms, evidence scores, and data sources are provided in the Supplementary Tables S2, S3, S4 and S5 included in the supplementary file. Table 2 Association of AKR1C1 with diseases as annotated by CTD database and GWAS Catalog. Table 2 Disease Reference/Association Count Source Liver Cirrhosis, Experimental 153 CTD Cell Transformation, Neoplastic 149 CTD Carcinoma, Non-Small-Cell Lung 70 CTD Osteoarthritis 27 CTD Endometrial Neoplasms 20 CTD Disorders of Sex Development 19 CTD Polycystic Ovary Syndrome 16 CTD Endometriosis 16 CTD Neoplasms, Hormone-Dependent 3 CTD laryngeal squamous cell carcinoma 2 GWAS hypopharynx cancer 2 GWAS Association of AKR1C1 with diseases as annotated by CTD database and GWAS Catalog. Association count: number of independent associations significant at p < 0.00001 in all stages of the analysis (e.g. discovery and replication or combined). Reference count: number of references where the association was found. Functional analysis of 301 missense SNPs from 12 different SNP analysis tools narrowed down the number from 301 to 62 missense SNPs which were predicted to be deleterious by all these very tools ( Supplementary Tables S2–S5 ). Visual representation of SNP analyzer results on deleterious and non-deleterious/Neutral missense SNPs constructed in R studio is shown in Fig. 2 A. Functional analysis results of 2 finally short-listed most deleterious SNPs (G20R, G62R) are given in Table 3 with name of tools used, their prediction, probability or score and Reliability Index (RI). Fig. 2 Integrated computational characterization of AKR1C1 missense variants, structural features, and functional domains. (A) Distribution of deleterious and neutral missense SNPs predicted by multiple in silico tools, showing tool-specific variation in deleteriousness calls, with PANTHER identifying the highest number and SuSPect the lowest. (B) Evolutionary conservation profile of AKR1C1 generated using ConSurf, where highly conserved residues (scores 8–9) are indicated in magenta/purple. (C) Experimentally validated post-translational modification (PTM) sites mapped from PhosphoSitePlus, highlighting deleterious variants at Y55 and K270 that coincide with phosphorylation and ubiquitylation/acetylation sites, respectively. (D) Predicted phosphorylation sites identified by NetPhos, with conserved deleterious SNPs at S166 and S271 highlighted. (E) Domain organization, catalytic tetrad, and active-site architecture of AKR1C1 as determined by NCBI Conserved Domain Analysis. (F) Predicted ligand-binding and active-site regions of AKR1C1, indicating NADP + -binding residues (positions 20–22) and the catalytic residue Y55. Fig. 2 Table 3 Functional analysis results of finally found most deleterious mutations evaluated from 12 different web-based tools. Table 3 Functional Analysis Most Deleterious Mutations Tools Results G20R G62R SNPs&GO Prediction Disease Disease Probability 0.949 0.899 RI 9 8 PhD-SNP Prediction Disease Disease Probability 0.924 0.89 RI 8 8 PANTHER Prediction probably damaging probably damaging Probability 0.86 0.86 Meta -SNP Prediction Disease Disease Score 0.863 0.838 RI 7 7 SIFT Prediction Disease Disease Score 0 0 PolyPhen-2 Prediction probably damaging probably damaging Probability 0.96 0.96 PredictSNP Prediction DELETERIOUS DELETERIOUS Expected accuracy 0.86908365 0.86908365 MAPP Prediction DELETERIOUS DELETERIOUS Expected accuracy 0.71814093 0.87706147 PolyPhen-1 Prediction DELETERIOUS DELETERIOUS Expected accuracy 0.74491225 0.74491225 SNAP2 Predicted effect effect effect Score 94 89 Expected accuracy 95% 91% Mutation Assessor Functional Impact high medium FI score 4.205 3.5 SuSPect Prediction disease-causing disease-causing Score 98 82 Integrated computational characterization of AKR1C1 missense variants, structural features, and functional domains. (A) Distribution of deleterious and neutral missense SNPs predicted by multiple in silico tools, showing tool-specific variation in deleteriousness calls, with PANTHER identifying the highest number and SuSPect the lowest. (B) Evolutionary conservation profile of AKR1C1 generated using ConSurf, where highly conserved residues (scores 8–9) are indicated in magenta/purple. (C) Experimentally validated post-translational modification (PTM) sites mapped from PhosphoSitePlus, highlighting deleterious variants at Y55 and K270 that coincide with phosphorylation and ubiquitylation/acetylation sites, respectively. (D) Predicted phosphorylation sites identified by NetPhos, with conserved deleterious SNPs at S166 and S271 highlighted. (E) Domain organization, catalytic tetrad, and active-site architecture of AKR1C1 as determined by NCBI Conserved Domain Analysis. (F) Predicted ligand-binding and active-site regions of AKR1C1, indicating NADP + -binding residues (positions 20–22) and the catalytic residue Y55. Functional analysis results of finally found most deleterious mutations evaluated from 12 different web-based tools. A total of 62 protein variants were analyzed to determine their functional impact, with every variant identified as significantly deleterious. High consensus was observed across the dataset, as many variants including A41T, C206R, and G20E showed 100% agreement across 12 predictive calls, yielding highly significant FDR p-values of 0.0004587. This consistent impact was also reflected in the G20R and G62R mutations, which maintained deleterious agreement rates of 100% and 91.6% respectively. Ultimately, every variant yielded an FDR p-value significantly below the 0.05 alpha level, confirming a robust and statistically significant prediction of functional loss across the entire mutation library ( Supplementary Table S6 ). Amino acid residue conservation analysis from Consurf gave conservation scale for each of the amino acid residues ( Fig. 2 B). Residues that have highest conservation score of 8 or 9 and some residues that have conservation score 7 but reside in/near active or catalytic site, were filtered in for further domain conservation analysis ( Table 4 ). Finally, 42 residues were found to be highly conserved and significant based on their location. Table 4 Mutations which are highly conserved throughout evolution and reside within or near catalytic site and/or active sites of AKR1C1 protein. Positions of active site and catalytic site residues are indicated in bold format and red color indicate the finally short-listed most deleterious SNPs (G20R, G62R). Table 4 Mutations Position Conservation Score Catalytic or Active site residue Buried/Exposed Function G20E 20 8 near active site Buried Structural G20R 20 8 near active site Buried Structural G22D 22 9 active site residue Buried Structural T23A 23 8 active site residue Exposed Functional T23P 23 8 active site residue Exposed Functional A41T 41 9 near catalytic site Buried Structural R47L 47 8 near catalytic site Exposed Functional R47S 47 8 near catalytic site Exposed Functional Y55H 55 9 catalytic site residue Exposed Functional N57D 57 8 near catalytic site Exposed Functional E58V 58 9 near catalytic site Exposed Functional G62R 62 8 near catalytic site Exposed Functional F80S 80 7 near active & catalytic sites Buried - D112G 112 9 near active & catalytic sites Buried Structural L113P 113 8 near active & catalytic sites Buried Structural H117D 117 9 active & catalytic sites Buried Structural H117P 117 9 active & catalytic sites Buried Structural P119T 119 8 active & catalytic sites Exposed Functional G164R 164 9 near active site Buried Structural V165 M 165 8 near active site Buried Structural S166Y 166 9 active site residue Buried Structural N167T 167 9 active site residue Exposed Functional N189K 189 8 near active site Exposed Functional E192A 192 9 near active site Exposed Functional I211F 211 8 near active site Buried - Y216C 216 8 active site residue Buried - Y216 N 216 8 active site residue Buried - G220V 220 7 active site residue Exposed - A245E 245 8 near active site Buried Structural A245T 245 8 near active site Buried Structural A253T 253 8 active site residue Buried - A253V 253 8 active site residue Buried - R258C 258 8 near active site Exposed Functional R258G 258 8 near active site Exposed Functional R258L 258 8 near active site Exposed Functional L268R 268 8 active site residue Buried K270 N 270 9 active site residue Exposed Functional S271 N 271 8 active site residue Exposed Functional S271R 271 8 active site residue Exposed Functional R276C 276 8 active site residue Exposed Functional R276S 276 8 active site residue Exposed Functional N280K 280 9 active site residue Exposed Functional Mutations which are highly conserved throughout evolution and reside within or near catalytic site and/or active sites of AKR1C1 protein. Positions of active site and catalytic site residues are indicated in bold format and red color indicate the finally short-listed most deleterious SNPs (G20R, G62R). Furthermore, NCBI Domain Conservation analysis tool aided to select the significant residues that reside within conserved domain of aldo-keto-reductase/active site/catalytic tetrad or reside near these 2 sites ( Fig. 2 (E–F)). Thus, 42 residues were found to be highly conserved with significant functional or structural contributions. Table 4 summarizes 42 missense SNPs, including conservation scores, structural context, and proximity to active-site and catalytic residues. Of these, 20 variants map to active sites, 15 lies near active sites, and 6 affect both active and catalytic regions; ConSurf predicted 21 as functionally important and 13 as structurally relevant. G20R and G62R (conservation score = 8) were prioritized, with G20R buried near the active site and G62R exposed near the catalytic site, supporting combined functional and structural effects. Conserved active-site and catalytic tetrad alignments are shown in Supplementary Fig. S1–S2 . Despite some variants showing higher conservation (score = 9), G20R and G62R ranked as most deleterious across all tools ( Supplementary Table S1–S4 ), likely due to the disruptive Gly→Arg substitution introduces a far more drastic physicochemical shift than other amino-acid changes, leading to substantial steric hindrance and electrostatic perturbation within highly conserved functional regions. A 200ns molecular dynamics simulation analysis of wild type and mutated variants including G20R and G62R were evaluated to determine the probable structural changes. Root mean square deviation (RMSD) analysis dictated that mutants and wild type protein stabilized at equivalent time which is within 20 s of starting simulation. Also, there was no overall difference in structural stability since the stabilization was between 0.2 and 0.3 ( Fig. 3 A–E). Fig. 3 (A, E) Root mean square deviation of wildtype, G20R, and G62R variants. (B, F) Root mean square fluctuation (RMSF) of the variants. (C, G) Solvent accessible surface area of the variants. (D, H) Radius of gyration of the variants. Fig. 3 (A, E) Root mean square deviation of wildtype, G20R, and G62R variants. (B, F) Root mean square fluctuation (RMSF) of the variants. (C, G) Solvent accessible surface area of the variants. (D, H) Radius of gyration of the variants. Similarly, structural flexibility analysis with root mean square fluctuation (RMSF) was equivalent for the wild type and mutants. Wild type protein started with a score of approximately 0.7 whereas the mutants started at 0.4. Similarly, wild type protein showed steeper decent than the mutants. However, all the structures reached an equilibration point after 100ns ( Fig. 3 B–F). Although several variants were predicted as deleterious by in silico tools, these predictions reflect localized energetic destabilization or functional perturbation rather than large-scale global unfolding or major structural disruption, as supported by the overall structural stability observed during molecular dynamics simulations. Moreover, the compactness of wild type and mutants remained between 1.940 and 1.975 nm with equivalent average scores of 1.96 ( Fig. 3 C–G). There was no significant difference in solvent accessible surface area between wild type and the mutants, with average values ranging between 0.48 and 0.49 respectively ( Fig. 3 D–H). PTMs like phosphorylation, acetylation, and ubiquitylation in AK1C1 protein were obtained from PhosphoSitePlus which also provide information on their references ( Fig. 2 C– Supplementary Table S10 ). Probability or potential of phosphorylation event in AK1C1 protein's Ser, Thr, and Tyr residues were predicted by NetphosP ( Fig. 2 D– Supplementary Table S10 ). DBPTM, ADPriboDB databases and PhosphositePlus (PSP) and NetphosP 3.1 states that phosphorylation occurs at T23, Y55, S166, Y216, K270, and S271 residue positions of highly conserved amino acids ( Supplementary Table S10 ). From BioGRID and CPLM it was found that K270 is ubiquitylation and acetylation site. Residues Y55 and K270 in AKR1C1 are located in drug-binding and post-translational modification sites (phosphorylation and acetylation/ubiquitylation, respectively) as annotated in PhosphoSitePlus and DBPTM. Although specific amino-acid level associations with cancer have not been reported, dysregulation or mutation of the AKR1C1 gene has been linked to various cancers, including breast cancer and colorectal adenocarcinoma [ 62 ]. On the other hand, no post translational modification occurs at G20 and G62 sites according to these tools and databases. Stability analysis of single amino acid mutated proteins showed that 28 missense SNPs result in decrease in protein stability. These missense SNPs result in negative ΔG value of protein as they decrease protein's conformational stability and increases entropy. To correctly interpret stability effects, ΔΔG <0 kcal/mol is typically considered significantly destabilizing, whereas values around +0.5 kcal/mol represent minimal impact. Predictions of I-Mutant, Dynamut, DUET, ENCoM, INPS, SDM, and mCSM are given in Table 5 with their ΔΔG value in kcal/mol unit. ΔΔG value of these tools for G20R are −1.04, 0.148, −1.02, 2.045, −0.633746, −2.37, and −1.093, respectively. On the other side, for G62R, the ΔΔG values are −1.03, −0.623, −1.393, 0.379, −0.591518, −1.81, and −1.399, accordingly. These 28 SNPs were used for further downstream analysis. G20R and G62R were highlighted in Table 5 because they exhibited the most negative and consistent ΔΔG values across multiple stability prediction tools, indicating a strong destabilizing effect on the AKR1C1 protein. The results are consistent across most tools, suggesting a localized destabilizing effect rather than a global unfolding of the protein. Computational stability analysis using seven ΔΔG prediction tools showed that most substitutions are predicted to destabilize the protein, with mean ΔΔG values for the majority of variants ranging between −2.0 and 0 kcal/mol ( Fig. 4 a). The aggregated mean ± SD plots further confirmed consistent destabilization across variants, with strongly negative values observed for mutations such as F80S, G164R, R258C, and Y216H ( Fig. 4 b). Variants with larger variability, including G20E and S217 N, reflected lower cross-tool confidence. Notably, G20R exhibited only mild and variable destabilization, whereas G62R displayed a uniformly negative ΔΔG profile with high agreement among tools, highlighting these two variants as biologically distinct within the dataset. Further analysis of Z-scores revealed that the most destabilizing mutations (e.g., F80S and R258G) also had strongly negative normalized deviations, indicating high consensus on their impact ( Fig. 4 c). Conversely, variants with Z-scores near zero or positive values, such as G164R and S271 N, suggest less consistent destabilizing effects. Specifically, G20R showed a moderately positive Z-score (0.588), consistent with its mild destabilization, while G62R's near-zero Z-score (0.225) reflects its stable yet consistently negative ΔΔG profile. Table 5 Mutations which result in decrease in AKR1C1 protein stability, as predicted by I-Mutant, Dynamut, DUET, ENCoM, INPS, SDM, and mCSM protein stability analysis tools. Table 5 Mutations I-Mutant, ΔΔG value (kcal/mol) Dynamut, ΔΔG value (kcal/mol) DUET, ΔΔG value (kcal/mol) ENCoM, ΔΔG value (kcal/mol) INPS, ΔΔG value (kcal/mol) SDM, ΔΔG value (kcal/mol) mCSM, ΔΔG value (kcal/mol) G20E −1.25 −0.712 −1.803 0.895 −1.19264 −2.45 −1.72 G20R −1.04 0.148 −1.02 2.045 −0.633746 −2.37 −1.093 G22D −1.12 1.036 −0.627 0.261 −1.050571 −1.4 −0.831 T23P −0.78 0.695 −0.606 −0.507 −1.488985 −1.09 −0.512 A41T −1.12 0.624 −1.104 0.185 −1.47495 −2.54 −0.894 R47S −1.4 −1.594 −1.918 −0.687 −1.70519 −0.77 −1.891 Y55H −1.26 −0.746 −1.204 −0.721 −1.19008 −0.75 −1.381 N57D −0.73 −0.505 −2.289 −0.02 −0.834249 −0.58 −2.248 G62R −1.03 −0.623 −1.393 0.379 −0.591518 −1.81 −1.399 F80S −2.38 −2.879 −3.01 −1.094 −3.04638 −2.1 −2.701 D112G −1.72 −1.329 −1.66 −0.535 −0.8121475 −0.54 −1.684 L113P −2.4 −1.474 −2.446 −0.237 −3.346865 −4.3 −1.753 H117D −1.04 −1.391 −2.325 −0.459 −1.194741 −1.81 −2.164 H117P −0.79 −0.615 −1.636 −0.541 −1.161878 −2.18 −1.26 G164R −1.45 0.491 −0.758 2.462 −0.698486 −2.29 −0.807 V165 M −1.48 −0.569 −0.879 0.146 −1.143197 −2.1 −0.509 N167T −1.29 −0.142 −0.495 −0.136 −0.585181 −0.48 −0.66 Y216C −1.55 0.234 −0.867 −0.639 −1.81517 −1.15 −0.86 Y216N −1.65 −1.169 −1.649 −0.738 −2.53414 −2.73 −1.5 A245E −0.8 −1.063 −3.004 0.14 −1.457965 −3.53 −2.967 A245T −0.76 −0.874 −2.321 0.13 −1.42189 −3.49 −2.036 A253T −1.13 −0.096 −1.598 0.18 −0.9527435 −1.25 −1.624 R258C −1.3 −1.222 −2.219 −0.668 −0.8745555 −1.69 −2.002 R258G −1.9 −1.54 −2.772 −1.218 −1.963615 −3.81 −2.506 L268R −2.07 0.699 −1.294 0.138 −1.931125 −3 −1.244 S271N −0.02 0.31 −0.376 −0.016 −0.5372655 −0.75 −0.599 R276C −0.47 −0.382 −0.854 −0.081 −0.513698 −0.35 −0.866 R276S −0.74 −0.24 −1.148 0.052 −1.14263 −0.85 −1.169 Table 6 Output of Mutant Proteins’ 3D model quality assessment by using SWISS-MODEL Structure Assessment tool. Red color indicates the finally short-listed most deleterious SNPs (G20R, G62R). Table 6 Mutations MolProbity Score Clash Score Ramachandran Favored Ramachandran Outliers Rotamer Outliers C-Beta Deviations Bad Angles Ideal Case As low as possible Zero >98% <0.2% <1% Zero Zero R47S 1.2 2.08 96.56% 0.31% 0.35% 2 24/3574 Y55H 0.95 0.58 96.25% 0.31% 0.35% 2 26/3577 N57D 1.23 2.07 96.25% 0.31% 0.35% 2 27/3580 G62R 1.26 2.25 96.25% 0.62% 0.70% 4 26/3589 F80S 1.23 2.08 96.25% 0.31% 0.35% 4 27/3572 D112G 1.23 2.08 96.25% 0.31% 0.35% 4 26/3574 L113P 1.26 2.08 96.94% 0.62% 0.70% 4 21/3581 G20E 1.28 2.45 96.25% 0.31% 0.70% 2 24/3587 G20R 1.35 3.38 96.56% 0.31% 0.70% 2 15/3589 G22D 1.15 1.53 96.25% 0.31% 0.35% 2 28/2586 T23P 1.15 1.53 96.25% 0.31% 0.35% 2 25/3582 A41T 1.23 2.26 96.56% 0.31% 0.70% 1 22/3583 Y216C 1.18 1.72 96.25% 0.31% 0.35% 2 25/3570 Y216 N 1.21 1.91 96.25% 0.31% 0.35% 2 25/3573 L268R 1.18 1.72 96.25% 0.31% 0.35% 2 26/3583 S271 N 1.18 1.72 96.25% 0.31% 0.35% 2 27/3583 R276C 1.18 1.72 96.25% 0.31% 0.35% 2 26/3574 R276S 1.18 1.72 96.25% 0.31% 0.35% 2 26/3575 Fig. 4 Protein stability analysis of variants using ΔΔG predictions. (a) Scatter plot of ΔΔG predictions from seven computational tools for each variant. Points represent individual tool values; the red line shows the mean ΔΔG per variant. (b) Mean ΔΔG ± SD bar plot per variant summarizing stability predictions. Error bars indicate variability across tools, with most variants showing destabilization. (c) Z-score distribution derived from combined ΔΔG values, reflecting the normalized deviation and confidence in predicted stability effects across all variants. Fig. 4 Mutations which result in decrease in AKR1C1 protein stability, as predicted by I-Mutant, Dynamut, DUET, ENCoM, INPS, SDM, and mCSM protein stability analysis tools. Output of Mutant Proteins’ 3D model quality assessment by using SWISS-MODEL Structure Assessment tool. Red color indicates the finally short-listed most deleterious SNPs (G20R, G62R). Protein stability analysis of variants using ΔΔG predictions. (a) Scatter plot of ΔΔG predictions from seven computational tools for each variant. Points represent individual tool values; the red line shows the mean ΔΔG per variant. (b) Mean ΔΔG ± SD bar plot per variant summarizing stability predictions. Error bars indicate variability across tools, with most variants showing destabilization. (c) Z-score distribution derived from combined ΔΔG values, reflecting the normalized deviation and confidence in predicted stability effects across all variants. Mutation 3D examined 28 missense SNPs which were filtered by instability analysis and gave 2 statistically significant clusters ( Table 7 , p value < 0.05). The first cluster has 11 missense SNPs and it is centered to the active sites. On the other hand, the second one has 7 SNPs which are catalytic tetrad centered. In total 18 SNPs were thus found to be in cluster. In Fig. 5 (A) the 3D visualization of active site cluster and position of mutations in domain are shown and for catalytic tetrad centered cluster these are shown in Fig. 5 (B) . Both of the figures show that these SNPs reside within the aldo-keto reductase domain. Table 7 Project-Hope results that shows changes in amino acid properties due to mutations and other molecular effects. Table 7 Mutations Hydrophobicity Size Electric Charge Others in Wild Type Others in Mutant A41T Decreases Bigger Neutral to slightly Positive Buried in protein core Loss of hydrophobic interactions & may not fit in core D112G Increases Smaller Negative to Neutral Buried & Forms salt bridge Flexible, can't give rigidity & unable to form salt-bridge F80S Decreases Smaller Neutral to slightly Positive Resides in protein core Cause an empty space & loss of hydrophobic interactions G20E Decreases Bigger Neutral to Negative Buried in protein core May not fit in core region & create problem in folding G20R Decreases Bigger Neutral to Positive Buried in core with hydrophobic interaction and flexibility. Can loss interaction with NADP, may not fit in core, create misfolding G62R Decreases Bigger Neutral to Positive Buried in core with hydrophobic interaction and flexibility. May not fit and form hydrophobic bond in core region & create problem in folding L113P - Smaller - Buried in core and forms hydrophobic interaction Creates empty space and problem in folding because of its structural uniqueness and rigidity N57D Decreases - Neutral to Negative Buried and forms H-bond with Thr23 & Ser51 Negative charge can create problem in protein folding at the core R47S Increases Smaller Positive to Neutral Forms H-bond with Ile42 & Phe46, and salt bridge with Glu43 & Asp78 Loss of charge may hamper H-bond & salt bridge formation and may cause loss of external interactions Fig. 5 (A) Mutation3D-derived active-site–centered mutation clusters and their spatial distribution within the Aldo-Keto Reductase (AKR1C1) domain. Orange font indicates the active site residues. (B) Mutation3D-derived catalytic-tetrad–centered mutation clusters and their corresponding positions within the AKR1C1 domain. In both panels, red-labelled SNPs represent clustered variants, while blue-labelled SNPs represent non-clustered variants. Here, orange font indicates the catalytic site residue (Y55H) and R47, D112, and L113 are near catalytic site residues. Red font indicates the finally short-listed most deleterious SNP (G62R). Fig. 5 Project-Hope results that shows changes in amino acid properties due to mutations and other molecular effects. (A) Mutation3D-derived active-site–centered mutation clusters and their spatial distribution within the Aldo-Keto Reductase (AKR1C1) domain. Orange font indicates the active site residues. (B) Mutation3D-derived catalytic-tetrad–centered mutation clusters and their corresponding positions within the AKR1C1 domain. In both panels, red-labelled SNPs represent clustered variants, while blue-labelled SNPs represent non-clustered variants. Here, orange font indicates the catalytic site residue (Y55H) and R47, D112, and L113 are near catalytic site residues. Red font indicates the finally short-listed most deleterious SNP (G62R). Secondary structure analysis of AK1C1 protein by NetSurfP showed the coiled, helix and strand regions of the protein. Furthermore, it gave information on regions that are buried or exposed to the surface ( Fig. 6 A) [ 63 ]. SOPMA, on the other side, provided statistical analysis on the percentage of AK1C1 proten sequence that form alpha helix, extended strand, beta turn and random coil in configuration, which are in this case 33.13%, 18.58%, 7.43%, and 40.87% respectively ( Fig. 6 B). For better visualization of the prediction, there is a graphics also ( Fig. 7 ). From these figures it is clearly seen that G20R reside within extended strand and alpha helix respectively. Fig. 6 (A) Secondary structure of the AKR1C1 protein predicted by NetSurfP. Figure depicts the helix, strand and coil regions of the protein and the exposed as well as the buried residues are also shown, from where the active site (G22, T23, H117, P119, S166, N167, Y216, G220, A253, L268, K270, S271, R276, and N280) and catalytic site residue (Y55, H117, P119) status is obvious. (B) Secondary structure of the AKR1C1 protein predicted by SOPMA. From the data, it was observed that largest number of residues are in coil region, then alpha helix, which is followed by extended strand and beta turn. Fig. 6 Fig. 7 (A) Three-dimensional structure of the AKR1C1 protein in complex with its ligands NAP and STR, modeled using SWISS-MODEL. (B) Visualization of four selected missense SNP sites (G20, G62, Y55, Y110) mapped onto the AKR1C1 3D structure, highlighting their positions near the active site. (C – D) Superimposed 3D structure of wild (green), G20R (cyan), G62R (pink). Side chain of the mutant amino acids are shown for positions (G20R, G62R respectively). Fig. 7 (A) Secondary structure of the AKR1C1 protein predicted by NetSurfP. Figure depicts the helix, strand and coil regions of the protein and the exposed as well as the buried residues are also shown, from where the active site (G22, T23, H117, P119, S166, N167, Y216, G220, A253, L268, K270, S271, R276, and N280) and catalytic site residue (Y55, H117, P119) status is obvious. (B) Secondary structure of the AKR1C1 protein predicted by SOPMA. From the data, it was observed that largest number of residues are in coil region, then alpha helix, which is followed by extended strand and beta turn. (A) Three-dimensional structure of the AKR1C1 protein in complex with its ligands NAP and STR, modeled using SWISS-MODEL. (B) Visualization of four selected missense SNP sites (G20, G62, Y55, Y110) mapped onto the AKR1C1 3D structure, highlighting their positions near the active site. (C – D) Superimposed 3D structure of wild (green), G20R (cyan), G62R (pink). Side chain of the mutant amino acids are shown for positions (G20R, G62R respectively). SWISS-MODEL generated wild type and mutant protein 3D models ( Fig. 7 ) were assessed using SWISS-MODEL Structure Assessment Tool. The residues that are far away from the ideal case are low quality residues for the ideal wild type conformational structure of the protein. Thus, this tool filtered 9 missense SNPs that are low quality or in other words, most deleterious SNPs. They are R47S, N57D, G62R, F80S, D112G, L113P, G20E, G20R, and A41T with their clash score being more than 2.00 which, in ideal case, should be Zero ( Table 7 ). Additionally, their MolProbity score is greater than 1.20 that should be as low as possible. Their Ramachandran favored percentage should be more than 98% which were near 96% instead. Ramachandran outliers in ideal case is lower than 0.2% which in case of mutated proteins were predicted to be more than 0.31%. Likewise, the other aspect of quality assessment features was not near ideal case values. Low quality residues or missense SNPs from mutant protein 3D model quality assessment were analyzed in Hope. Hope provided molecular insights into the mutant protein structure that gives clear understanding of why the mutant protein is unlikely to function like a wild type one and what are the features the mutant one lacks and differs in. The output of Project-Hope is given in Table 7 . In case of Gly to Arg mutation at 20th position, hydrophobicity decreases, amino acid size increases and electric charge becomes neutral to positive. Gly amino acid residue is buried in core with hydrophobic interaction and flexibility. G20R may cause loss of interaction with NADP, may not fit in the core due to charge and size, and so may create misfolding. In case of G20R, hydrophobicity, size and electric charge are same to G20R. Gly to Arg transition at 62nd position may result in not fitting in the core or form hydrophobic bond in core region. This is why it may create problems in folding. The G20R and G62R mutations introduce bulky, charged side chains that cause significant steric clashes (scores of 3.38 and 2.25) and disrupted H-bonding in strand and alpha-helical regions. Despite a stable global structure, G20R is buried (0.93 confidence) and G62R is solvent-exposed (0.87 confidence), leading to localized instability at these glycine hotspots. These disruptions increase solvent exposure and alter local packing, which explains the predicted functional loss observed in the mutant proteins. This high-confidence modeling suggests that even minor amino acid substitutions in these specific domains can fundamentally compromise protein efficacy. The H-bonds in wild type and mutant proteins are given in Supplementary Table S7 which were predicted from DynaMut. PDB files are provided in the Supplementary as Pymol session files. TM-Align compared wild type and mutant protein models and gave TM values and RMSD score ( Table 8 ). TM value that is much lower than 1 were selected to have much more dissimilarity to the wild type protein model. The higher the RMSD score, the more probability it has to be non-identical or dissimilar to the wild type protein. TM Align helped to filter 2 SNPs which are finally found to be most deleterious. These 2 missense SNPs are G20R, and G62R with their TM value being 0.99678 and 0.99679 while both of their RMSD score to be 0.7. The TM-score reflects overall fold similarity and is not designed to resolve single-residue structural changes [ 64 ]; therefore, local effects of G20R and G62R were evaluated using residue-level tools such as Missense3D, DynaMut, DUET, and RMSD-based local structural comparisons. These TM-scores close to 1.0 and RMSD values near 0.07 Å indicate that the global backbone conformation of G20R and G62R remains essentially unchanged relative to the wild-type structure. Therefore, TM-align and RMSD do not suggest major global structural disruption. Any functional impact must instead arise from local changes near the mutation site rather than global structural refolding. Table 8 Output of TM-Align which compares the 3D models of wild type and mutant proteins from SWISS-MODEL web-based tool. Table 8 Mutant Protein Model Wild Type Protein TM value RMSD score R47S 1mrq 0.99683 0.06 N57D 1mrq 0.99683 0.06 G62R 1mrq 0.99679 0.07 F80S 1mrq 0.99683 0.06 D112G 1mrq 0.99683 0.06 L113P 1mrq 0.99683 0.06 G20E 1MRQ 0.99682 0.06 G20R 1MRQ 0.99678 0.07 A41T 1MRQ 0.99684 0.05 Output of TM-Align which compares the 3D models of wild type and mutant proteins from SWISS-MODEL web-based tool. VEP analyzed the effect of missense SNPs on Transcription Factor binding, their impact at moderate or modifier level, and effect on other transcripts like snoRNA etc. Table 9 shows the short-listed VEP results for SNPs which were used to model mutant proteins by SWISS MODEL tool and were found to be highly unstable protein structure after their assessment in TM align. From table it is clearly seen that both of the G20R and G62R variants have moderate level impact. Transcription factors that interact with G20R position are TEAD4, MAX, POU2F1, ELK1, ETV1, and ETV4. So, it is upstream regulatory region's TF binding site variant. Table 9 Ensembl Variant Effect Predictor (VEP) results for Highly deleterious SNPs of AKR1C1 gene. Table 9 Mutations Impact Strand Gene Symbol BIOTYPE Transcription factors Consequence G62R Moderate 1 AKR1C1 protein_coding - missense_variant N57D Moderate 1 AKR1C1 protein_coding - missense_variant A41T Moderate 1 AKR1C1 protein_coding - missense_variant G20R Moderate 1 AKR1C1 protein_coding - missense_variant --- Modifier 1 AKR1C1 protein_coding - upstream_gene_variant --- Modifier −1 U8 snoRNA - upstream_gene_variant --- Modifier - - promoter - regulatory_region_variant --- Modifier 1 - - TEAD4:MAX TF_binding_site_variant --- Modifier −1 - - POU2F1:ELK1, POU2F1:ETV1, POU2F1:ETV4 TF_binding_site_variant G20E Moderate 1 AKR1C1 protein_coding - missense_variant --- Modifier 1 AKR1C1 protein_coding - upstream_gene_variant --- Modifier −1 U8 snoRNA - upstream_gene_variant --- Modifier - - promoter - regulatory_region_variant --- Modifier 1 - - TEAD4:MAX TF_binding_site_variant R47S Moderate 1 AKR1C1 protein_coding - missense_variant F80S Moderate 1 AKR1C1 protein_coding - missense_variant D112G Moderate 1 AKR1C1 protein_coding - missense_variant Ensembl Variant Effect Predictor (VEP) results for Highly deleterious SNPs of AKR1C1 gene. To evaluate the population frequency and clinical relevance of the prioritized AKR1C1 variants, we queried gnomAD, ClinVar, and dbSNP using the MANE Select transcript (ENST00000380872.9/ NM_001353.6 ). Both prioritized missense variants G20R and G62R were identified as extremely rare in population-scale datasets. The G20R variant (p.Gly20Arg; rs1384869123) corresponds to a G > C substitution at NC_000010.11 :g.4963502G > C (c.58G > C) and shows an allele frequency of 3.42 × 10 −6 in gnomAD v4.1.0 exomes, with no homozygotes reported. Similarly, the G62R variant (p.Gly62Arg; rs1274415938) arises from a G > A substitution at NC_000010.11 :g.4966013G > A (c.184G > A) and exhibits an allele frequency of 6.84 × 10 −7 in gnomAD v4.1.0 exomes, also with no homozygous individuals observed. Both variants fall well below the 0.01 minor allele frequency threshold, confirming their exceptional rarity in the general population. No pathogenic or likely pathogenic clinical assertions were available for either variant in ClinVar, indicating a lack of prior clinical characterization. Gene-level constraint metrics from gnomAD show that AKR1C1 displays moderate tolerance to missense variation (missense Z-score = 0.51; o/e = 0.92 [0.83–1.01]) and low intolerance to loss-of-function variants (pLI = 0; pLoF o/e = 0.93 [0.69–1.25]). Although AKR1C1 is not globally constrained, the extreme rarity of G20R and G62R supports their potential functional relevance and justifies further structural and mechanistic investigation. Comparative Toxicogenomics Database (CTD) provided many chemical compounds that interact with AKR1C1 protein. According to the Rat Genome Database (RGD, Gene ID: 1306847) ( https://rgd.mcw.edu/rgdweb/report/gene/main.html?id=1306847 ), their interaction directly or indirectly results in decrease or increase in AKR1C1 gene expression, negative effect in splicing of transcript, or negative effect on binding, catalytic or regulatory function of AK1C1 protein ( Supplementary Table S11 ). Several compounds, including 7-hydroxyflavone, flufenamic acid, salicylic acid, and acetylsalicylic acid, directly inhibit AKR1C1 enzymatic function, whereas others such as pirinixic acid and ciglitazone indirectly increase AKR1C1 expression through PPARA- and PPARG-mediated pathways. Environmental and toxic agents, including arsenite, cadmium chloride, lead, and aflatoxin B1, further demonstrate that AKR1C1 regulation occurs through diverse mechanisms involving transcriptional, splicing, and epigenetic effects [ 65 , 66 ]. Physical interactions of proteins with AKR1C1 are enlisted in Supplementary Table S13which were retrieved from BioGRID, Genemania, CTD and dbPTM databases. HCVGP1 is an HCV protein, NSP12, ORF14, and ORF3B are SARS-CoV proteins and the rest are human proteins ( Supplementary Table S13 ). From STRING protein-protein interaction network shows important proteins to be interacting with AKR1C1 and they are AKR1C2, AKR1C3, AKR1D1 and CYP1A1 ( Fig. 8 C). BioGRID shows other interactions like protein-chemicals interaction and protein-protein interactions within same organism and with different organisms ( Fig. 8 B). From BioGRID, noteworthy proteins that interact with AKR1C1 protein are AKR1C2, AKR1C3, AKR1D1, AGR2, BRD4, PTPN3. Physical interaction network of proteins as predicted by Genemania shows that AKR1D1, AKR1C2, AKR1C3, ARG2, MAPK3 with AKR1C1 protein ( Fig. 8 A). The CTD database annotation indicates that AKR1C1 participates in multiple biologically relevant pathways ( Supplementary Table S11 ), most notably those involved in xenobiotic metabolism and steroid hormone biosynthesis, along with broader roles in lipid, bile acid, and vitamin metabolic processes. Fig. 8 (A) GeneMANIA-predicted physical interaction network of the AKR1C1 gene product with functionally related genes. (B) BioGRID-predicted interaction network showing protein–protein interactions (blue: same species; yellow: different species) and protein–chemical interactions (green) associated with AKR1C1 . (C) STRING-derived protein–protein interaction network illustrating functional partners of the AKR1C1 protein. Fig. 8 (A) GeneMANIA-predicted physical interaction network of the AKR1C1 gene product with functionally related genes. (B) BioGRID-predicted interaction network showing protein–protein interactions (blue: same species; yellow: different species) and protein–chemical interactions (green) associated with AKR1C1 . (C) STRING-derived protein–protein interaction network illustrating functional partners of the AKR1C1 protein. 3′ UTR SNPs that reside in miRNA target or seed region show altered effect in mRNA-miRNA interactions. PolymiRTS shows several SNPs which alters miRNA Site and thus alter function like creation of new miRNA site (denoted as C in Table 10 ), and/or disruption of conserved or non-conserved miRNA site (non-conserved is denoted as N in Table 10 ). rs8483, rs35289491, rs35763015, rs192167415, and rs184205383 are the SNPs in miRNA site. The assessment of miRNA binding energy (ΔG) and cross-species conservation for PolymiRTS variants was limited by the lack of consistent and reliable outputs from currently available computational tools. Table 10 3′ UTR region SNPs that reside in miRNA target site and show impact in mRNA-miRNA binding as predicted by PolymiRTS (C = derived allele creates new miR site, N = derived allele disrupts non-conserved miR site). Table 10 dbSNP ID Ancestral Allele Allele miR ID miR Site Function Class rs8483 A G hsa-miR-3128 aggtcT G CCAGAa C hsa-miR-4720-5p aggtcT G CCAGAa C hsa-miR-4799-3p aggtcT G CCAGAa C hsa-miR-5588-5p aggtcT G CCAGAa C hsa-miR-6868-5p aggtCT G CCAGAa C A hsa-miR-3529-5p agGTCT A CCAgaa N hsa-miR-3684 AGGTCT A ccagaa N hsa-miR-379-5p agGTCT A CCAgaa N hsa-miR-4451 aggTCT A CCAgaa N hsa-miR-4490 aggtcT A CCAGAa N hsa-miR-4650-3p aggTCT A CCAgaa N hsa-miR-4774-5p aggtcT A CCAGAa N rs35289491 G A hsa-miR-1909-3p CCCTGC A tgtgga C hsa-miR-6132 CCCTGC A tgtgga C hsa-miR-6722-3p CCCTGC A tgtgga C hsa-miR-6836-5p CCCTGC A tgtgga C rs35763015 A A hsa-miR-592 ggTGAC A CAgagg N hsa-miR-597-5p gGTGAC A CAgagg N hsa-miR-6879-3p GGTGAC A cagagg N C hsa-miR-326 ggtgaC C CAGAGg C hsa-miR-330-5p ggtgaC C CAGAGg C hsa-miR-6776-5p ggtgAC C CAGAgg C hsa-miR-6861-5p ggtgAC C CAGAgg C rs192167415 G G hsa-miR-6717-5p caCATC G CCtctg N A hsa-miR-4257 cacatc A CCTCTG C hsa-miR-7854-3p cacaTC A CCTCtg C rs184205383 A A hsa-miR-628-5p gattTC A GCAAgc N 3′ UTR region SNPs that reside in miRNA target site and show impact in mRNA-miRNA binding as predicted by PolymiRTS (C = derived allele creates new miR site, N = derived allele disrupts non-conserved miR site). RegulomeDB predicted regulatory role of 3′ and 5′ UTR region SNPs are given in Table 11 , Table 12 , respectively. Probability that is close to 1 is more likely to have regulatory role. It gave regulatory features like Transcription Factor (TF) binding, DNase Peak (DNase activity is at peak level), matched TF motif, matched DNase Footprint, presence of any other motif and eQTL or Expression Quantitative Trait Loci. Table 11 Regulatory role of 3′ UTR SNPs predicted by RegulomeDB. (Rank 5 = TF binding or DNase peak, 1a = eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak, 2a = TF binding + matched TF motif + matched DNase Footprint + DNase peak, and 2b = TF binding + any motif + DNase Footprint + DNase peak.) Table 11 rsIDs Probability Ranking rsIDs Probability Ranking rs1163624611 1 2a rs954025312 1 5 rs1257081278 1 2a rs879976718 1 5 rs1379911930 1 2a rs1264258878 0.99722 5 rs1458978843 1 2a rs1318657994 0.99722 5 rs1554771158 1 2a rs1280202307 0.99667 5 rs1554771192 1 2a rs1240998134 0.995 5 rs1173838688 1 2b rs1353483394 0.995 5 rs543031141 1 2b rs1446515024 0.99471 5 rs1223911697 1 3b rs1273445162 0.99333 5 rs1171126646 1 5 rs559378823 0.99333 5 rs1181968843 1 5 rs73602918 0.99267 1a rs1139140 1 5 rs1449643901 0.992 5 rs1178945725 1 5 rs1554770944 0.992 5 rs1183799622 1 5 rs538743884 0.99 2a rs1209483536 1 5 rs782312072 0.99 2b rs1256639249 1 5 rs1236258670 0.985 5 rs1239905262 1 5 rs1319806453 0.98167 5 rs1290572359 1 5 rs1205056447 0.98162 5 rs1249435865 1 5 rs1280086685 0.98162 5 rs1268803565 1 5 rs189110933 0.98162 5 rs1340604133 1 5 rs1357464009 0.98 2a rs1300225773 1 5 rs569717968 0.98 2a rs1306626529 1 5 rs1161197353 0.97858 5 rs1342959002 1 5 rs1264516438 0.97858 5 rs1418119813 1 5 rs1364271286 0.97858 5 rs1416762840 1 5 rs1554770723 0.9775 5 rs1386042955 1 5 rs1139139 0.975 5 rs1371123251 1 5 rs1234307207 0.975 5 rs1386618720 1 5 rs781871351 0.975 5 rs1472786262 1 5 rs1296445547 0.97 2a rs1433857742 1 5 rs1413911196 0.97 5 rs1564319734 1 5 rs1360664284 0.96817 2a rs1554770745 1 5 rs1554771185 0.96817 2a rs1554771008 1 5 rs1157645126 0.96624 2a rs1554771058 1 5 rs1381440489 0.96108 5 rs534192764 1 5 rs1460629149 0.96108 5 rs782031381 1 5 rs1554771156 0.96 2a rs571223202 1 5 rs375513473 0.95846 2a rs558632901 1 5 rs1313075748 0.95831 5 Table 12 Regulatory role of 5′ UTR SNPs as predicted by RegulomeDB. Table 12 rs IDs Probability Regulatory Feature rs1169939032 0.85633 TF binding or DNase peak rs1373539110 0.85633 TF binding or DNase peak rs777665259 0.60906 TF binding + DNase peak rs575034166 0.60906 TF binding + DNase peak rs757239913 0.60906 TF binding + DNase peak rs780527109 0.60906 TF binding + DNase peak rs1564313906 0.60906 TF binding + DNase peak rs200933984 0.60906 TF binding + DNase peak rs769377816 0.60906 TF binding + DNase peak rs1338637474 0.60906 TF binding + DNase peak rs779665083 0.60906 TF binding + DNase peak rs1284314650 0.60906 TF binding + DNase peak rs765822846 0.58955 TF binding or DNase peak rs753355476 0.37407 TF binding or DNase peak Regulatory role of 3′ UTR SNPs predicted by RegulomeDB. (Rank 5 = TF binding or DNase peak, 1a = eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak, 2a = TF binding + matched TF motif + matched DNase Footprint + DNase peak, and 2b = TF binding + any motif + DNase Footprint + DNase peak.) Regulatory role of 5′ UTR SNPs as predicted by RegulomeDB. Regulatory features of intronic SNPs as predicted by RegulomeDB is given in Table 13 . Score 1 predicted SNPs are listed here which were ranked as 2a, 2b, 3a, 3b and 5 as well. These indicates that these SNP residues have significant role in regulation like, TF binding, have highly active sites which are sensitive to DNase activity. Detailed regulatory evidence including ChIP-seq TF binding signals, chromatin accessibility, PWM matches is provided in Supplementary Table S8 . Several non-coding SNPs exhibited strong regulatory potential, suggesting possible influence on AKR1C1 transcription. Table 13 RegulomeDB predicted regulatory role of intronic SNPs of AKR1C1 gene. (Rank 2a = TF binding + matched TF motif + matched DNase Footprint + DNase peak, and 2b = TF binding + any motif + DNase Footprint + DNase peak, 3a = TF binding + any motif + DNase peak, 3b = TF binding + matched TF motif, and Rank 5 = TF binding or DNase peak). Table 13 rs IDs Probability Ranking rs IDs Probability Ranking rs1329563998 1 2a rs1453634573 1 5 rs1480434544 1 2a rs1476009160 1 5 rs936628284 1 2a rs1483636512 1 5 rs975443240 1 2a rs1554769755 1 5 rs1186495541 1 2a rs1488641967 1 5 rs1292521379 1 2a rs1554770275 1 5 rs1201154907 1 2b rs1554770304 1 5 rs1312036264 1 2b rs1554770325 1 5 rs1293158858 1 3a rs1554770330 1 5 rs1138575 1 3b rs1554770336 1 5 rs1385134508 1 3b rs1554770398 1 5 rs1422074136 1 3b rs1554770470 1 5 rs1204599670 1 3b rs1554770600 1 5 rs1219576607 1 3b rs1564316172 1 5 rs1238050774 1 3b rs187053210 1 5 rs1157209765 1 5 rs201649634 1 5 rs1162013077 1 5 rs191958914 1 5 rs1171126646 1 5 rs185440371 1 5 rs1166052596 1 5 rs187559938 1 5 rs1168253366 1 5 rs542848301 1 5 rs1172391120 1 5 rs544482192 1 5 rs1178633044 1 5 rs534192764 1 5 rs1335750366 1 5 rs537399568 1 5 rs1331397182 1 5 rs547353186 1 5 rs1354359276 1 5 rs540167233 1 5 rs1370371083 1 5 rs559050038 1 5 rs1383939313 1 5 rs573385619 1 5 rs1382267298 1 5 rs567267393 1 5 rs1398092575 1 5 rs747482713 1 5 rs1407138211 1 5 rs750946892 1 5 rs1404080487 1 5 rs774801120 1 5 rs1443537504 1 5 rs866444443 1 5 rs1423219083 1 5 rs782415900 1 5 rs1438375711 1 5 rs997155847 1 5 rs1435898865 1 5 rs906344870 1 5 rs1433078428 1 5 rs954025312 1 5 rs1438569969 1 5 rs1181739785 1 5 rs1464433945 1 5 rs1195342998 1 5 rs146807360 1 5 rs1201839684 1 5 RegulomeDB predicted regulatory role of intronic SNPs of AKR1C1 gene. (Rank 2a = TF binding + matched TF motif + matched DNase Footprint + DNase peak, and 2b = TF binding + any motif + DNase Footprint + DNase peak, 3a = TF binding + any motif + DNase peak, 3b = TF binding + matched TF motif, and Rank 5 = TF binding or DNase peak). Seven SNPs were retrieved from GWAS catalog which has regulatory features like TF binding, having DNase peak, eQTL (Expression Quantitative Trait Loci) as predicted by RegulomeDB. Table 14 lists the 7 SNPs with their probability of being associated with regulation, regulatory feature, and their genomic annotation as predicted from RegulomeDB. Among the 7 SNPs 6 of them were also retrieved from Ensembl; however, the intergenic SNP (rs6601888) was not retrieved from Ensembl. Five of them are intronic, rs77045180 is 3′ UTR region SNP and rs6601888 is intergenic (∼200bp downstream of AKR1C1 gene). Here it is noteworthy that rs77045180 was documented in GWAS tagging it to laryngeal squamous cell carcinoma, and hypopharynx cancer with a p value 1 × 10 −7 and rs145648894 being tagged to blood protein level with a p value of 4 × 10 −23 . Table 14 Regulatory role of Disease associated SNPs from GWAS Catalog, as predicted by RegulomeDB. Table 14 rs IDs Genomic Position Probability Regulatory Feature rs2904804 intronic 0.72333 eQTL + TF binding/DNase peak rs7073738 intronic 0.58955 TF binding or DNase peak rs7916500 intronic 0.58955 TF binding or DNase peak rs41305661 intronic 0.55436 eQTL + TF binding/DNase peak rs77045180 3′ UTR 0.55436 eQTL + TF binding/DNase peak rs145648894 intronic 0.55324 eQTL + TF binding/DNase peak rs6601888 intergenic 0.55324 eQTL + TF binding/DNase peak Regulatory role of Disease associated SNPs from GWAS Catalog, as predicted by RegulomeDB. To investigate the regulatory potential of the SNPs retrieved from the GWAS Catalog, we performed eQTL analysis across multiple human tissues. Five SNPs—rs41305661, rs145648894, rs2904804, rs77045180, and rs6601888—demonstrated highly significant associations with AKR1C1 expression levels ( Fig. 9 A–B). Specifically, rs41305661 and rs145648894 showed exceptionally strong eQTL signals in Whole Blood (P = 3.9 × 10 −46 and 1.8 × 10 −45 , respectively), with positive NES values indicating the strongest regulatory effects. The alternative alleles for these variants were strongly associated with increased expression in blood (NES = 0.81) but decreased expression in the brain (NES = −1.10). Furthermore, rs2904804 showed a consistent positive regulatory effect on AKR1C1 in eight tissues, including the Adrenal Gland and Pancreas. This analysis confirms that these are not merely tag SNPs but likely serve as functional regulatory elements. Fig. 9 eQTL and sQTL landscape of GWAS-identified AKR1C1 variants across human tissues. (A) Heatmap of normalized effect sizes (NES) illustrating tissue-specific eQTL effects of five significant SNPs (rs41305661, rs145648894, rs2904804, rs77045180, and rs6601888) on AKR1C1 expression. Red and blue indicate positive and negative regulatory effects, respectively, highlighting pronounced blood- and brain-specific expression differences. (B) Bubble plot summarizing the tissue distribution and statistical significance of eQTL associations for each SNP. Bubble size corresponds to statistical significance (−log 10  P value), while color reflects the direction and magnitude of the NES. (C) Heatmap of significant splicing quantitative trait loci (sQTLs) depicting tissue- and event-specific associations of rs41305661, rs145648894, and rs2904804 with AKR1C1 alternative splicing. Strong splicing signals are observed in Colon, Testis, and Brain–Hippocampus tissues, indicating context-dependent regulation of isoform usage. Fig. 9 eQTL and sQTL landscape of GWAS-identified AKR1C1 variants across human tissues. (A) Heatmap of normalized effect sizes (NES) illustrating tissue-specific eQTL effects of five significant SNPs (rs41305661, rs145648894, rs2904804, rs77045180, and rs6601888) on AKR1C1 expression. Red and blue indicate positive and negative regulatory effects, respectively, highlighting pronounced blood- and brain-specific expression differences. (B) Bubble plot summarizing the tissue distribution and statistical significance of eQTL associations for each SNP. Bubble size corresponds to statistical significance (−log 10  P value), while color reflects the direction and magnitude of the NES. (C) Heatmap of significant splicing quantitative trait loci (sQTLs) depicting tissue- and event-specific associations of rs41305661, rs145648894, and rs2904804 with AKR1C1 alternative splicing. Strong splicing signals are observed in Colon, Testis, and Brain–Hippocampus tissues, indicating context-dependent regulation of isoform usage. The sQTL analysis identified three SNPs—rs41305661, rs145648894, and rs2904804—as significant regulators of AKR1C1 alternative splicing across multiple tissues. Notably, rs41305661 and rs145648894 exhibited strong splicing associations in Colon and Testis tissues ( P  < 10 −14 ). In addition, rs2904804 showed a highly significant effect on AKR1C1 splicing in the Brain–Hippocampus ( P  = 1.4 × 10 −11 ), indicating pronounced tissue-specific modulation of isoform expression ( Fig. 9 C). Analysis of AKR1C1 genomic alterations across multiple cancer types using cBioPortal revealed substantial heterogeneity in both the type and frequency of alterations ( Fig. 10 A). These alterations include point mutations, gene amplifications, and deep deletions, with the highest overall alteration frequencies observed in bladder cancer and ovarian epithelial tumors (approximately 6%). In contrast, mutation-driven alterations were more prominent in endometrial cancer, melanoma, and colorectal cancer, each exhibiting alteration frequencies of approximately 2%. Furthermore, survival analysis comparing patients with and without AKR1C1 alterations demonstrated a significantly poorer overall survival in the altered group ( Fig. 10 B), as indicated by a highly significant p value ( p = 3.309 × 10 − 4 ). Collectively, these findings suggest that diverse genomic alterations in AKR1C1 are not only cancer-type specific but may also have important prognostic implications across malignancies. Fig. 10 (A) Occurrence of different types of alteration in AKR1C1 gene in different types of cancer patients obtained from cBioPortal Database. (B) Overall Survival of Cancer patients with or without the alteration in AKR1C1 gene, obtained from cBioPortal (p value = 0.0003). (C) Lollipop plot of different types of mutations in AKR1C1 gene along with post-translational modification in cancer patients. Fig. 10 (A) Occurrence of different types of alteration in AKR1C1 gene in different types of cancer patients obtained from cBioPortal Database. (B) Overall Survival of Cancer patients with or without the alteration in AKR1C1 gene, obtained from cBioPortal (p value = 0.0003). (C) Lollipop plot of different types of mutations in AKR1C1 gene along with post-translational modification in cancer patients. Functional assessment of missense mutations retrieved from cBioPortal was performed using Mutation Assessor, SIFT, and PolyPhen-2 to predict deleterious and neutral effects. From cBioPortal 57 missense cancer mutations were retrieved where 14 of them were reported in ensembl as genetic SNPs and analyzed in the missense part mentioned above; however, as the duplicate 14 SNVs were found in cancer patients (from cBioPortal), they are presented in the Supplementary file “Supplementary_cBioPortal_Functional analysis”. A total of 45 from 57 cancer-associated missense mutations were predicted to be deleterious by at least one of these tools, with 12 mutations (cancer mutation which were not reported as SNPs in ensembl) consistently classified as deleterious by all three predictors. To further contextualize their role in cancer, Cancer Genome Interpreter analysis was conducted and revealed that all analyzed missense mutations were categorized as passenger mutations rather than oncogenic driver mutations ( Table 15 ), indicating potential functional impact without established driver status. Table 15 Functional Analysis and “Cancer Genome Interpreter” result of cancer mutations retrieved from cBioPortal which were predicted to be deleterious by these 3 SNP analysis tools. Table 15 Mutations Mutation Assessor SIFT Polyphen-2 Oncogenic Classification Impact Score Prediction Score Prediction Score Passenger Y55C high 4.72 deleterious 0 probably_damaging 0.999 Passenger Q190R high 3.915 deleterious 0.01 probably_damaging 0.974 Passenger A37D high 3.85 deleterious 0.01 possibly_damaging 0.49 Passenger S102L medium 3.385 deleterious 0 probably_damaging 0.995 Passenger F80L medium 3.295 deleterious 0 probably_damaging 0.985 Passenger D78H medium 2.97 deleterious 0 possibly_damaging 0.57 Passenger D156Y medium 2.9 deleterious 0 possibly_damaging 0.627 Passenger Y110H medium 2.865 deleterious 0.01 probably_damaging 0.945 Passenger L106I medium 2.645 deleterious 0 possibly_damaging 0.732 Passenger C188R medium 2.455 deleterious 0 probably_damaging 0.999 Passenger V120E medium 2.31 deleterious 0.02 possibly_damaging 0.731 Passenger K68 N medium 2.28 deleterious 0.01 probably_damaging 0.956 Passenger Functional Analysis and “Cancer Genome Interpreter” result of cancer mutations retrieved from cBioPortal which were predicted to be deleterious by these 3 SNP analysis tools. The distribution of missense mutations across the AKR1C1 protein, including the aldo-keto reductase domain and the nine gene exons, is illustrated in Fig. 10 C, where mutations are indicated by green lollipop markers. The analysis shows that the lowest number of mutations occurs in exons 8, 9, 4, and 1, with no reported cancer-associated mutations in exons 8 and 9. In addition, post-translational modification (PTM) mapping in cancer patients revealed that several mutations overlap known PTM sites. Specifically, phosphorylation-related mutations were identified at Y55C and Y110H, while an acetylation-associated alteration was observed at the K247Nfs∗8 fusion somatic mutation, highlighting potential regulatory disruption at these modified residues ( Fig. 10 C).

Conclusion

Aldo-keto reductase 1C1 (AKR1C1) plays an important role in the metabolism of xenobiotics and endogenous steroids. In this study, computational analyses were employed to assess the impact of selected single nucleotide polymorphisms (SNPs) on the structural integrity and regulatory characteristics of the AKR1C1 protein. The results indicate that the variants G20R and G62R, along with the cancer-associated mutations Y55C and Y110H, are predicted to adversely affect protein structure or regulation. These predicted alterations suggest a potential influence on AKR1C1 functional properties at the molecular level. The identified variants therefore represent relevant targets for future structural and functional investigations aimed at further characterizing their effects on AKR1C1 behavior.

Discussion

Aldo-keto reductase family type 1C (AKR1C) genes share greater than 86% sequence identity [ 67 , 68 ]. Many research have found that non synonymous SNPs in human AKR genes are associated with various kinds of disease [ 69 , 70 ]. Among this family of enzymes, AKR1C1 is highly expressed in liver and fat tissues [ 71 ]. Non-syndromic lipedema patients carry an AKR1C1 L213Q mutation that reduces the enzyme's function without changing its expression [ 7 ]. Recent clinical reports have further emphasized the pathogenic and diagnostic relevance of AKR1C family variants in human disease in similar studies [ 72 , 73 ]. VarElect NGS Phenotyper ranked AKR1C family genes based on phenotype–disease relevance, placing AKR1C3 first and AKR1C1 second for cancer-associated conditions ( Table 1 ). Since AKR1C3 has already been extensively analyzed in previous in silico studies, and AKR1C2/AKR1C4 received low relevance scores (<10), AKR1C1 was selected as the primary target for SNP and cancer-mutation assessment [ 1 ]. GWAS Catalog literature-extracted data further supported this choice, identifying strong associations between AKR1C1 and laryngeal squamous cell carcinoma and hypopharyngeal cancer (p < 0.00001) ( Table 2 ) [ [74] , [75] , [76] ]. CTD annotations additionally linked AKR1C1 to a wide range of experimentally supported conditions, including lung and endometrial carcinoma, hormone-dependent tumors, liver cirrhosis, osteoarthritis, and sex-development disorders. This study applied a structured, multi-layered computational framework integrating functional prediction, evolutionary conservation, protein-stability modeling, structural analysis, regulatory annotation, and population-level filtering. Unlike previous studies evaluating individual variants, our integrated pipeline enabled systematic prioritization of rare and deleterious SNPs with potential structural or regulatory impact [ 77 , 78 ]. Twelve independent prediction tools were used to filter neutral missense SNPs, collectively identifying 62 deleterious variants among 301 missense SNPs. These tools incorporate sequence conservation, physicochemical features, Gene Ontology functions, structural context, and solvent accessibility. Conservation analysis via ConSurf narrowed these to 42 highly conserved residues, mostly with scores of 8–9, indicating strong functional importance [ 79 ]. Two residues F80 and G220 were retained despite having scores of 7 due to their proximity to the catalytic tetrad and active-site region, consistent with NCBI domain annotations. Among these, G20R and G62R emerged as the most deleterious variants across all functional tools and stability predictors, supporting their selection for downstream structural evaluation. Phosphositeplus annotates Y55H to be a phosphorylation site and K270 N as ubiquitylation and acetylation sites from the literatures ( Fig. 2 (C–D) & Supplementary Table S10 ). Residues Y55 and K270 in AKR1C1 are located in drug-binding and post-translational modification sites (phosphorylation and acetylation/ubiquitylation, respectively) as annotated in PhosphoSitePlus and dbPTM. Although specific amino-acid level associations with cancer have not been reported, dysregulation or mutation of the AKR1C1 gene has been linked to various cancers, including breast cancer and colorectal adenocarcinoma [ 62 ]. Therefore, mutations at these critical residues may potentially impact protein function and contribute to cancer-related pathways, warranting further experimental validation. NetPhos also predicted some phosphorylation site of AKR1C1 protein ( Fig. 2 C–D). Protein Instability analysis tools-DynaMut, DUET, ENCoM, INPS, mCSM, I-Mutant, and SDM were utilized to evaluate the effects of mutations on protein conformation, dynamics and stability caused by changes in thermodynamic free energy or protein vibrational entropy [ 80 ]. Protein stability change can lead to protein malfunction, resulting in disease. 28 missense SNPs that decrease protein stability as predicted by 5 of the 7 tools were selected for further analysis ( Table 5 ). Residues that were predicted to be clustered by Mutation 3D were used for modeling ( Fig. 5 ) and then their quality assessment by protein structure homology modeling system, SWISS-MODEL ( Table 6 ) [ 81 ]. Among the 18 mutant models 9 (R47S, N57D, G62R, F80S, D112G, L113P, G20E, G20R, and A41T) were found to have MolProbity score and Clash score of >1.20 and 2.00 respectively ( Table 6 ), which were far away from the ideal case or wild type model values [ 82 ]. So, these mutant models don't have ideal conformational structure that is required for the wild type AKR1C1 protein function. TM-Align helped to measure RMSD score and TM value which are indicators of protein models being identical or non-identical ( Table 8 ). After comparing in TM Align, we found 2 SNPs that had higher RMSD score of 0.07 than other SNPs. The Higher is the RMSD score, the more deviation will be between the wild type and mutant protein structure. Two SNPs, G20R and G62R, exhibited markedly lower TM-scores compared with other mutant models, indicating substantial structural divergence from the wild-type protein. Both substitutions involve replacement of a conserved glycine with arginine, a larger, positively charged residue with reduced hydrophobicity, which is likely to introduce steric strain and local destabilization within the tightly packed protein core. Similar destabilizing effects of Gly→Arg substitutions have been experimentally documented in multiple proteins [ [83] , [84] , [85] ]. The high Percent Agreement values (0.916–1.0) and consistent statistical significance further support the conclusion that mutations at conserved glycine hotspots, such as G20 and G62, are particularly disruptive to protein function. Molecular dynamics simulations provided additional insight into the structural consequences of these variants. Despite predicted destabilization from static energy-based analyses, 200-ns simulations showed that both G20R and G62R mutants retained global structural stability comparable to the wild-type protein, as evidenced by similar RMSD, RMSF, radius of gyration, and solvent-accessible surface area profiles. These results suggest that while G20R and G62R do not alter the overall fold of AKR1C1, they likely impair function through subtle, localized conformational perturbations not captured by global stability metrics alone. Thus, the predicted deleterious effects of these variants are likely mediated through subtle local conformational or functional changes rather than gross alterations of global protein architecture. Ensembl's Variant Effect Predictor (VEP) gives information about the genomics of missense SNPs along with their impact on translational protein product ( Table 9 ). Among 28 variants, G20R and G20E were shown to be promoter region transcription factor binding site variants, where TEAD4MAX, POU2F1ELK1, POU2F1ETV1, and POU2F1ETV4 TFs interaction occurs. On the negative strand they also fall within the snoRNA U8 gene region. Impact of these 2 SNPs were predicted to be at modifier level by VEP. On the other hand, VEP predicted G62R to have impact at moderate level and it did not annotate any TF interactions. From a population genetics perspective, integration of large-scale population and clinical variant databases indicates that the prioritized AKR1C1 variants, G20R (p.Gly20Arg; rs1384869123) and G62R (p.Gly62Arg; rs1274415938), are exceptionally rare in the general population. Their very low minor allele frequencies in gnomAD v4.1.0 exomes and exclusive observation in the heterozygous state suggest that these substitutions are not common polymorphisms. The absence of pathogenic, likely pathogenic, or benign assertions for these variants in ClinVar and dbSNP reflects limited prior clinical characterization rather than evidence for or against disease association. In parallel, gene-level constraint metrics indicate that AKR1C1 exhibits moderate tolerance to missense variation and low intolerance to loss-of-function mutations, implying that not all coding changes are deleterious at the gene level. Nevertheless, when considered alongside the predicted structural and energetic perturbations observed for G20R and G62R, their extreme rarity supports prioritization of these variants for further experimental and mechanistic validation, while avoiding definitive claims regarding clinical pathogenicity. Several chemical components are enlisted in Supplementary Table S12 which give hints on the effect of environmental exposure to AKR1C1 protein's expression, mRNA splicing, binding, catalytic and regulatory activity. Some of the chemicals increase the expression of AKR1C1 gene whereas some decrease. i.e., Arsenic Trioxide inhibits the reaction of AKR1C1 protein by binding to it, Arsenite increases AKR1C1 gene expression, Lead (Pb) affects AKR1C1 mRNA splicing, and Aflatoxin B1 results in decreased methylation of AKR1C1 gene. Proteins/genes that interact physically with AKR1C1 were retrieved from dbPTM, BioGRID, STRING and CTD databases and predicted by Genemania tool ( Supplementary Table S13 ). AGR2, AKR1D1, AKR1C2, AKR1C3, BRD4, MAPK3, TFF3 etc are human proteins that interact with AKR1C1 ( Fig. 8 ). Moreover, HCV protein GP1 and SARS-CoV proteins like NSP12, ORF14, ORF3B interact with AKR1C1 protein physically ( Fig. 8 , Supplementary Table S13 ). CTD database enlisted some of the significant pathways that AKR1C1 is involved in, i.e., metabolism of bile acid & bile salt, xenobiotics by cyt p450 ( Supplementary Table S11 ). Therefore, change in AKR1C1 protein's structure and/or function may have negative impact on those pathways and functional networks. 3′ UTR and miRNA analysis was done in PolymiRTS where 5 of the 3′ UTR SNPs were found to reside in miR site and possibly disrupting and/or creating new mRNA-miRNA interaction ( Table 10 ). They are rs8483, rs35289491, rs35763015, rs192167415, and rs184205383. RegulomeDB predicted regulatory role of 1384 3′ UTR SNPs and 14 5′ UTR SNPs of AKR1C1 gene with probability. In case of 3′ UTR, predictions with probability of more than 0.95 are shown in Table 11 . Where most of the 5′ UTR SNPs showed to have Transcription Factor binding and DNase peak role, 3′ UTR SNPs were found to have many more regulatory roles. i.e. eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak ( Table 12 ). Regulatory role of 3216 intronic SNPs were also predicted with their probability by RegulomeDB. 830 of them had probability of more than 0.6. intronic SNPs with probability of 1 and of higher ranking are shown in Table 13 . GWAS Catalog analysis identified seven disease-associated SNPs, none of which were exonic. Five were intronic, one located in the 3′-UTR, and one ∼200 bp downstream of the AKR1C1 gene; therefore, missense SNP analyses were not applicable to these variants. Instead, regulatory annotation was performed using RegulomeDB. Among the seven SNPs, rs2904804 showed the strongest regulatory evidence (probability >0.72), including eQTL signals, transcription factor binding, and DNase hypersensitivity. The remaining SNPs exhibited weaker regulatory support. As is typical for GWAS findings, these variants such as rs2904804, rs77045180, and rs6601888 are likely tag SNPs representing linkage disequilibrium blocks rather than direct causal variants. Thus, RegulomeDB predictions provide preliminary functional insights but do not establish causality. Further studies incorporating fine-mapping and colocalization analyses, along with tissue-specific eQTL evaluation, are required to identify causal variants and clarify the regulatory role of AKR1C1 in disease susceptibility [ 86 , 87 ]. To distinguish statistically associated “tag” variants from those with true functional relevance, we integrated comprehensive eQTL analyses into our interpretation. This approach revealed strong regulatory evidence for five prioritized SNPs—rs41305661, rs145648894, rs2904804, rs77045180, and rs6601888—supporting their candidacy as functional contributors rather than passive markers. Notably, 186 tissue-specific eQTL associations were identified, with rs41305661 and rs145648894 exhibiting particularly robust effects across multiple tissues (P < 10 −44 ), thereby providing critical fine-mapping context. The pronounced tissue-dependent regulatory patterns observed for rs2904804 and rs77045180, especially within metabolically active and skeletal muscle tissues, further indicate that these variants actively modulate AKR1C1 expression in biologically and clinically relevant environments. Collectively, these findings extend beyond statistical association and implicate a direct regulatory mechanism through which these SNPs may contribute to AKR1C1 dysregulation and its associated disease phenotypes. In addition to their previously observed eQTL effects, rs41305661, rs145648894, and rs2904804 were identified as significant sQTLs for AKR1C1, indicating that these variants exert regulatory influence at multiple levels of gene control. The strong splicing associations of rs41305661 and rs145648894 in Colon and Testis tissues (P < 10 −14 ) suggest that these SNPs may modulate tissue-specific isoform usage in organs where AKR1C1 is known to play important metabolic and regulatory roles. Similarly, the pronounced splicing effect of rs2904804 in the Brain–Hippocampus (P = 1.4 × 10 −11 ) highlights a distinct, tissue-restricted regulatory mechanism that may influence neural-specific AKR1C1 transcript diversity. The convergence of eQTL and sQTL signals indicates that these variants regulate AKR1C1 through both expression and splicing mechanisms in a tissue-dependent manner. By altering isoform composition, they may influence protein function and downstream pathways beyond changes in total transcript levels. Together, this multilayered regulatory evidence supports these GWAS-associated SNPs as likely causal contributors to AKR1C1-related disease risk. Mutation data of AKR1C1 gene in cancer patients were retrieved from cBioportal Database. Fig. 10 A shows the type of AKR1C1 gene alterations that occur in various cancer types. Most of the alterations occurs in bladder cancer, breast cancer and ovarian epithelial tumor with their alteration frequency of ∼6% [ 88 ]. On the other hand, among other kinds of alterations most of the mutations occur in endometrial cancer, melanoma, and colorectal cancer with ∼2% alteration frequency ( Fig. 10 A). It is also seen that amplification occurs at higher percentage in bladder cancer, ovarian epithelial tumor, breast cancer, glioma and hepatobiliary cancer. Furthermore, deep deletion prevails in mature B cell neoplasms and non-seminomatous germ cell tumor. Overall survival analysis showed that the overall survival is worse in AKR1C1 altered group than unaltered group ( Fig. 10 B). This indicates that mutations in AKR1C1 gene may have important impact on cancer patients' longevity. Functional analysis of missense mutations from SNP analysis tool named Mutation Assessor, SIFT, and Polyphen-2 revealed that out of 57 missense mutations 12 were predicted to be deleterious by these 3 tools including Y55C and Y110H ( Table 15 ). After analyzing cancer mutation positions and PTMs it was found that, Y55C and Y110H mutations occur at 2 phosphorylation sites ( Fig. 10 C) and K247Nfs∗8 mutation occur at acetylation site. So, mutation at these sites hinders from proper post translational modification. The deleterious variants we identified may have significant implications for tumor aggressiveness because AKR1C1 has been experimentally shown to enhance invasion and metastasis in cancer cells [ 58 ]. Moreover, variants such as G20R and G62R are predicted to affect protein stability and catalytic region integrity, consistent with AKR1C1's role in invasion and metastasis and with implications for chemoresistance and steroid hormone metabolism, as supported by prior mechanistic and review studies [ [58] , [59] , [60] ]. Although several studies have linked AKR1C1 dysregulation to cancer development and therapeutic resistance, comprehensive functional characterization of AKR1C1 genetic variants remains largely unexplored. Most previous work has focused on expression-level changes or isolated mutations, without evaluating how evolutionarily conserved residues, PTM sites, and structural hotspots respond to specific missense alterations. Our analysis addresses this gap by identifying high-impact variants such as G20R and G62R; however, the absence of experimental validation highlights an ongoing need for biochemical, structural, and cellular studies to confirm the predicted destabilization and functional consequences. Furthermore, the regulatory and non-coding variants identified here have not been mechanistically studied in disease contexts, representing an additional requirement for future research.

Introduction

A subfamily of reductase enzymes known as aldo-keto reductase type 1C (AKR1C) includes 4 enzymes termed as AKR1C1–AKR1C4. These enzymes conduct NADPH dependent reductions and are essential for the biotransformation of numerous pharmacological substrates and endogenous substances like steroids, as well as for biosynthesis and intermediary metabolism [ 1 , 2 ]. AKR1C1 gene encoded enzyme converts progesterone into the inactive 20 α-hydroxyprogesterone with its ketosteroid reduction activity [ 3 , 4 ]. Variable drug response and disease risk are influenced by genetic variation in metabolizing enzymes. Pharmacogenetic studies have looked at AKR1C gene variants and how they relate to the development of diseases, how drugs behave, and how treatments work [ 2 ]. The gene for AKR1C1 is located in the forward strand of Chromosome 10 in human genome. With genomic length of 16 kb, it encodes aldo-keto reductase family 1 member C1 enzyme of 323aa. AKR1C1 consists of 9 exons. AKR1C1 is significantly expressed in the cytoplasm of fat and liver tissues as ∼25kD protein, AKR1C1 [ 3 , 5 , 6 ]. According to Michelini et al.’s hypothesis, an inherited missense variant Leu213Gln in AKR1C1 leads to partial loss of function of the reductase enzyme AKR1C1, which would cause progesterone to be converted to hydroxyprogesterone more slowly and ineffectively, showing association with lipedema, a condition marked by subcutaneous fat accumulation [ 7 ]. AKR1C1 down-regulation was found to be associated with advanced clinicopathological characteristics such as larger size tumors, more lymphatic nodes involvement, metastasis and later clinical stages. On the other hand, it was found in nasopharyngeal carcinoma (NPC) patients, that down-regulation of AKR1C1 was an excellent prognostic factor for overall survival (OS). While AKR1C1 knockdown could increase the cisplatin sensitivity of NPC cells, an in vitro investigation revealed that AKR1C1 was not directly implicated in the biological behaviors of malignant NPC cells such as proliferation, cell cycle progression, and migration. This enhanced sensitivity indicates a reduced chemoresistance phenotype, suggesting that lowering AKR1C1 expression may improve therapeutic response and make NPC tumors more susceptible to cisplatin-based treatment. These findings imply that AKR1C1 has the potential to be a marker for cisplatin response prediction and a molecular target for improving cisplatin sensitivity in NPC [ 8 ]. According to Cheng et al.’s research on lung cancer cell lines, the decrease in AK1C1 expression in lung tumors may help to raise the quantities of DNA adducts, which may help explain in part why female lung cancer patients are more susceptible to DNA damage [ 9 ]. Findings of Lewis et al. shows that expression of AKR1C1, AKR1C2 and AKR1C3 is lowered in tumorous breast tissue when compared to normal breast tissue, which could aid in understanding why breast tumor tissues have lower levels of the mitogen/metastasis inhibiting 3HP progesterone metabolites [ 10 ]. In non-metastatic cancer cells, overexpression of AKR1C1 gene noticeably increased metastasis both in vitro and in vivo, while AKR1C1 decrease in highly metastatic tumors effectively mitigated these effects. It was found from quantitative genomic and functional investigations that AKR1C1 enhanced STAT3 phosphorylation directly and strengthened STAT3 binding to the promoter regions of target genes, thus endorsing tumor metastasis. This is done in a catalytic independent pathway [ 11 ]. Thus, at different stages of different cancer types AKR1C1 gene plays diverse roles either as positive or negative player. Several studies have demonstrated that single nucleotide variations in AKR1C1, AKR1C2 , and AKR1C3 play critical roles in the biosynthesis and metabolism of intracrine androgens. These variations are associated with increased resistance to multiple anticancer drugs, including doxorubicin, oracin, and cisplatin, and with elevated expression of radio-resistance–associated proteins in lung cancer [ 12 ]. Although most prostate cancer patients initially respond to androgen deprivation therapy, relapse commonly occurs with a more aggressive, androgen-independent disease. Stanbrough et al. reported that upregulated expression of androgen-metabolizing genes such as AKR1C1, AKR1C2 , and AKR1C3 enables prostate cancer cells to adapt to androgen-deprived conditions [ 13 ]. Accumulating evidence further indicates that AKR1C1 genetic variants contribute to carcinogenesis, chemoresistance, and disrupted hormonal homeostasis, particularly in breast cancer, glioma, colon adenocarcinoma, lung cancer, and nasopharyngeal carcinoma. Despite this growing evidence, information on the association between AKR1C1 genetic variations and pharmacological efficacy, safety, and pharmacokinetics remains limited. Consequently, further studies are required to clarify these relationships and establish their clinical relevance. Before AKR1C genetic variants can be reliably translated into clinical practice, consistent and reproducible findings are essential [ 2 ]. Notably, this study is among the first to integrate coding and non-coding SNP analysis of AKR1C1 with post-translational modification (PTM) site mapping, protein–chemical interaction analysis, cancer genomics, and structural modeling within a unified workflow. This integrative strategy enables robust prioritization of high-impact variants. Although AKR1C1 has been extensively characterized at the biochemical level, the functional consequences of its missense and regulatory SNPs remain poorly understood, particularly in the context of cancer, endocrine disorders, and drug response. Previous investigations have largely focused on expression-level dysregulation, with no comprehensive computational study combining functional prediction, structural modeling, PTM analysis, regulatory annotation, and cancer mutation profiling of AKR1C1 variants. To address this gap, we conducted a detailed in silico structural and functional analysis of the AKR1C1 gene, leveraging large-scale data from multiple databases and a wide range of web-based tools and software [ 14 ]. Despite the established biological importance of AKR1C1 in xenobiotic metabolism, hormone regulation, and cancer, no study has systematically evaluated all reported coding and non-coding SNPs using a comprehensive, multi-layered framework. Prior work has typically examined expression dysregulation or a limited number of variants, without integrating evolutionary conservation, PTM disruption, protein stability, structural modeling, regulatory effects, and population genetics. This lack of a complete variant-impact landscape represents a critical knowledge gap, particularly given the frequent alteration of AKR1C1 in cancer and its sensitivity to single-residue changes. Therefore, a full-scale in silico prioritization of AKR1C1 variants is necessary to identify the most functionally disruptive and potentially pathogenic SNPs.

Coi Statement

The authors declare no conflict of interest. The study represents original research work carried out by the team and the contents of the paper are neither published nor submitted for publication to any other journal.

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: pmc-nxml

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-06-26T06:14:25.090378+00:00
unpaywall
last seen: 2026-05-21T05:10:58.409756+00:00
License: publisher-OA-unknown · commercial use NOT OK · attribution required