Abstract
The impact of missense genetic variations on protein function is often enigmatic, especially for mutations that map to intrinsically disordered regions (IDRs). Given the functional importance of phase separation of IDRs, it has been proposed that mutations that modulate phase separation might preferentially lead to disease. To examine this idea, we used the robust predictability of phase-separating (PS) IDRs and annotation of disease-associated proteins and mutations to map the correlation between disease and phase separation. Consistent with previous work linking phase separation to cancer and autism spectrum disorder, we find a higher prevalence of predicted phase separation behavior in disease-associated proteins than is typical for human proteins. We map the prevalence of phase separation across a wide range of diseases, finding that many, but not all, show an enrichment of phase separation in the proteins associated with them. Strikingly, the pathogenic mutation rate in predicted PS IDRs was elevated three-fold relative to IDRs not predicted to phase separate. Substitutions involving arginine and the aromatic types were among the most pathogenic for PS IDRs, while substitutions involving serine, threonine, and alanine were among the most benign. We applied these trends to mutations of uncertain clinical significance and predict that half of those found in PS IDRs are likely pathogenic. We find that phosphorylation sites were enriched in PS IDRs when compared to other protein regions, though mutations at such sites were mostly benign. Pathogenicity was highest for mutations in predicted PS IDRs that also fell within short linear motifs, which are known mediators of protein-protein interactions.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
1. Changed the order of results as presented.
2. Used statistical tests to identify which results were statistically significant and which were a null or neutral result.
3. Analytical methods have been described with equations, rather than prose.
4. Calculated the standard error associated with mutation rates and odds ratios to provide error bars.
5. Calculated per residue mutation rates and removed the per protein computed rates.
6. Calculated mutation rates in confirmed ID, confirmed disease, and confirmed phase-separating proteins to test for bias that could arise from some proteins being more researched than others (finding none).
7. Calculated mutation rates in sequence sets with different levels of enrichment for predicted phase separation to test for bias in the computational predictor (finding none).
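The revision notes above mention per residue mutation rates, odds ratios, and their standard errors, but this page does not give the equations. As a minimal sketch only, the following shows one conventional way such quantities could be computed. It assumes a binomial standard error for the rate (each residue treated as an independent trial) and Woolf's standard error for the log odds ratio; the function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def per_residue_rate(n_mutations, n_residues):
    """Mutations per residue with a binomial standard error (an assumption)."""
    rate = n_mutations / n_residues
    se = math.sqrt(rate * (1.0 - rate) / n_residues)
    return rate, se


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table and the standard error of its logarithm.

    a: pathogenic mutations in PS IDRs     b: benign mutations in PS IDRs
    c: pathogenic mutations in nonPS IDRs  d: benign mutations in nonPS IDRs
    All four counts must be positive. Uses Woolf's formula for the error,
    which is an assumption here, not the paper's stated method.
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return ratio, se_log


# Hypothetical counts, for illustration only.
rate, rate_se = per_residue_rate(50, 1000)
ratio, ratio_se_log = odds_ratio(30, 10, 10, 10)
```

An approximate 95% interval for the odds ratio would then be `exp(log(ratio) ± 1.96 * ratio_se_log)`, which is how error bars on an odds ratio are commonly drawn.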
Abbreviations
- AUC: area under the curve
- ELM: Eukaryotic Linear Motif
- ID: intrinsically disordered
- IDR: intrinsically disordered region
- nonPS: nonphase-separating
- PS: phase-separating
- SLiM: short linear motif
- UniProt: Universal Protein Knowledgebase