Neur-Ally: A deep learning model for regulatory variant prediction based on genomic and epigenomic features in brain and its validation in certain neurological disorders

Anil Prakash (1, 2), Moinak Banerjee (1)

1 Human Molecular Genetics Lab, Neurobiology and Genetics Division, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, 695014, India
2 Department of Biotechnology, University of Kerala, Kariavattom, Thiruvananthapuram, Kerala, India

For correspondence: mbanerjee{at}rgcb.res.in, moinak{at}gmail.com

bioRxiv, New Results. doi: https://doi.org/10.1101/2025.01.27.635013

ABSTRACT

Large-scale quantitative studies have identified significant genetic associations for various neurological disorders. Expression quantitative trait loci [eQTL] studies have shown the effect of single nucleotide polymorphisms [SNPs] on the differential expression of genes in brain tissues. However, a large majority of the associations are contributed by SNPs in non-coding regions, which can have significant regulatory function but are often ignored. Moreover, mutations that are in high linkage disequilibrium [LD] with actual regulatory SNPs will also show significant associations. It is therefore important to differentiate a regulatory non-coding SNP from a non-regulatory one. To resolve this, we developed a deep learning model named Neur-Ally, which was trained on epigenomic datasets from nervous tissue and cell line samples. The model predicts the differential occurrence of regulatory features such as chromatin accessibility, histone modifications and transcription factor [TF] binding in genomic regions, using DNA sequence as input. The model was used to predict the regulatory effect of neurological condition specific non-coding SNPs using in silico mutagenesis.
Neur-Ally predicted the effects of associated SNPs reported in genome-wide association studies [GWAS] of neurological conditions, brain eQTLs, Autism Spectrum Disorder [ASD] associated SNPs, and reported probable regulatory SNPs in neurological conditions.

INTRODUCTION

Understanding the genetics of complex disorders is extremely difficult because of heterogeneity within and between populations [1]. The decreasing cost of sequencing and the emergence of custom genotyping arrays have led to an increase in quantitative genetic studies such as GWAS and eQTL analysis. Linkage disequilibrium between regulatory and non-regulatory SNPs can lead to the appearance of non-causative genetic variants in association results [2]. Identifying the actual causative variant using experimental techniques is extremely difficult. A common significant variant may have only a small effect on the phenotype, and the combined effect of multiple variants may be needed for the phenotype to occur [2]. Functional screening of a large number of associated single genetic variants is therefore challenging.

Risk variants identified by quantitative genetic analysis of complex diseases are highly enriched for non-coding genetic variants [3]. The effect of such mutations can be indirect and regulatory. The effect of coding mutations such as missense and nonsense SNPs can be studied by analyzing the structural changes they introduce in the normal protein. In contrast, the regulatory effect of a non-coding SNP is difficult to decipher. The regulatory landscape can be cell line or tissue specific, which adds to the problem. A large number of epigenomic datasets from different tissues and cell lines are available from the ENCODE project [4]. The incorporation of regulatory features has improved the modelling and representation of complex diseases [5].
Hence, the use of tissue specific regulatory datasets will aid in understanding the interplay between the regulome and complex diseases [6].

Through the use of ChIP-seq, researchers have made substantial progress in locating transcription factor binding sites, investigating the regulatory roles of transcription factors in gene expression, and mapping histone modifications throughout the genome [7]. ChIP-seq analysis has increasingly been combined with other functional genomics techniques, facilitating a deeper understanding of the mechanisms that regulate gene expression [8]. A wide array of quantitative methodologies has contributed significantly to progress in assessing chromatin accessibility. Continuous improvements in integrated analysis are expected to enhance our comprehension of the intricate relationships among DNA accessibility, gene expression, genetic variants, protein interactions, transcription and subsequent phenotypes [9].

The regulation of genes, which includes transcription and alternative splicing, is fundamentally influenced by DNA- and RNA-binding proteins. Recent advances in deep learning have facilitated the prediction of the sequence specificities of these proteins, thereby enhancing our understanding of regulatory mechanisms [10]. Deep learning models can discern regulatory sequence patterns from extensive regulatory datasets, such as chromatin accessibility data. This capability allows the chromatin effects of sequence changes to be predicted with single-nucleotide precision. As a result, these models have significantly improved the prioritization of functional variants, including eQTLs and disease-associated variants [11]. Computational tools built on cell or tissue specific datasets will help in better understanding the diseases connected to those tissues or cells.
This prompted us to create a model for all neurological conditions that is trained on neuronal specific epigenomic datasets and can subsequently help in variant effect prediction for mutations specific to neurological conditions.

Several deep learning models have been developed with applications in biology [12]. Such models learn the linear and non-linear relationships within the large amount of data that is available. Attention layers have improved performance on various natural language processing [NLP] tasks, which in turn has helped in the analysis of sequence datasets [13]. With this in mind, we developed a deep learning model called Neur-Ally, whose architecture incorporates convolution and attention layers. After training, the model can be used to predict the regulatory effect of SNPs specific to neurological conditions. These SNPs can be chosen from significant GWAS associations or from candidate genetic association studies of neurological conditions [14]. Alternatively, eQTL SNPs that regulate the expression of genes can be used for the prediction [15]. Where variants occurring in non-neuronal tissues contribute to the phenotype, epigenomic datasets from those samples can also be processed using the data processing code available with the model. For neurological condition specific mutations, the pre-trained weights and the code are available for testing. In short, Neur-Ally helps to identify the regulatory potential of SNPs in the brain, based on differential epigenomic signatures in response to in silico mutagenesis.

MATERIALS AND METHODS

Model Architecture

The genomic bins that overlap epigenomic labels are used as input to the model. The 200 base pair [bp] sequence of the bin, along with the flanking regions [1800 bp], is subjected to vectorization, word embedding and positional encoding to create a multi-dimensional tensor.
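To make the input construction concrete, the following minimal Python sketch derives the input window for one bin and maps its bases to integer tokens. It rests on assumptions that the text does not confirm: that the 1800 bp of flank is split evenly (900 bp on each side) to give a 2,000 bp input, that coordinates are 0-based and half-open, and that tokens are single nucleotides. The names `window_for_bin`, `vectorize` and `VOCAB` are hypothetical and are not taken from the Neur-Ally repository.

```python
from typing import List, Tuple

BIN_SIZE = 200      # bin length stated in the text
FLANK_TOTAL = 1800  # total flank stated in the text; the even split per side is assumed

# Assumed single-nucleotide vocabulary; 0 is reserved for N or any other symbol.
VOCAB = {"A": 1, "C": 2, "G": 3, "T": 4}


def window_for_bin(bin_start: int) -> Tuple[int, int]:
    """Return the 0-based, half-open [start, end) input window around a bin."""
    half_flank = FLANK_TOTAL // 2
    start = bin_start - half_flank
    end = bin_start + BIN_SIZE + half_flank
    if start < 0:
        raise ValueError("window extends past the start of the chromosome")
    return start, end


def vectorize(sequence: str) -> List[int]:
    """Map each base to an integer token, case-insensitively."""
    return [VOCAB.get(base, 0) for base in sequence.upper()]


start, end = window_for_bin(10_000)
print(start, end, end - start)  # window coordinates and window length
print(vectorize("acgtn"))       # integer tokens for a toy sequence
```

In a full pipeline the integer tokens would feed the embedding and positional encoding steps. Rejecting windows that run past the chromosome start, rather than padding them, is a simplification made for this sketch.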
The tensor is then passed twice through a 1D convolution layer followed by max pooling, and then through Multi-Head Attention layers. The output of the final attention layer is fed into a Keras Dense layer [16], after which the dimension of the tensor is reduced using a squeeze operation. A final Dense layer with a sigmoid activation function outputs the probability of each regulatory signature in the genomic bin used as input. The flow diagram of the model architecture is shown in Figure 1.

Figure 1: Neur-Ally architecture. Flowchart diagram of the model architecture.

Data processing

Epigenomic datasets on chromatin accessibility [ATAC-Seq, DNase-Seq], histone modifications and TF binding for the relevant tissue or cell types were selected for data processing [Figure 2]. The narrow peak bed files of nervous tissue and cell samples were obtained from the ENCODE Project. A genomic bin of 200 bp was selected as a positive sample if an epigenomic signature overlapped more than half of it. Genomic bins with low mappability were excluded from the analysis. The processed dataset was split into training, validation and testing sets based on the chromosome of each genomic bin. Bins on chromosome 7 and chromosome 8 were used for validation and testing, and the remaining bins were kept for training. To test the model predictions, the area under the receiver operating characteristic curve [AUROC] and the area under the precision-recall curve [PR-AUC] were estimated.

Figure 2: Assay distribution. Bar graph of assays used for data preprocessing.

Variant effect prediction

As the model was trained on genomic sequence and regulatory labels, it can learn the contribution of sequence features to the prediction. Thus, predicting regulatory labels from altered input sequences can shed light on the regulatory effect of mutations.
Therefore, we predicted the regulatory effect of SNPs specific to neurological conditions [SNMs] by in silico mutagenesis. The sequence of the genomic bin harboring the mutation was extracted from the reference genome. A second input sequence was generated by altering the nucleotide at the mutation site. Both inputs were given to the model one after the other. The predictions of the regulatory labels for the two sequences were compared to estimate the SNP Activity Difference [SAD] score [17]:

$$\mathrm{SAD} = \lvert P_{\mathrm{ref}} - P_{\mathrm{alt}} \rvert$$

where $P_{\mathrm{ref}}$ is the probability of regulatory labels predicted on the reference sequence and $P_{\mathrm{alt}}$ is the probability of regulatory labels predicted on the mutated sequence. The SAD score is the absolute difference between $P_{\mathrm{ref}}$ and $P_{\mathrm{alt}}$.

To identify significant regulatory variants, we created a negative, non-regulatory set of SNPs from the 1000 Genomes dataset [18]. A million variants were randomly selected, and the negative set was created by filtering out GWAS and eQTL variants and those occurring in exonic or candidate cis-regulatory regions. The significance of the regulatory effect of the predicted variants was estimated using the E-value method [19]. The E-value of an epigenomic target for a particular SNP is defined as the fraction of SNPs in the negative non-regulatory set that have a higher SAD score for the same target. The positive and negative variant sets used for E-value estimation contain the same number of variants. The selection of negative samples is repeated ten times and the mean E-value is used for comparison. SNPs with an E-value of 1e-05 or less are considered significant.

Neur-Ally was trained on epigenomic datasets with coordinates based on the hg38 human reference build, so input variant coordinates must also be based on hg38. For SNP datasets from older genomic builds, the coordinates were converted to the latest build using liftover tools.
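The mutagenesis, SAD and E-value steps described above can be sketched in a few lines of Python. This is an illustrative reading of the definitions in the text, not the code shipped with Neur-Ally: the function names are hypothetical, the probabilities are toy values standing in for model outputs, and the E-value is taken literally as the plain fraction of negative SNPs with a higher SAD score for one label.

```python
import random
from statistics import mean


def mutate(sequence, offset, alt_base):
    """Return a copy of the sequence with the base at a 0-based offset replaced."""
    return sequence[:offset] + alt_base + sequence[offset + 1:]


def sad_scores(p_ref, p_alt):
    """Per-label SAD: absolute difference between reference and alternate probabilities."""
    return [abs(r - a) for r, a in zip(p_ref, p_alt)]


def e_value(sad, negative_sads):
    """Fraction of negative-set SNPs whose SAD for the same label is higher."""
    higher = sum(1 for s in negative_sads if s > sad)
    return higher / len(negative_sads)


def mean_e_value(sad, negative_pool, n_sample, repeats=10, seed=0):
    """Average the E-value over repeated random draws from the negative pool."""
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    return mean(
        e_value(sad, rng.sample(negative_pool, n_sample)) for _ in range(repeats)
    )


# Toy example: two labels, with made-up probabilities in place of model output.
ref_seq = "ACGTACGT"
alt_seq = mutate(ref_seq, 3, "A")
sads = sad_scores([0.90, 0.20], [0.40, 0.25])
negatives_for_label_0 = [0.01, 0.02, 0.05, 0.10, 0.60, 0.03]
print(alt_seq, sads, mean_e_value(sads[0], negatives_for_label_0, n_sample=4))
```

Under this plain-fraction reading, a single draw of n negative SNPs cannot produce a non-zero E-value smaller than 1/n, so the resolution of the score depends on the size of the variant sets being compared.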
Chromosomal coordinates at conversion-unstable positions were excluded from the analysis [20].

Model prediction on GWAS of neurological conditions and eQTL SNPs

The trained model was used to identify differential regulatory label predictions for neurological condition specific SNPs extracted from the GWAS Catalog and for eQTL variants in brain tissues. For the neurological condition specific GWAS SNPs, the “GWAS catalog v1.0” dataset was obtained and associated SNPs were selected by matching keywords in the disease or trait column. The following keywords were used for filtering: “alzheimer”, “epilepsy”, “multiple sclerosis”, “parkinson”, “autism”, “attention deficit”, “schizophrenia”, “bipolar”, “major depressive”. Significant variant-gene pair datasets of the neuronal tissues from the GTEx portal were used to select the neurological condition specific eQTLs. The top 1000 significant eQTL SNPs from each sample were used to create the list of eQTLs to be tested.

Model prediction on ASD GWAS and brain regulatory SNPs

The E-value threshold of 1e-05 is stringent, and because the calculated E-value depends on the number of variants in the positive set, we also restricted the prediction to Autism Spectrum Disorder associated SNPs from the GWAS Catalog. We used the keywords “Asperger disorder”, “Autism” and “Autism spectrum disorder” to select these variants from the dataset. Next, we tested the model on reported probable brain regulatory SNPs [21]. Because the number of positive variants was small, 200 negative variants were sub-sampled for comparing the SAD scores.

RESULTS

Model performance

The Binary Cross-Entropy loss values and metrics during training and validation over 39 epochs are shown in Figure 3. The metric values reported by Keras are approximations, so the metric values of each individual epigenomic label were computed using scikit-learn [22] [Supplementary Table S1].
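For readers less familiar with the two per-label metrics, the pure-Python sketch below shows what each one measures on a toy label. It is an illustration only: the study used scikit-learn, the function names here are hypothetical, and average precision is used as one common summary of the precision-recall curve, which may differ from the exact PR-AUC estimator used by the authors. The baseline PR-AUC of a label is its fraction of positive bins, which is why values should be read against that baseline.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def pr_baseline(labels):
    """Expected PR-AUC of a random classifier: the fraction of positives."""
    return sum(labels) / len(labels)


# Toy label with two positive bins out of five.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
print(pr_baseline(toy_labels))
```

The pairwise AUROC loop is quadratic in the number of bins and is meant only to expose the definition; a rank-based implementation would be used at genome scale.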
Chromatin accessibility assay labels were predicted with a mean AUROC of 0.93 and PR-AUC of 0.23 [baseline PR-AUC is 0.01]. Histone modification labels had a mean AUROC of 0.84 and PR-AUC of 0.29 [baseline PR-AUC is 0.03]. Transcription factor binding labels had a mean AUROC of 0.87 and PR-AUC of 0.22 [baseline PR-AUC is 0.01].

Figure 3: Training and validation metrics. Radar plots showing A. Training loss, B. Validation loss, C. Training mean AUROC, D. Validation mean AUROC over subsequent epochs. Epochs are depicted on the circular axes and metric or loss values on the radial axes.

GWAS and neurological condition specific eQTL variants

GWAS associated SNPs were selected after removing GWAS Catalog variants occurring in coding regions. A total of 7,663 neurological condition specific SNPs were extracted using keywords and selected as the positive variant set. Of these, 48 SNPs showed significant E-values at a threshold of 1e-05 when compared with the negative variant set [Figure 4A, Supplementary Table S2]. A less stringent threshold may reveal more possible regulatory variants.

Figure 4: E-value heatmap. Heatmap showing significant variants. A. Neurological condition specific GWAS SNPs, B. Neurological eQTL SNPs. Variants are in rows and labels in columns.

The significant gene-variant pair files [v8] of the following nervous system samples from the GTEx portal were used to extract the neurological condition specific eQTL positive variant set: “Cerebellum”, “Nucleus accumbens basal ganglia”, “Cortex”, “Caudate basal ganglia”, “Cerebellar Hemisphere”, “Anterior cingulate cortex BA24”, “Amygdala”, “Spinal cord cervical C-1”, “Hypothalamus”, “Substantia nigra”, “Frontal Cortex BA9”, “Hippocampus”, “Putamen basal ganglia”.
The top 1000 hits from each sample were used for the predictions and, after stringent filtering, 169 highly significant regulatory SNPs were predicted by the model [Figure 4B, Supplementary Table S3].

ASD GWAS variants

The variant effect predictions were restricted to 92 ASD SNPs from the GWAS Catalog, of which 4 were predicted by the model to be highly significant regulatory variants [Figure 5]. The significant labels and their E-values are shown in Supplementary Table S4.

Figure 5: Manhattan plot of ASD regulatory variants predicted by Neur-Ally. ASD associated SNPs from the GWAS Catalog having significant E-values are shown above the horizontal threshold line.

Regulatory brain variants

The model was tested for variant effect prediction on reported probable regulatory variants in the brain. Eight such SNPs were selected from publications [23–29] and their differential epigenomic changes upon in silico mutagenesis were predicted by the model [Table 1]. Predictions of differential epigenomic labels were significant for rs7364180 and rs12411216. rs7364180 is associated with many eQTL genes in brain tissues [28]. rs12411216 is reported to be a probable regulatory risk variant for mild cognitive impairment in Parkinson’s disease [29].

Table 1: Neur-Ally predictions for reported probable brain regulatory variants. a) rsIDs of the regulatory SNPs, b) Functional effect reported in the publications, with the reference number in superscript, c) Minimum E-value predicted by the model, d) Epigenomic label having the lowest E-value.

For the SNP rs7364180, the model predicted significant chromatin accessibility changes in samples such as motor neuron, head of caudate nucleus and brain microvascular endothelial cell, and in cell lines such as SK-N-SH, H54, Daoy and A172.
In addition, significant changes were found in TF ChIP-seq labels from dorsolateral prefrontal cortex, PFSK-1 and SK-N-SH. Neur-Ally also predicted significant changes in chromatin accessibility caused by rs12411216 in dorsolateral prefrontal cortex and posterior cingulate gyrus.

DISCUSSION

Quantitative genetic studies such as GWAS and eQTL analysis in neurological disorders have revealed several risk variants in the non-coding regions of the genome. The functional consequences of coding variants can be interpreted from the effect of the mutation on protein structure. For non-coding variants, however, the regulatory consequence can be specific to the cell or tissue type. Because of this heterogeneity, it is difficult to determine experimentally the effect of all the significant variants from a study. Publicly available epigenomic datasets can instead be used to build computational prediction models for this purpose. Such functional predictions help to differentiate actual causative variants from those that are merely in high linkage with them. In this study, we created a deep learning model named Neur-Ally and trained it on epigenomic datasets derived from nervous tissues and cells.

Most existing variant effect prediction models were trained on regulatory datasets from multiple tissues or cell line samples. It is difficult for such models to learn regulatory signatures that are cell or tissue specific. For diseases in which changes in a particular organ or tissue contribute most to the pathophysiology, models trained on that tissue or its cells are more helpful. For example, a machine learning model trained on human retinal epigenomic datasets was developed to predict the effect of non-coding variants in human retinal cis-regulatory elements [30].
Another model, specific to pancreatic islets and trained on multiple epigenome profiling datasets, was used to prioritize type 2 diabetes association signals [31]. A machine learning model for variant effect prediction in Alzheimer’s disease was also developed and achieved better accuracy than other models [32]. That model was trained on 39 features, of which 9 were regulatory, and data from seven brain tissues were used to define regulatory regions. In contrast, Neur-Ally was developed and trained on multiple brain samples including tissues, cells, in vitro differentiated cells, primary cells and organoids. A total of 758 regulatory datasets were used to create a training dataset of 9 million genomic bins [200 bp] overlapping epigenetic features. Neur-Ally is therefore notable as a variant prediction model that uses a large number of neuronal specific regulatory datasets and can be used for all neuronal disorders.

The model predicts regulatory labels from the nucleotide sequences of genomic bins. On testing, the model achieved mean AUROC values of 0.87, 0.84 and 0.93 for TF binding, histone modifications and chromatin accessibility, respectively. In silico mutagenesis was carried out after training the model, and significant regulatory effects of neurological condition specific mutations were identified. Regulatory consequences were identified in neurological condition specific GWAS SNPs, eQTLs, ASD GWAS SNPs and reported probable regulatory variants.

Immune system abnormalities have been identified in categories of patients with neurological disorders. Thus, the associated genetic variants may have regulatory functions in non-neuronal samples as well. The data pre-processing scripts available with Neur-Ally can be used to create training datasets from the analysis files of other sample types. For neuronal predictions, additional epigenomic data can be added whenever they become available.
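As a rough illustration of how a peak file from a new sample type could be turned into training bins, the sketch below applies the labelling rule from the Data processing section, under which a 200 bp bin is positive when a peak overlaps more than half of it. It is not the repository's pre-processing script: the function name is hypothetical, bins are assumed to be aligned to multiples of 200 from the chromosome start, and peak coordinates are assumed to be 0-based and half-open as in BED files.

```python
BIN_SIZE = 200  # bin length stated in the text


def positive_bins(peak_start, peak_end, bin_size=BIN_SIZE):
    """Return start coordinates of bins that a peak covers by more than half.

    The peak is the half-open interval [peak_start, peak_end).
    """
    starts = []
    first_bin = (peak_start // bin_size) * bin_size  # bin containing the peak start
    for bin_start in range(first_bin, peak_end, bin_size):
        overlap = min(peak_end, bin_start + bin_size) - max(peak_start, bin_start)
        if overlap > bin_size / 2:
            starts.append(bin_start)
    return starts


# A peak spanning 150..520 covers 50 bp of bin 0, all of bin 200 and 120 bp of bin 400.
print(positive_bins(150, 520))
# A 200 bp peak straddling two bins covers exactly half of each, so neither qualifies.
print(positive_bins(100, 300))
```

Applying the same function to every peak of every assay, and taking the union of positive bins per label, would give one binary label vector per bin under these assumptions.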
Therefore, the prediction performance of the model can be improved further as newer datasets arrive.

DATA AVAILABILITY

Neur-Ally is open source and available at https://github.com/anilprakash94/neur_ally. The epigenomic datasets are available at https://www.encodeproject.org/

FUNDING

This work was supported by the Department of Biotechnology, Government of India, and by a Senior Research Fellowship from the Council of Scientific and Industrial Research [CSIR] to A.P.

CONFLICT OF INTEREST

None of the authors has anything to disclose or any potential conflict of interest.

COMPETING INTERESTS

The authors declare no competing interests.

CONTRIBUTIONS

A.P. and M.B. conceptualized the work, performed the analysis, interpreted the results and wrote the manuscript.

SUPPLEMENTARY INFORMATION

Supplementary Table S1: Metric values for individual epigenomic labels. The AUROC and PR_AUC values are given in separate columns and their mean values are provided at the end of the file.

Supplementary Table S2: GWAS SNPs of neurological disorders with significant regulatory predictions. The “Signif. labels” column contains the epigenomic labels in which significant regulatory changes were predicted and the respective E-values are given in the “Signif. scores” column.

Supplementary Table S3: Top brain eQTL SNPs with significant regulatory predictions. The “Signif. labels” column contains the epigenomic labels in which significant regulatory changes were predicted and the respective E-values are given in the “Signif. scores” column.

Supplementary Table S4: Autism Spectrum Disorder GWAS SNPs with significant regulatory predictions. The “Signif. labels” column contains the epigenomic labels in which significant regulatory changes were predicted and the respective E-values are given in the “Signif. scores” column.

ACKNOWLEDGEMENTS

We are grateful to the Dept. of Biotechnology, Govt.
of India for providing intramural support to Rajiv Gandhi Centre for Biotechnology to MB, and to the Council of Scientific and Industrial Research [CSIR] for the Senior Research Fellowship to A.P.

REFERENCES

1. Manchia M, Cullis J, Turecki G, Rouleau GA, Uher R, Alda M. The Impact of Phenotypic and Genetic Heterogeneity on Results of Genome Wide Association Studies of Complex Diseases. PLoS One. 2013;8:e76295.
2. Uffelmann E, Huang QQ, Munung NS, Vries J de, Okada Y, Martin AR, et al. Genome-wide association studies. Nat. Rev. Methods Prim. 2021;1:59.
3. Schipper M, Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum. Mol. Genet. 2022;31:R73–R83.
4. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
5. Iwata M, Kosai K, Ono Y, Oki S, Mimori K, Yamanishi Y. Regulome based characterization of drug activity across the human diseasome. npj Syst. Biol. Appl. 2022;8:44.
6. Yaghoobi A, Malekpour SA. Unraveling the genetic architecture of blood unfolded p-53 among non-demented elderlies: novel candidate genes for early Alzheimer’s disease. BMC Genomics. 2024;25:440.
7. Mundade R, Ozer HG, Wei H, Prabhu L, Lu T. Role of ChIP-seq in the discovery of transcription factor binding sites, differential gene regulation mechanism, epigenetic marks and beyond. Cell Cycle. 2014;13(18):2847–52.
8. Jiang S, Mortazavi A. Integrating ChIP-seq with other functional genomics data. Brief Funct Genomics. 2018;17(2):104–115.
9. Mansisidor AR, Risca VI. Chromatin accessibility: methods, mechanisms, and biological insights. Nucleus. 2022;13(1):236–276.
10. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33(8):831–8.
11. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12(10):931–4.
12. Wainberg M, Merico D, Delong A, Frey BJ. Deep learning in biomedicine. Nat. Biotechnol. 2018;36:829–838.
13. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017;17:5999–6009.
14. Sollis E, Mosaku A, Abid A, Buniello A, Cerezo M, Gil L, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 2023;51:D977–D985.
15. Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression [GTEx] project. Nat. Genet. 2013;45:580–585.
16. Chollet F, et al. Keras. 2015. https://keras.io.
17. Pei G, Hu R, Dai Y, Manuel AM, Zhao Z, Jia P. Predicting regulatory variants using a dense epigenomic mapped CNN model elucidated the molecular basis of trait-tissue associations. Nucleic Acids Res. 2021;49:53–66.
18. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
19. Yang M, Huang L, Huang H, Tang H, Zhang N, Yang H, et al. Integrating convolution and self-Attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Res. 2022;50:E81.
20. Ormond C, Ryan NM, Corvin A, Heron EA. Converting single nucleotide variants between genome builds: From cautionary tale to solution. Brief. Bioinform. 2021;22:1–7.
21. Frydas A, Wauters E, Zee J van der, Van Broeckhoven C. Uncovering the impact of noncoding variants in neurodegenerative brain diseases. Trends Genet. 2022;38:258–272.
22. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830.
23. Choi KY, Lee JJ, Gunasekaran TI, Kang S, Lee W, Jeong J, et al. APOE promoter polymorphism-219T/G is an effect modifier of the influence of APOE ε4 on Alzheimer’s disease risk in a multiracial sample. J. Clin. Med. 2019;8:1236.
24. Ghanbari M, Munshi ST, Ma B, Lendemeijer B, Bansal S, Adams HH, et al. A functional variant in the miR-142 promoter modulating its expression and conferring risk of Alzheimer disease. Hum. Mutat. 2019;40:2131–2145.
25. Nott A, Holtman IR, Coufal NG, Schlachetzki JC, Yu M, Hu R, et al. Brain cell type–specific enhancer–promoter interactome maps and disease-risk association. Science. 2019;366:1134–1139.
26. Gallagher MD, Posavi M, Huang P, Unger TL, Berlyand Y, Gruenewald AL, et al. A Dementia-Associated Risk Variant near TMEM106B Alters Chromatin Architecture and Gene Expression. Am. J. Hum. Genet. 2017;101:643–663.
27. Soldner F, Stelzer Y, Shivalila CS, Abraham BJ, Latourelle JC, Barrasa MI, et al. Parkinson associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature. 2016;533:95–99.
28. Kikuchi M, Hara N, Hasegawa M, Miyashita A, Kuwano R, Ikeuchi T, Nakaya A. Enhancer variants associated with Alzheimer’s disease affect gene expression via chromatin looping. BMC Med. Genomics. 2019;12:128.
29. Jiang Z, Huang Y, Zhang P, Han C, Lu Y, Mo Z, et al. Characterization of a pathogenic variant in GBA for Parkinson’s disease with mild cognitive impairment patients. Mol. Brain. 2020;13:102.
30. VandenBosch LS, Luu K, Timms AE, Challam S, Wu Y, Lee AY, Cherry TJ. Machine learning prediction of non-coding variant impact in human retinal cis-regulatory elements. Transl Vis Sci Technol. 2022;11:16.
31. Wesolowska-Andersen A, Zhuo YG, Nylander V, Abaitua F, Thurner M, Torres JM, et al. Deep learning models predict regulatory variants in pancreatic islets and refine type 2 diabetes association signals. eLife. 2020;9:e51503.
32. Rangaswamy U, Dharshini SAP, Yesudhas D, Gromiha MM. Vepad - predicting the effect of variants associated with Alzheimer’s disease using machine learning. Comput Biol Med. 2020;124:103933.

Posted January 28, 2025.
Anil Prakash, Moinak Banerjee. Neur-Ally: A deep learning model for regulatory variant prediction based on genomic and epigenomic features in brain and its validation in certain neurological disorders. bioRxiv 2025.01.27.635013; doi: https://doi.org/10.1101/2025.01.27.635013
Subject Area: Bioinformatics