Dosage effect of Copy Number Variation in Epilepsy and ten regions of the human brain | Research Square

Tisham De, Lachlan Coin, Michael R Johnson

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6130694/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 04 Dec, 2025; the published version appears in Scientific Reports.
Version 1 posted 11

Abstract
Epilepsy and seizures are among the most common neurological conditions and often manifest with complex symptoms. Several studies, including large-scale GWAS and exome studies, have reported a comprehensive catalogue of genes related to Epilepsy. Similarly, several successful studies have elucidated the role of SNP QTLs in the normal human brain.
Here, in one of the few studies in the current literature, we explored and report the dosage effect of small to intermediate-length CNVs in two Epilepsy cohorts characterized for phenotypes such as seizure counts, seizure frequency and remission on anti-epileptic drugs. In addition, we performed a comprehensive CNV QTL analysis in ten regions of the normal human brain from the UKBEC study. We leveraged all analyses to identify new genes for Epilepsy phenotypes such as seizure frequency, and further uncovered genetic controls of neurotransmitters such as serotonin and dopamine and of signaling molecules such as GPCRs. Importantly, we observed clustering of CNV QTL signals in specific regions of the genome, such as the chromosome 1p36 region containing the GNB1 gene and the chromosome 9q22 region containing NANS. This clustering of association signals was further corroborated by our non-negative matrix factorization (NMF) analysis of UKBEC gene expression data. In conclusion, our results describe in detail the dosage effect of CNVs on epileptic seizures and further elucidate their role in the genomic architecture of gene expression in various regions of the human brain.

Biological sciences/Genetics; Biological sciences/Neuroscience

Introduction
Epilepsy is a common neurological disease affecting around 1% of the population worldwide. Anti-epileptic drugs generally work well for 60% of Epilepsy patients, who achieve seizure control with current medication within a year or two; however, for around one third (20-30%) of patients, the latest anti-epileptic drugs (25 licensed drugs worldwide 1,2) do not work well, and these patients continue to have regular seizures 3,4. Furthermore, resistance to one anti-epileptic drug has been observed to correlate well with resistance to all other drugs.
Current anti-epileptic medication is not prescribed for vulnerable groups such as pregnant women. One-off seizures are quite common in young children and adults but are usually considered benign. For some refractory groups of Epilepsy patients who do not respond to standard medication, brain surgery is required to control seizures 5,6. In these cases, an exploratory surgery is first required to identify the regions of the brain where seizures originate (which can be almost anywhere in the brain), and a second surgery is then needed to remove these regions. In rare forms of Epilepsy such as Lennox-Gastaut syndrome, which often originates in the occipital lobe of the brain, a child may have numerous seizures a day. As part of the CADET trial (Children's Adaptive Deep brain stimulation for Epilepsy Trial), the United Kingdom's first patient, a child with Lennox-Gastaut syndrome and mutations in the SCN1B gene [1], was successfully implanted with a device in the brain that controls seizures through electrical pulses. Chromodomain-helicase-DNA-binding protein 2 (CHD2) is another candidate gene for this syndrome 7. However, little is known about the biology of seizures, the brain regions where they originate and the causal mechanisms behind them. Thus, the genetic basis of seizures remains an open question in neurology. Here we present a comprehensive copy number variation (CNV) analysis of Epilepsy phenotypes, including seizure counts, seizure frequency and 12-month remission on anti-epileptic drugs, in two cohorts, denoted SANAD and the Australian cohort 4. In addition, we analysed and report here CNV-gene expression signatures (CNV eQTLs) in ten regions of the normal human brain from the UKBEC 8 and NABEC 9–11 studies.
We leveraged all analyses to identify new gene clusters and loci for the neurobiology of seizures, and we report the phenomenon of reciprocal CNV dosage in genes related to neurotransmitters and GPCR-mediated signal transduction in different regions of the human brain.

[1] https://www.gosh.nhs.uk/news/first-uk-trial-of-deep-brain-stimulation-for-children-with-epilepsy-begins-at-gosh/

Results
CNV analysis in Epilepsy cohorts
In our main discovery cohort, SANAD, the top hit for CNV genotypes was GNB1, both for the phenotype total number of seizures (chr1:1745726, p-value=2.89e-168, MAF=1.1%) and for seizure frequency (chr1:1745726, p-value=2.82e-95) (Figure 1, Supplementary table 1a). GNB1 was also the top hit in the joint model (chr1:1745726, p-value=6.3e-202) and in the joint model with variable selection using CNV genotypes (chr1:1745726, p-value=2.27e-207) (Supplementary table 1a). In the ClinVar annotations for GNB1, pathogenic CNVs outnumbered pathogenic SNVs, adding further weight to our CNV results (Supplementary table 2). In Log R Ratio (LRR) based models, the strongest signal was found within the growth hormone receptor gene (GHR) (chr5:42569642, p-value < 6e-128) (Supplementary table 1a). In SANAD, the top gene for drug response was TRAPPC9 (chr8:140765991, p-value=1.7e-05, LRR univariate model, Supplementary table 1a). TRAPPC9 is used in the clinical diagnosis of a rare neuro-endocrine disease, intellectual disability-obesity-brain malformations-facial dysmorphism syndrome 12 (Malacards.org). We further noticed that in SANAD, GNB1 and several other seizure-associated genes, such as PRKCZ (chr1:2082566, p-value=1.8e-41, MAF=1.08%) and CDK11A (chr1:1645366, p-value=6.25e-8, MAF=2.4%), were clustered within a one-megabase window in the chromosome 1p36 region (Figure 2a).
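The "one-megabase window" criterion for the 1p36 cluster can be made concrete with a small check on the lead-probe positions quoted above (GRCh37). This is a minimal sketch; the `Hit` class and the helper names are hypothetical illustrations, not part of the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str
    chrom: str
    pos: int  # base-pair position of the lead probe (GRCh37)


def span_bp(hits):
    """Distance between the left-most and right-most lead probes."""
    positions = [h.pos for h in hits]
    return max(positions) - min(positions)


def within_window(hits, window_bp=1_000_000):
    """True if all hits are on one chromosome and fit inside a single window."""
    if len({h.chrom for h in hits}) != 1:
        return False
    return span_bp(hits) <= window_bp


# Lead probes reported for SANAD in the 1p36 region.
sanad_1p36 = [
    Hit("CDK11A", "chr1", 1_645_366),
    Hit("GNB1", "chr1", 1_745_726),
    Hit("PRKCZ", "chr1", 2_082_566),
]
print(span_bp(sanad_1p36))        # 437200
print(within_window(sanad_1p36))  # True
```

The three lead probes span about 437 kb, so they sit comfortably inside one 1 Mb window. A genome-wide version would sort hits by position and group neighbours into windows in the same way.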
Of note, GNB1 has been shown to bind the growth hormone-releasing hormone receptor (GHRHR) and the 5-hydroxytryptamine receptor 1B (HTR1B, a serotonin receptor) (Figure 2b). The 1p36 region is often used in karyotyping and in diagnosing neurodevelopmental disorders related to 1p36 deletion syndrome 13,14. This is consistent with our results in SANAD, where, for seizure-related phenotypes, we found numerous contiguous probes with MAF > 1% and very low p-values (e.g. NBPF1, Supplementary table 3). Outside the chromosome 1p36 region, we found several genes of interest in SANAD, including six loci exceeding the p value threshold of =1% (Figure 1). Notable genes among these included HEATR1, CNTNAP3 and GABRB3 (see Discussion for further details). In the independent analysis of the Australian cohort, the top hit for 12-month remission on anti-epileptic drugs was within the PPFIA2 gene using CNV genotype univariate analysis (chr12:82081470, p-value=6.21e-06, MAF=4.4%). PPFIA2 was also the top hit in the CNV genotype multivariate joint model with variable selection (chr12:82081470, p-value=2.21e-06) (Supplementary table 1b). PPFIA2 belongs to the LAR protein-tyrosine phosphatase-interacting protein (liprin) family 15 and is involved in pathways related to the neurotransmitter release cycle and transmission across chemical synapses (https://pathcards.genecards.org/). Previous studies have indicated that Ca2+-modulated liprin-α proteins capture KIF1A-driven dense core vesicles (DCVs) in dendritic spines 16. In the joint meta-analysis of SANAD and the Australian cohort, the top hit of interest was a locus within the DAGLA gene (chr11:61462424, p-value=0.000301, MAF=1%). DAGLA is a neural stem-cell-derived dendrite regulator involved in 2-arachidonoylglycerol (2-AG) signaling in the central nervous system (CNS) 17–19.
It supports axonal growth during neurogenesis in early development and also contributes to the neuroinflammatory response in the brain (Supplementary table 1c). Other notable meta-analysis results included ASIC2 (Acid Sensing Ion Channel Subunit 2) (chr17:31454867, p-value=8.54e-19), the top hit in the LRR multivariate models, whereas GNB1 replicated in the LRR univariate model (chr1:1793111, meta p-value=9.49e-16). Across all analyses and cohorts, the most distinct common CNV-phenotype signal, with the highest number of contiguous probes with significant p-values, was found within the WWOX gene (MAF=47%, 48 contiguous probes for CNV genotypes and 66 contiguous probes in the LRR analysis) (Supplementary table 4). This CNV was associated exclusively with the drug-response phenotype in SANAD. Although WWOX is a known candidate gene for epilepsy and could have genuine seizure-related effects in the brain 20–22, its association with drug response was less convincing in the meta-analysis and in the Australian cohort. Because WWOX harbours one of the most fragile sites in the human genome, these results are difficult to interpret biologically and warrant further experimental validation.

CNV dosage effects in the UKBEC and NABEC cohorts
Briefly, in the UKBEC study we generated two sets of CNV calls from two genotyping platforms, namely the Illumina Omni 1M chip and a custom Immuno chip. We then analysed these two datasets independently in cnvHap 23 with Illumina platform-specific parameters (see Methods). On the Omni 1M chip we detected 9,242 homozygous deletions (type 0), 129,929 heterozygous deletions (type 1), 7,840 heterozygous duplications (type 3) and 546 homozygous duplications (type 4). Genome-wide CNV breakpoint information for all cohorts is available in the supplementary data section.
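The association analyses use expected CNV genotypes rather than hard calls. A minimal sketch of one common way to form such a dosage, assuming each sample has posterior probabilities over the copy-number states 0-4 (the type labels above): the expected copy number is the posterior-weighted mean. The exact cnvHap output format is not reproduced here; the array layout and function name are hypothetical.

```python
import numpy as np

# Copy-number states, following the type labels in the text:
# 0 = homozygous deletion, 1 = heterozygous deletion, 2 = normal,
# 3 = heterozygous duplication, 4 = homozygous duplication.
STATES = np.arange(5)


def expected_copy_number(posteriors):
    """Posterior-weighted copy number per sample: E[CN] = sum_k k * P(CN = k)."""
    p = np.asarray(posteriors, dtype=float)
    if p.ndim != 2 or p.shape[1] != STATES.size:
        raise ValueError("posteriors must have shape (n_samples, 5)")
    if not np.allclose(p.sum(axis=1), 1.0):
        raise ValueError("each row of posteriors must sum to 1")
    return p @ STATES


# Hypothetical posteriors for three samples at one probe.
post = np.array([
    [0.00, 0.00, 1.00, 0.00, 0.00],  # confidently normal -> 2.0
    [0.00, 0.90, 0.10, 0.00, 0.00],  # likely heterozygous deletion -> 1.1
    [0.50, 0.50, 0.00, 0.00, 0.00],  # deletion of uncertain zygosity -> 0.5
])
print(expected_copy_number(post))
```

Using the expectation keeps genotype uncertainty in the regressor: an ambiguous call contributes an intermediate dosage instead of being forced to one state.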
Next, for every probe on these two platforms, we derived expected CNV genotypes from the cnvHap posterior probabilities and used these values for univariate and multivariate association analyses with gene expression data from different brain regions in the UKBEC study, using the MultiPhen 24 method (see Methods). In total, the UKBEC analysis generated 96 sets of transcriptome-wide CNV QTL results across CNV genotypes, LRR, univariate and multivariate methods for eight brain regions. The top-ranked (rank 1) results of these analyses are reported in Supplementary tables 1d and 1e. The most notable observation in these results is a cluster of significant CNV QTLs within a one-megabase-pair window on chromosome 9, which consistently harboured the top hit across brain regions (Figure 3a, Supplementary tables 1d, 1e). Two genes of interest in this cluster were TDRD7 (p-value < 1e-269, MAF=2.6%) for CNV genotypes and NANS for LRR-based models (p-value < 1e-269). N-acetylneuraminic acid synthase (NANS) synthesises sialic acid, which in humans is most highly concentrated in the brain. Biallelic recessive mutations in NANS are known to cause intellectual disability with short stature 25, and the gene plays an important role in neural transmission and in ganglioside structures during synaptogenesis 26. Interestingly, in the UKBEC Immuno chip dataset the top hit, ANP32B (p-value < 1e-269, MAF=1.8%), was located within the same chromosome 9 gene cluster containing TDRD7 and NANS. Like TDRD7 and NANS, ANP32B was consistently the top hit in all brain regions (Figure 3a, Supplementary tables 1d, 1e). ANP32B is known to regulate gene expression 27 and represses transcription of the KLF5 gene 28. Interestingly, NANS and ANP32B also lie near the GABBR2 gene.
Although GABBR2 was not associated with any of our epilepsy phenotypes and did not harbour any CNV with MAF > 1% in any cohort, it is known to cause developmental and epileptic encephalopathy 59 (DEE59) 29 (MalaCards) and has been shown to bind GNB1 (Figure 3b). In the UKBEC and NABEC datasets we discovered many additional significant loci and regions of interest. One important example is the significant cis CNV QTL between serotonin and dopamine receptor genes on chromosome 11. Here, probes in the HTR3B gene were significantly associated with DRD2 gene expression (e.g. CNV genotypes at chr11:113802601, MAF=1.1%, were associated with DRD2 probe id 3391654 with p-value=5.88e-81 in the white matter brain region), and probes in the DRD2 gene were significantly associated with HTR3B gene expression (e.g. LRR in the DRD2 gene was associated with HTR3B probe id 3349661 with p-value < 1e-308 using the LRR joint model with variable selection in the cerebellum) (Figure 4). Exploring the serotonin biosynthesis pathway further, we also discovered a significant cis CNV QTL for CNVs in the corticotropin-releasing hormone receptor gene CRHR2, which was associated with expression of the nearby INMT gene (p-value < 1e-20 in putamen and frontal cortex using the LRR joint model with variable selection). INMT is known to N-methylate indoles such as tryptamine, which, like serotonin, is derived from tryptophan (Figure 4). All analyses reported so far are based on an overlapping moving window across the genome, i.e. CNV calling and association were performed for genomic segments spanning both genic and intergenic regions. In addition, for the UKBEC Omni chip dataset we also performed CNV analysis on a gene-by-gene basis. The motivation for this approach is to find CNVs lying within a gene and to uncover their local effect on exon-level expression.
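For the gene-by-gene calls, each gene contributes only the probes inside its boundaries plus a 5 kb flank on either side. A minimal sketch of that probe selection, assuming probe and gene coordinates on the same chromosome; the gene names and coordinates in the example are hypothetical.

```python
import numpy as np

FLANK_BP = 5_000  # 5 kb flank on each side of the gene, as used for the gene-by-gene calls


def probes_in_gene_window(probe_pos, gene_start, gene_end, flank=FLANK_BP):
    """Indices of probes inside [gene_start - flank, gene_end + flank]."""
    pos = np.asarray(probe_pos)
    lo, hi = max(gene_start - flank, 0), gene_end + flank
    return np.flatnonzero((pos >= lo) & (pos <= hi)).tolist()


def gene_windows(probe_pos, genes, flank=FLANK_BP):
    """Map each gene to the probe indices that would enter its per-gene CNV call."""
    return {name: probes_in_gene_window(probe_pos, start, end, flank)
            for name, (start, end) in genes.items()}


probe_pos = [90_000, 96_000, 100_000, 150_000, 204_000, 206_000]
genes = {"GENE_A": (100_000, 200_000), "GENE_B": (300_000, 310_000)}  # hypothetical
print(gene_windows(probe_pos, genes))
# {'GENE_A': [1, 2, 3, 4], 'GENE_B': []}
```

Genes with no probes in their window, like `GENE_B` here, cannot be called on that array, which is one way low probe density limits gene-level CNV detection.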
The gene-by-gene approach could help identify the relative importance of exons in a given tissue. To this end, we called CNVs with the cnvHap HMM separately for each human gene (using a 5-kilobase-pair window around the gene boundaries). We then derived expected CNV genotypes and associated them with eight UKBEC gene expression matrices using our univariate and multivariate association models. Example results for genes with common CNVs are as follows. In the putamen, the CNV-dosage analysis using the MultiPhen joint model with variable selection identified exon 4 as the most significant for HEATR1 (probe id=2462530), exon 10 for PRKCZ (probe id=2316299) and exon 5 (close to the last exon, probe id=3700865) for WWOX (Supplementary figure 1). A joint consensus analysis of the important exons identified through CNV genotypes, LRR, univariate and multivariate methods remains to be explored. However, we provide a complete set of results containing CNV-dosage effects in all brain regions using the different methods in the supplementary data section.

NMF analysis of UKBEC Omni dataset
Applying non-negative matrix factorization (NMF) allowed us to deconvolute the UKBEC gene expression dataset from 10 regions into meta-exons or meta-genes (also referred to as hidden genes or patients in the literature). These factors can be interpreted biologically as exons and genes that are consistently over- or under-expressed in different regions of the human brain (Figure 5a, b). For the 1p36 region we ran the NMF algorithm on two expression matrices: the first was derived by averaging expression values across the 10 regions (aveALL), and the second was a combined expression matrix including all brain regions (Full set).
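The two input matrices and the consensus step can be sketched together. This is a minimal numpy sketch under stated assumptions: each region matrix has probes as rows and individuals as columns (NaN where a region was not sampled), aveALL is the element-wise mean across regions, and the Full set concatenates regions side by side. It uses Lee-Seung multiplicative-update NMF and a Brunet-style consensus matrix (how often two probes share their dominant factor across runs); the authors' exact NMF implementation and consensus procedure are not reproduced, and the toy data are hypothetical.

```python
import numpy as np


def ave_all(region_mats):
    """aveALL: element-wise mean over regions; NaN marks a region not sampled."""
    return np.nanmean(np.stack(region_mats), axis=0)


def full_set(region_mats):
    """Full set: regions concatenated side by side (probes x all region-samples)."""
    return np.hstack(region_mats)


def nmf(V, rank, n_iter=1000, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimising the Frobenius error ||V - WH||."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus(V, rank, n_runs=50):
    """Fraction of runs in which two probes (rows) share their dominant factor."""
    C = np.zeros((V.shape[0], V.shape[0]))
    for run in range(n_runs):
        W, H = nmf(V, rank, seed=run)
        W = W * np.linalg.norm(H, axis=1)  # remove the arbitrary W/H scaling before argmax
        labels = W.argmax(axis=1)          # each probe -> its dominant meta-gene
        C += labels[:, None] == labels[None, :]
    return C / n_runs


# Two hypothetical regions, 6 probes x 4 individuals each: probes 0-2 are
# high in region 1 and probes 3-5 are high in region 2.
region1 = np.full((6, 4), 0.1)
region1[:3] = 5.0
region2 = np.full((6, 4), 0.1)
region2[3:] = 5.0
V = full_set([region1, region2])  # 6 x 8
C = consensus(V, rank=2, n_runs=10)
print(np.round(C, 2))
print(ave_all([region1, region2]))  # every entry is about 2.55
```

In this toy case the Full set separates the two probe groups, while aveALL is flat because averaging cancels the region-specific pattern. This illustrates why both matrices may be worth factorizing.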
By analysing the relative frequency of individual exons in these results (NMF consensus clustering), we discovered two gene clusters in the chromosome 1p36 region (Figure 6, Supplementary table 5). The first cluster was located around the CDC42 gene and included nearby genes (~1 megabase-pair window) such as RAP1GAP, USP48, HSPG2 and LUZP1. The second cluster was found around CHD5 and included the nearby genes KCNAB2, CAMTA1 and ACOT7. Further, applying NMF to every gene individually (with a 5-kilobase-pair window around the gene boundaries) allowed us to see which exons are consistently over- or under-expressed across brain regions. In summary, we first applied NMF deconvolution to the chromosome 1p36 region in the UKBEC Omni dataset with ranks 10 and 20 for two expression matrices derived from all brain regions, and then separately to all known human genes (ranks 2 to 6, for the two derived expression sets across the ten brain regions and for each brain region separately; see Methods).

Discussion
Our results complement recent large-scale Epilepsy GWAS 1,30 by filling the gap for small and intermediate CNVs and their effects on seizure and drug-response phenotypes. We found that population-level analysis of CNVs provides more power to detect previously reported associations, such as GNB1 in neurodevelopmental disorders 31, as well as to uncover new disease-associated loci. Existing protein structures depicting the interaction of GNB1 with the HTR1B and growth hormone-releasing hormone (GHRHR) receptors further strengthen our CNV results in the SANAD cohort. These observations also highlight the power of population-aware methods for CNV genotyping 32. This is especially significant because, unlike cnvHap, current methods for CNV detection from bead-array platforms usually apply a hidden Markov model (HMM) in single-sample mode and therefore fail to capture information that could be leveraged to model the dosage landscape of the human genome 33.
These aspects of our analyses, together with the univariate and multivariate models, distinguish our results from the current literature and led to new findings. Highlights among the new genes we report for Epilepsy phenotypes such as seizure counts and seizure frequency include PRKCZ, HEATR1, TRDN, CNTNAP3, AEBP2 and GABRB3. PRKCZ encodes an atypical protein kinase C isoform known for its role in memory and long-term potentiation 34,35. HEATR1, also known as BAP28, is required for pre-ribosomal RNA transcription by RNA polymerase I and has been linked to brain abnormalities in zebrafish 36 and Drosophila 37. TRDN contributes to muscle contraction through Ca2+ release and is a known causal gene for cardiac arrhythmia syndrome with or without skeletal muscle weakness (CARDAR) 38–41 (MalaCards). The relevance of CNTNAP3 42–44 and GABRB3 45–47 to neurological diseases is well documented in the literature (Supplementary figure 2). Lastly, AEBP2 codes for a subunit of Polycomb repressive complex 2 (PRC2) 48,49, which catalyses trimethylation of histone H3 at lysine 27 (H3K27me3) 50 on chromatin, leading to long-term epigenetic silencing, also referred to as cellular memory. Based on these observations, we suggest that these genes might contribute to secondary symptoms in Epilepsy or other neurodevelopmental disorders (e.g. memory loss or arrhythmia). These results also suggest that transcriptional regulation and heterogeneity are likely to be important for critical brain functions and homeostasis. Our CNV results for the UKBEC and NABEC cohorts complement existing SNP QTL results 8,51,52. To the best of our knowledge, our study is one of the few to comprehensively characterize the dosage effect of small CNVs in different regions of the brain. Furthermore, our LRR-based replication model consistently uncovered genes relevant to neurology and brain function.
For instance, we did not detect any CNVs in the SCN1B gene, a known candidate gene for Lennox-Gastaut syndrome, in any of our cohorts (possibly because of low probe density). However, using the LRR-based model in the UKBEC dataset, we uncovered suggestive signals in particular brain regions such as the putamen, white matter and occipital cortex (Supplementary table 6). The convergence of the top CNV QTL hits in the UKBEC Omni chip and Immuno chip data on the chromosome 9 gene cluster, which also harbours the GABBR2 gene (G protein-coupled receptor 3 family, GABA-B receptor subfamily), supports the clustering phenomenon previously reported in the original UKBEC SNP eQTL study. Experiments aimed at understanding the causal mechanisms behind such clustering could provide new insight into human brain function. Importantly, based on our observations, we suggest that in-tandem transcriptional regulation of co-located genes could be an important mechanism by which key molecular functions are carried out in the brain. In this context, our NMF results from the UKBEC expression dataset hint that the transcriptional sense and relative position of exons (e.g. whether an exon is transcribed early by the RNA polymerase machinery) may be associated with higher or lower overall RNA abundance in the cell (Supplementary table 7). This differential expression of exons, which could affect splicing, might be interpreted as the RNA amplitude of a given exon in a tissue. The clustering of CNV-phenotype signals in SANAD (e.g. the GNB1 region on chromosome 1p36) and the chromosome 9 CNV QTL cluster in the UKBEC results provide suggestive evidence for such a phenomenon. However, the underlying mechanism remains to be elucidated and experimentally validated.
Lastly, the reciprocal CNV dosage effect we observed between the HTR3B and DRD2 genes, and its link to the CRHR2-INMT metabolic pathway, is potentially a new genetic finding. This molecular axis may help explain how the brain maintains homeostasis and balances the critical roles of neurotransmitters and hormones through the genomic regulation of serotonin, dopamine and cortisol. Based on these findings and the results for GNB1, we conclude that CNVs, through their effects on protein receptor complexes, have an important role in neurological diseases and in maintaining homeostasis in the brain.

Methods
Cohorts
Epilepsy cohorts
As in our previous study 4, our main CNV analysis in Epilepsy used a prospective cohort design rather than a case-control design. Our discovery cohort consisted of 916 subjects from the Standard and New AED (SANAD) clinical trial, and a secondary replication cohort consisted of 380 subjects recruited from the Royal Melbourne and Austin hospitals in Australia. A brief description of all clinical variables analysed in these cohorts, including the common phenotypes used for meta-analysis, is provided in the supplementary material. The main phenotype categories analysed can be broadly divided into seizure-related phenotypes (e.g. seizure frequency, total number of seizures) and 12-month remission on AED medication (responders vs non-responders). Epilepsy and seizure classification were based on the latest ILAE guidance. The main types of Epilepsy in our study were focal, generalized and unclassified Epilepsy. UK and Australian samples were genotyped on Illumina 660 bead chips at the Sanger Centre at different time points. Further quality control based on heterozygosity, sample call rates and relatedness was carried out on the raw intensity data. Additional information on genotyping and quality control for these cohorts can be found in our earlier report 4.
The UKBEC and NABEC studies
The UKBEC study consists of 134 individuals of European descent who were confirmed to be neuropathologically normal during life and had a median age of 59 years 8. Of these individuals, 74.5% were male, and the most common cause of death was heart attack. Tissue collection and genotyping have been described in detail previously 8. Briefly, whole-transcriptome data were available for 10 brain regions, chosen for their relevance to human disease and reported to exhibit high expression profiles. RNA was extracted from post-mortem brain tissue with randomization and checked for quality. Processed RNA was then analysed on the Affymetrix Exon 1.0 ST array, and all arrays were processed by robust multi-array average normalisation and log2-transformed in two different ways. Genomic DNA was extracted from the post-mortem brain tissue using Qiagen's DNeasy kit and subsequently genotyped on the Illumina Omni-Quad bead chip and a custom Immunochip designed to fine-map autoimmune disorders. GenomeStudio v.1.8.x was used to process the intensity data, from which log R ratio (LRR) and B allele frequency (BAF) were derived and exported for CNV analysis. The NABEC study 9,10 consists of approximately 360 individuals of European descent who were free of any neuropathological disorders. RNA was extracted, quality checked and hybridized onto Human HT12v3 expression BeadChips. Raw gene expression data were transformed using cubic spline normalisation and log-transformed. Expression probes were then re-mapped to human genome build 19 using ReMOAT and annotated to genes, retaining probes with reliable data that are free of common polymorphisms. Genomic DNA for the NABEC study was extracted and genotyped on Illumina Infinium HumanHap550 chips. As in the UKBEC study, LRR and BAF intensity values were processed using GenomeStudio software and exported for subsequent CNV analysis.
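Before CNV calling, the exported LRR values are normalised for GC content and corrected for genomic waves with a localised loess fit (as described under CNV calling). A minimal numpy sketch of those two steps, assuming a linear GC term fitted by least squares and a hand-written tricube local-linear smoother standing in for loess. cnvHap's own preprocessing code, its GC windows and its bandwidth are not reproduced; `frac` and the simulated data are hypothetical.

```python
import numpy as np


def regress_out_gc(lrr, gc):
    """Residuals of LRR after an ordinary least-squares fit on GC content."""
    X = np.column_stack([np.ones_like(gc), gc])
    beta, *_ = np.linalg.lstsq(X, lrr, rcond=None)
    return lrr - X @ beta


def loess_fit(x, y, frac=0.2):
    """Minimal LOESS-style smoother: tricube-weighted local linear fit at each x."""
    n = len(x)
    k = max(int(np.ceil(frac * n)), 3)  # points used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[k - 1] + 1e-12  # bandwidth: distance to the k-th nearest probe
        w = (1.0 - np.clip(d / h, 0.0, 1.0) ** 3) ** 3
        sw = np.sqrt(w)
        X = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]  # local intercept = smoothed value at x[i]
    return fitted


def correct_lrr(lrr, gc, pos, frac=0.2):
    """GC regression followed by subtraction of a smooth positional (wave) trend."""
    resid = regress_out_gc(lrr, gc)
    return resid - loess_fit(np.asarray(pos, dtype=float), resid, frac)


# Simulated probes along a hypothetical 300 kb segment.
rng = np.random.default_rng(1)
pos = np.arange(300) * 1_000
gc = rng.uniform(0.3, 0.6, size=300)
wave = 0.2 * np.sin(2 * np.pi * pos / 300_000)
lrr = 0.5 * gc + wave + rng.normal(0.0, 0.02, size=300)
corrected = correct_lrr(lrr, gc, pos)
print(round(lrr.std(), 3), round(corrected.std(), 3))
```

The smoothing window has to be much wider than the CNVs of interest; otherwise the wave correction would also absorb genuine copy-number shifts.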
CNV calling: moving-window and gene-by-gene approaches
As in our earlier studies 32,33, we relied on the cnvHap algorithm 23 as the main CNV discovery method. cnvHap is a multi-platform, CNV-SNP haplotype-based CNV detection algorithm that uses log-R ratio and B-allele frequency jointly to discover and genotype CNVs simultaneously. cnvHap has been shown to have high sensitivity for detecting smaller common CNVs, with high genotyping accuracy. Because its model is trained in a population-aware mode, it was a pragmatic choice for CNV-phenotype association in large cohorts originally genotyped on bead-array chips for SNP GWAS or SNP eQTL studies. All cohorts analysed in our study, including the Epilepsy cohorts (UK and Australia), UKBEC and NABEC, were genotyped on different versions of Illumina bead-array chips and were therefore subjected to a common CNV pre-processing pipeline. Briefly, for each cohort, two intensity measurements, log-R ratio (LRR) and B-allele frequency (BAF), were exported from Illumina GenomeStudio software as final reports. The exported intensity measurements were then normalised for GC content by regressing GC content out of the LRR values. Genomic wave effects were removed by fitting a localised loess function. In the main cnvHap analysis, joint CNV-SNP haplotype structure information was incorporated to refine CNV predictions, and expected CNV genotypes were calculated from allele frequencies for subsequent association analysis. We first fine-tuned our CNV analysis pipeline to reproduce known common CNVs in our cohorts. We chose the WWOX intronic deletions reported by the gnomAD database in the region chr16:78371638-78385000 (GRCh37/hg19), which harbours a deletion and a multi-allelic CNV with allele frequencies of 34% and 54%, respectively (Supplementary figures 3, 4).
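Calibrating against gnomAD means comparing the deletion allele frequency implied by the calls with the reference value. At a deletion-only site each diploid sample carries 2 - CN deleted alleles, so the frequency is sum(2 - CN) / 2N. A minimal sketch, assuming hard calls (0, 1, 2) or expected copy numbers in [0, 2]; the counts below and the 0.05 tolerance are hypothetical, and multi-allelic CNVs would need a different treatment.

```python
import numpy as np


def deletion_allele_frequency(copy_numbers):
    """Deletion allele frequency: sum(2 - CN) / (2 * N) at a deletion-only site."""
    cn = np.asarray(copy_numbers, dtype=float)
    if np.any((cn < 0) | (cn > 2)):
        raise ValueError("copy numbers must lie in [0, 2] at a deletion-only site")
    return float(np.sum(2.0 - cn) / (2.0 * cn.size))


def within_tolerance(observed_af, reference_af, tol=0.05):
    """Calibration check: is the observed frequency close to a reference (e.g. gnomAD) value?"""
    return abs(observed_af - reference_af) <= tol


# Hypothetical calls: 10 homozygous deletions, 40 heterozygous deletions, 50 normal.
calls = [0] * 10 + [1] * 40 + [2] * 50
af = deletion_allele_frequency(calls)
print(af)                          # 0.3
print(within_tolerance(af, 0.34))  # True, against the 34% gnomAD deletion frequency
```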
Missense mutations in WWOX exons have been associated with WOREE syndrome, a highly pathogenic form of epilepsy that usually occurs in young children and has a very poor prognosis. In SANAD we rediscovered this intronic CNV in WWOX as common deletions with an allele frequency of 47%, spanning the region chr16:78373644-78384121. Manual inspection of LRR and BAF cluster plots showed distinct homozygous deletions spanning more than 40 contiguous probes, the highest number amongst all CNVs in our cohorts. This contiguity of probes was also reflected in a significant association with the epilepsy drug-response phenotype in the LRR-based analysis of SANAD (without cnvHap calls). However, we failed to detect similar common CNVs in WWOX in the Australian replication cohort or in the joint cnvHap meta-analysis (multi-platform integration) of SANAD and the Australian cohort. Of note, our Australian replication cohort had batch effects, and around 30% of samples were excluded by our CNV analysis pipeline. Unlike SANAD, the resulting Australian sample (N=280) therefore had limited statistical power to detect and genotype common CNVs across a broad allele-frequency spectrum at genome-wide significance. Nevertheless, results from the Australian cohort, both alone and in the meta-analysis with SANAD, replicated many of the primary findings and were enriched for genes related to neurological conditions; we therefore report them here. For the meta-analysis of the SANAD and Australian cohorts, we used the multi-platform integration feature of cnvHap, in which the intensity values for each genotyping probe from the UK and Australian cohorts were modelled jointly in the HMM for CNV genotype predictions.

Association models
One of the main goals of SNP eQTL studies is to enable a better understanding of GWAS loci.
Our study design emulates this approach: we first sought new CNV loci for Epilepsy phenotypes in the SANAD and Australian cohorts and then used CNV eQTL analysis in normal human brain regions from the UKBEC and NABEC studies to interpret the CNV signals for Epilepsy. The CNV eQTL results and gene NMF programs (described below) from the UKBEC and NABEC cohorts can serve as an independent resource for replicating and validating a priori disease hypotheses, including for other neurological diseases. Our association analysis for the Epilepsy cohorts, UKBEC and NABEC consisted of six different linear models implemented in the MultiPhen software 24. These models were based on two CNV signals: 1) expected CNV genotypes derived from cnvHap and 2) raw log-R ratio (LRR) intensity measurements (i.e. without cnvHap, used for secondary validation of CNV-phenotype signals). Each CNV signal was analysed using three linear models: a) a standard univariate model (phenotype ~ CNV genotype/LRR); b) the MultiPhen joint model (CNV genotype/LRR ~ phenotypes); and c) the MultiPhen joint model with backward variable selection (CNV genotype/LRR ~ subset of phenotypes). Of note, only the CNV association results for Epilepsy phenotypes in SANAD are considered main discovery results; all other reported results, from the Australian cohort, UKBEC and NABEC, are intended as replication results for a priori hypotheses. We used 5 LRR principal components and gender as covariates for the SANAD cohort, and only gender for the Australian, UKBEC and NABEC studies. The main criteria for choosing cnvHap parameters and MultiPhen covariates were reproducing the allele frequencies of common CNVs (e.g. matching the WWOX deletion frequency from gnomAD CNV results) and recovering known genes for neurological diseases.
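The difference between the forward (univariate) and reverse (joint) model directions can be illustrated with ordinary least squares. This is a simplified stand-in, not MultiPhen: MultiPhen is an R package whose own regression choices and backward variable selection are not reproduced here. The sketch fits phenotype ~ CNV + covariates for a t-statistic, and CNV ~ covariates + all phenotypes for an F-statistic on the phenotype block. All data are simulated, and the covariate set mirrors the SANAD choice (gender plus 5 LRR principal components).

```python
import numpy as np


def _ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def univariate_model(phenotype, cnv, covariates):
    """phenotype ~ 1 + CNV + covariates; returns (effect size, t-statistic) for CNV."""
    n = len(phenotype)
    X = np.column_stack([np.ones(n), cnv, covariates])
    beta, resid = _ols(phenotype, X)
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se


def joint_model_F(cnv, phenotypes, covariates):
    """Reverse regression CNV ~ 1 + covariates + phenotypes; F-test of all phenotypes."""
    n = len(cnv)
    X0 = np.column_stack([np.ones(n), covariates])
    X1 = np.column_stack([X0, phenotypes])
    _, r0 = _ols(cnv, X0)
    _, r1 = _ols(cnv, X1)
    q, df = X1.shape[1] - X0.shape[1], n - X1.shape[1]
    return ((r0 @ r0 - r1 @ r1) / q) / (r1 @ r1 / df)


rng = np.random.default_rng(2)
n = 500
gender = rng.integers(0, 2, n).astype(float)
pcs = rng.normal(size=(n, 5))  # stand-ins for 5 LRR principal components
covariates = np.column_stack([gender, pcs])
cnv = rng.choice([0.0, 1.0, 2.0], size=n, p=[0.05, 0.25, 0.70])
seizure_freq = 0.5 * cnv + rng.normal(size=n)  # simulated CNV effect
other_pheno = rng.normal(size=n)               # simulated null phenotype
beta, t = univariate_model(seizure_freq, cnv, covariates)
F = joint_model_F(cnv, np.column_stack([seizure_freq, other_pheno]), covariates)
print(f"beta={beta:.2f} t={t:.1f} F={F:.1f}")
```

With two CNV signals and three model types per platform, the UKBEC grid of 2 platforms × 2 signals × 3 models × 8 regions gives 96 result sets, which is consistent with the count reported in the Results.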
NMF gene programs

Non-negative matrix factorization (NMF) is a widely used method in data science for deconvolving complex data, including images and gene expression data. It has been used successfully to identify cancer subtypes and, more recently, to find common gene programs across multiple single-cell gene expression datasets. Here, we have expression measurements for all human genes and their exons in 10 brain tissues, plus an additional set derived from the average expression across all 10 tissues. To find exons or genes that are consistently over- or under-expressed across these 11 gene expression matrices, we used NMF to derive meta-exons/meta-genes and the corresponding meta-patients. We performed this analysis in two ways. First, in a gene-by-gene approach, for each of the 20,000 human genes we extracted the probes within a 5-kb window around the gene and derived 11 brain tissue expression matrices. We then ran NMF on these matrices 50 times for each rank from 2 to 6, and extracted meta-exons or meta-genes from the output of the multiple runs using consensus clustering, as described previously 53 . From these results we counted how many times each probe occurred in each gene program for ranks 2 to 6; the frequency of these counts may be interpreted as the relative importance of each exon, which might reflect deeper biological or disease relevance. Second, instead of using a gene boundary, we ran the NMF analysis over a genomic region, e.g. the chromosome 1p36 region.

Declarations

Acknowledgements

We would like to thank the School of Public Health, Imperial College London, and the Imperial College research computing team for their valuable help and support for this study. We thank Dr Doug Speed for statistical advice and input on the SANAD and Australian cohorts.
We thank Dr Adaikalavan Ramasamy for providing access to the UKBEC and NABEC gene expression datasets and for helping with data interpretation.

Additional information

Any additional information required to reanalyse the data reported in this paper is available from the lead contact upon request.

Declaration of interests / Competing interests

No external or financial interests to declare. No organs or tissues were procured from prisoners.

Correspondence

Further information and requests for resources and reagents should be directed to, and will be fulfilled by, the lead contact, Dr Tisham De.

Materials availability

Samples and further materials related to the SANAD study and the Australian epilepsy cohort can be requested from the lead author or from Dr Michael R Johnson, Department of Brain Sciences, Imperial College London, UK. Requests for UKBEC and NABEC materials should be addressed to the authors of the original studies.

Data Availability

UKBEC study data are available on Gene Expression Omnibus, as described by the authors of the original study; microarray CEL files and processed data are available under accession number GSE46706. SNP data are available through dbGaP. All data used in this study are publicly available or open access and therefore did not require additional ethical approval. All association results are available on Zenodo at https://zenodo.org/records/14946834

Code Availability

All code used to process and analyse the data has been published, and the source code is currently available at:
MultiPhen: https://github.com/lachlancoin/MultiPhen
cnvHap: https://www.imperial.ac.uk/people/l.coin
This study did not generate new reagents or software code.

Author contributions

T.D., L.J.M.C. and M.R.J. were involved in study design, performed analysis, and wrote the manuscript. L.J.M.C. and M.R.J. advised on and supervised the CNV, transcriptomics and multi-phenotype association aspects of this work.
T.D. conceived and performed the NMF analysis of the UKBEC transcriptomics dataset. All authors contributed to the overall interpretation of results and to the discussion section of the manuscript.

References

1. International League Against Epilepsy Consortium on Complex Epilepsies. GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat. Genet. 55, 1471–1482 (2023).
2. Chen, Z., Brodie, M., Liew, D. & Kwan, P. Treatment outcomes in patients with newly diagnosed epilepsy treated with established and new antiepileptic drugs: a 30-year longitudinal cohort study. JAMA Neurol. 75, 279–286 (2017).
3. Annegers, J. F., Hauser, W. A. & Elveback, L. R. Remission of seizures and relapse in patients with epilepsy. Epilepsia 20, 729–737 (1979).
4. Speed, D. et al. A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy. Hum. Mol. Genet. 23, 247–258 (2014).
5. Saggi, S. et al. Surgical outcomes following resection in patients with language dominant posterior quadrant epilepsy. Epilepsy Behav. Rep. 27, 100695 (2024).
6. Tandon, N., Alexopoulos, A. V., Warbel, A., Najm, I. M. & Bingaman, W. E. Occipital epilepsy: spatial categorization and surgical management: clinical article. J. Neurosurg. 110, 306–318 (2009).
7. Lund, C., Brodtkorb, E., Øye, A.-M., Røsby, O. & Selmer, K. K. CHD2 mutations in Lennox-Gastaut syndrome. Epilepsy Behav. 33, 18–21 (2014).
8. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
9. Hernandez, D. G. et al. Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain. Neurobiol. Dis. 47, 20–28 (2012).
10. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
11. Keller, M. F., Saad, M., Bras, J., Bettella, F. & Nicolaou, N.; International Parkinson’s Disease Genomics Consortium (IPDGC); Wellcome Trust Case Control Consortium 2 (WTCCC2). Using genome-wide …. Hum. Mol. Genet. (2012).
12. Marangi, G. et al. TRAPPC9-related autosomal recessive intellectual disability: report of a new mutation and clinical phenotype. Eur. J. Hum. Genet. 21, 229–232 (2013).
13. Shapira, S. K. et al. Chromosome 1p36 deletions: the clinical phenotype and molecular characterization of a common newly delineated syndrome. Am. J. Hum. Genet. 61, 642–650 (1997).
14. Jordan, V. K., Zaveri, H. P. & Scott, D. A. 1p36 deletion syndrome: an update. Appl. Clin. Genet. 8, 189–200 (2015).
15. Serra-Pages, C., Medley, Q. G., Tang, M., Hart, A. & Streuli, M. Liprins, a family of LAR transmembrane protein-tyrosine phosphatase-interacting proteins. J. Biol. Chem. 273, 15611–15620 (1998).
16. Stucchi, R. et al. Regulation of KIF1A-driven dense core vesicle transport: Ca2+/CaM controls DCV binding and liprin-α/TANC2 recruits DCVs to postsynaptic sites. Cell Rep. 24, 685–700 (2018).
17. Bisogno, T. et al. Cloning of the first sn1-DAG lipases points to the spatial and temporal regulation of endocannabinoid signaling in the brain. J. Cell Biol. 163, 463–468 (2003).
18. Shonesy, B. C. et al. CaMKII regulates diacylglycerol lipase-α and striatal endocannabinoid signaling. Nat. Neurosci. 16, 456–463 (2013).
19. Ogasawara, D. et al. Rapid and profound rewiring of brain lipid signaling networks by acute diacylglycerol lipase inhibition. Proc. Natl. Acad. Sci. U. S. A. 113, 26–33 (2016).
20. Banne, E. et al. Neurological disorders associated with WWOX germline mutations: a comprehensive overview. Cells 10 (2021).
21. Suzuki, H. et al. A spontaneous mutation of the Wwox gene and audiogenic seizures in rats with lethal dwarfism and epilepsy. Genes Brain Behav. 8, 650–660 (2009).
22. Repudi, S. et al. Neuronal deletion of Wwox, associated with WOREE syndrome, causes epilepsy and myelin defects. Brain (2021) doi:10.1093/brain/awab174.
23. Coin, L. J. M., Asher, J. E., Walters, R. G. & Moustafa, J. cnvHap: an integrative population and haplotype-based multiplatform model of SNPs and CNVs. Nat. Methods (2010).
24. O’Reilly, P. F. et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One 7, e34861 (2012).
25. van Karnebeek, C. D. M. et al. NANS-mediated synthesis of sialic acid is required for brain and skeletal development. Nat. Genet. 48, 777–784 (2016).
26. Wang, B. & Brand-Miller, J. The role and potential of sialic acid in human nutrition. Eur. J. Clin. Nutr. 57, 1351–1369 (2003).
27. Shen, S.-M. et al. Downregulation of ANP32B, a novel substrate of caspase-3, enhances caspase-3 activation and apoptosis induction in myeloid leukemic cells. Carcinogenesis 31, 419–426 (2010).
28. Munemasa, Y. et al. Promoter region-specific histone incorporation by the novel histone chaperone ANP32B and DNA-binding factor KLF5. Mol. Cell. Biol. 28, 1171–1181 (2008).
29. EuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project & Epi4K Consortium. De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am. J. Hum. Genet. 95, 360–370 (2014).
30. Montanucci, L. et al. Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals. Nat. Commun. 14, 4392 (2023).
31. Petrovski, S. et al. Germline de novo mutations in GNB1 cause severe neurodevelopmental disability, hypotonia, and seizures. Am. J. Hum. Genet. 98, 1001–1010 (2016).
32. De, T. et al. Signatures of TSPAN8 variants associated with human metabolic regulation and diseases. iScience 24, 102893 (2021).
33. De, T., Coin, L., Herberg, J., Johnson, M. & Jarvelin, M.-R. Plasma metabolomic signatures for copy number variants and COVID-19 risk loci in Northern Finland populations. (2024).
34. Tsokas, P. et al. Compensation for PKMζ in long-term potentiation and spatial long-term memory in mutant mice. eLife 5 (2016).
35. Sacktor, T. PKMzeta, LTP maintenance, and the dynamic molecular biology of memory storage. Prog. Brain Res. 169, 27–40 (2008).
36. Azuma, M., Toyama, R., Laver, E. & Dawid, I. B. Perturbation of rRNA synthesis in the bap28 mutation leads to apoptosis mediated by p53 in the zebrafish central nervous system. J. Biol. Chem. 281, 13309–13316 (2006).
37. Diaz, L. R. et al. Ribogenesis boosts controlled by HEATR1-MYC interplay promote transition into brain tumour growth. EMBO Rep. 25, 168–197 (2024).
38. Altmann, H. M. et al. Homozygous/compound heterozygous triadin mutations associated with autosomal-recessive long-QT syndrome and pediatric sudden cardiac arrest: elucidation of the triadin knockout syndrome. Circulation 131, 2051–2060 (2015).
39. Rossi, D. et al. A novel homozygous mutation in the TRDN gene causes a severe form of pediatric malignant ventricular arrhythmia. Heart Rhythm 17, 296–304 (2020).
40. Rooryck, C. et al. New family with catecholaminergic polymorphic ventricular tachycardia linked to the triadin gene: sudden death linked to the triadin gene. J. Cardiovasc. Electrophysiol. 26, 1146–1150 (2015).
41. Roux-Buisson, N. et al. Absence of triadin, a protein of the calcium release complex, is responsible for cardiac arrhythmia with sudden death in human. Hum. Mol. Genet. 21, 2759–2767 (2012).
42. Tong, D. et al. The critical role of ASD-related gene CNTNAP3 in regulating synaptic. scholar.archive.org.
43. Agarwala, S. & Ramachandra, N. B. Role of CNTNAP2 in autism manifestation outlines the regulation of signaling between neurons at the synapse. Egypt. J. Med. Hum. Genet. 22 (2021).
44. Tong, D.-L. et al. The critical role of ASD-related gene CNTNAP3 in regulating synaptic development and social behavior in mice. bioRxiv (2018) doi:10.1101/260083.
45. Møller, R. S. et al. Mutations in GABRB3: from febrile seizures to epileptic encephalopathies. Neurology 88, 483–492 (2017).
46. Chen, C.-H. et al. Genetic analysis of GABRB3 as a candidate gene of autism spectrum disorders. Mol. Autism 5, 36 (2014).
47. Tanaka, M., DeLorey, T. M., Delgado-Escueta, A. & Olsen, R. W. GABRB3, epilepsy, and neurodevelopment. (2012).
48. Kim, H., Kang, K. & Kim, J. AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic Acids Res. (2009).
49. Kim, H., Kang, K., Ekram, M. B., Roh, T.-Y. & Kim, J. Aebp2 as an epigenetic regulator for neural crest cells. PLoS One 6, e25174 (2011).
50. Chen, S., Jiao, L., Liu, X., Yang, X. & Liu, X. A dimeric structural scaffold for PRC2-PCL targeting to CpG island chromatin. Mol. Cell 77, 1265–1278.e7 (2020).
51. Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
52. Mehta, D. et al. Comprehensive survey of CNVs influencing gene expression in the human brain and its implications for pathophysiology. Neurosci. Res. 79, 22–33 (2014).
53. Kim, H. & Park, H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 23, 1495–1502 (2007).

Additional Declarations

No competing interests reported.
Supplementary Files

Supplementarymaterials.pdf
Supplementarytable1Analysisoverview.xlsx
Supplementarytable2GNB1Clinvar.xlsx
Supplementarytable3SANADOneP36.xlsx
Supplementarytable4WWOX.xlsx
Supplementarytable5NMF1p36rank1020.xlsx
Supplementarytable6SCN1B.xlsx
Supplementarytable7transcriptionalsense.xlsx

Editorial decision: Revision requested 24 Jun, 2025
Reviews received at journal 15 Jun, 2025
Reviewers agreed at journal 05 Jun, 2025
Reviews received at journal 12 Apr, 2025
Reviewers agreed at journal 08 Apr, 2025
Reviewers agreed at journal 29 Mar, 2025
Reviewers invited by journal 27 Mar, 2025
Editor assigned by journal 24 Mar, 2025
Editor invited by journal 24 Mar, 2025
Submission checks completed at journal 20 Mar, 2025
First submitted to journal 28 Feb, 2025
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6130694","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":442166718,"identity":"31962c01-5894-485c-899a-8bf087d875bb","order_by":0,"name":"Tisham De","email":"","orcid":"","institution":"Imperial College London","correspondingAuthor":true,"prefix":"","firstName":"Tisham","middleName":"","lastName":"De","suffix":""},{"id":442166719,"identity":"079ad902-adcc-4dda-8325-9541a8b49f56","order_by":1,"name":"Lachlan Coin","email":"","orcid":"","institution":"The University of Melbourne","correspondingAuthor":false,"prefix":"","firstName":"Lachlan","middleName":"","lastName":"Coin","suffix":""},{"id":442166720,"identity":"46508136-be49-460a-a126-f5dcf0917544","order_by":2,"name":"Michael R Johnson","email":"","orcid":"","institution":"Imperial College London","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"R","lastName":"Johnson","suffix":""}],"badges":[],"createdAt":"2025-02-28
17:53:21","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6130694/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6130694/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-28338-2","type":"published","date":"2025-12-04T15:57:47+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":80637986,"identity":"f0070ebe-0cd4-4104-b717-9bf91d336aac","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":146067,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eManhattan plot. \u003c/strong\u003eCNVs associated with seizure phenotypes in the SANAD cohort. Genes with p-value \u0026lt; 1e-25 are highlighted and labelled.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/76d5d6c1b7b1cbed0b1e0c28.png"},{"id":80637985,"identity":"95ee6c02-21a0-416a-a0d0-7cb4f27e7e36","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1485620,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ea) GNB1 gene cluster. \u003c/strong\u003eThe ~1 megabase gene cluster in the chromosome 1p36 region containing CNVs significantly associated with seizure-related phenotypes in SANAD. Notable genes included TTLL10, GNB1 and PRKCZ. \u003cstrong\u003eb) Protein structures of GNB1. 
\u003c/strong\u003eThree-dimensional protein structures from the Protein Data Bank (PDB) showing GNB1 in complex with \u003cstrong\u003ei) \u003c/strong\u003ethe growth hormone-releasing hormone receptor GHRHR and \u003cstrong\u003eii)\u003c/strong\u003e the serotonin receptor HTR1B.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/c7cbb4e5ceb88939e81ab172.png"},{"id":80638889,"identity":"364ffafa-e7f8-4f18-a9c1-7a87e7c0e25c","added_by":"auto","created_at":"2025-04-15 12:55:57","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1339304,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCNV QTL results in the UKBEC study. a) \u003c/strong\u003eThe ~1 megabase window in which the top (rank 1) hits from the UKBEC analysis of ten brain regions were clustered (\u003cstrong\u003eSupplementary table 1d,e\u003c/strong\u003e). Notable genes included TDRD7, ANP32B and NANS (GABBR2, a known epilepsy gene, also lies within this cluster but was not significant in our results). \u003cstrong\u003eb) \u003c/strong\u003eProtein complex from the PDB showing the three-dimensional structure of GNB1 bound to GABBR2.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/e3a13926cc2ad4cfd19b47d8.png"},{"id":80637988,"identity":"1b6397f0-fa8f-4c9d-ba1b-1b1c7c38c7b5","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":116127,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe genetic link between cortisol, serotonin and dopamine. \u003c/strong\u003eThe top panel shows the cis CNV QTL of the corticotropin-releasing hormone receptor gene CRHR2 with INMT and its link to the tryptamine metabolite. 
The bottom panel shows the reciprocal cis CNV QTL (or dosage) of DRD2 and HTR3B and its link to the serotonin synthesis pathway.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/f9e51b30607b81773451814b.png"},{"id":80637992,"identity":"eee41c8b-a61b-43b9-9082-98d263c39ae8","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":676412,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eNMF analysis in the UKBEC study. a) \u003c/strong\u003eSummary of the brain regions analysed in the UKBEC and NABEC studies. b) Overview of the NMF deconvolution analysis in the UKBEC gene expression dataset.\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/c06ab164a1eba2cecfc4160e.png"},{"id":80637991,"identity":"c70287c3-2e7c-47fb-8541-6af9cd368380","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":461276,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eNMF gene clusters. \u003c/strong\u003eOverview of the two gene clusters uncovered through the NMF analysis in the UKBEC gene expression dataset. 
\u003cem\u003eAverage all\u003c/em\u003e is derived by averaging gene expression across the ten brain regions for each probe ID, whereas \u003cem\u003eFull set\u003c/em\u003e refers to the combined gene expression matrix from all brain regions.\u003c/p\u003e","description":"","filename":"Figure6.png","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/5e195d884f91aeb490f013db.png"},{"id":97723850,"identity":"98abd059-f33d-4f56-bdc1-eff13c5499aa","added_by":"auto","created_at":"2025-12-08 16:08:46","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":6937357,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/369a1813-77a0-4890-9e76-2e02bc363e5e.pdf"},{"id":80638888,"identity":"7de6a843-36c1-477d-ae63-81a964075ba9","added_by":"auto","created_at":"2025-04-15 12:55:57","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":1536168,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterials.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/e9b03c3a32ac99324d62106c.pdf"},{"id":80639552,"identity":"6c85baae-39b0-46f6-8732-8816f8a4b1f8","added_by":"auto","created_at":"2025-04-15 13:03:57","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":15554,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable1Analysisoverview.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/e96de16bd9c215f6917d9543.xlsx"},{"id":80640842,"identity":"f2cbdc30-dfc6-472a-aee0-02e4bfa5caa2","added_by":"auto","created_at":"2025-04-15
13:11:57","extension":"xlsx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":102928,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable2GNB1Clinvar.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/4e050f7f13e29da9d9546300.xlsx"},{"id":80641133,"identity":"e5b11d74-5ba9-47c5-8ffa-6f251d4a5294","added_by":"auto","created_at":"2025-04-15 13:19:57","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1892779,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable3SANADOneP36.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/e812b45ab695a3a43033d746.xlsx"},{"id":80639554,"identity":"a1015fab-ac63-4ce2-beee-bd439d8273da","added_by":"auto","created_at":"2025-04-15 13:03:57","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":1591510,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable4WWOX.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/7652c5937351c1d5fdcfa3db.xlsx"},{"id":80638897,"identity":"2cbf4d32-587b-4e02-86ba-6dac7a4af7c4","added_by":"auto","created_at":"2025-04-15 12:55:57","extension":"xlsx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":117845,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable5NMF1p36rank1020.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/818378dc8879cff7786d580d.xlsx"},{"id":80638000,"identity":"b44d3586-2086-4b27-b991-f81c1089b49c","added_by":"auto","created_at":"2025-04-15 
12:47:57","extension":"xlsx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":21443,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable6SCN1B.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/f91955d9c99099a31b7d991c.xlsx"},{"id":80637993,"identity":"215ddd61-fd20-4403-b759-f488d6d9b9c1","added_by":"auto","created_at":"2025-04-15 12:47:57","extension":"xlsx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":10153,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarytable7transcriptionalsense.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6130694/v1/817b4d4622bd50485b55f7ae.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Dosage effect of Copy Number Variation in Epilepsy and ten regions of the human brain ","fulltext":[{"header":"Introduction","content":"\u003cp\u003eEpilepsy is a common neurological disease affecting around 1% of the population worldwide. Anti-epileptic drugs generally work well for around 60% of epilepsy patients, who achieve seizure control with current medication within a year or two; however, for roughly 20-30% of patients (up to about one third), even the latest anti-epileptic drugs (25 licensed drugs worldwide\u003csup\u003e1,2\u003c/sup\u003e) do not work well, and these patients continue to have regular seizures\u003csup\u003e3,4\u003c/sup\u003e. Furthermore, resistance to one anti-epileptic drug has been observed to correlate well with resistance to all other drugs. Some current anti-epileptic medications are not prescribed for vulnerable groups such as pregnant women. One-off seizures are quite common in young children and adults but are usually considered benign. For some refractory groups of epilepsy patients who do not respond to standard medication, brain surgery is required to control seizures\u003csup\u003e5,6\u003c/sup\u003e. 
In these cases, an exploratory surgery is first required to identify the regions of the brain where seizures originate (which can be almost anywhere in the brain), and a second surgery is then needed to remove these regions. In some rare forms of epilepsy, such as Lennox-Gastaut syndrome, which often originates in the occipital lobe of the brain, a child may have numerous seizures a day. As part of the CADET trial (Children\u0026rsquo;s Adaptive Deep brain stimulation for Epilepsy Trial), the United Kingdom\u0026rsquo;s first patient, a child with Lennox-Gastaut syndrome with mutations in the SCN1B gene\u003csup\u003e\u003csup\u003e[1]\u003c/sup\u003e\u003c/sup\u003e, was successfully implanted with a brain device that controls seizures through electrical pulses. The chromodomain-helicase-DNA-binding protein 2 (CHD2) gene is another candidate gene for this syndrome\u003csup\u003e7\u003c/sup\u003e. \u003c/p\u003e\n\n\u003cp\u003eHowever, little is known about the biology of seizures, the brain regions where they originate or the causal mechanisms behind them, and the genetic basis of seizures therefore remains an open question in neurology. Here we present a comprehensive copy number variation (CNV) analysis of epilepsy phenotypes, including seizure counts, seizure frequency and 12-month remission on anti-epileptic drugs, in two cohorts, denoted SANAD and the Australian cohort\u003csup\u003e4\u003c/sup\u003e. In addition, we analysed and report CNV-gene expression signatures (CNV eQTLs) in ten regions of the normal human brain from the UKBEC\u003csup\u003e8\u003c/sup\u003e and NABEC\u003csup\u003e9\u0026ndash;11\u003c/sup\u003e studies. We leveraged all analyses to identify new gene clusters and loci relevant to the neurobiology of seizures, and we report the phenomenon of reciprocal CNV dosage in genes related to neurotransmitters and GPCR-mediated signal transduction in different regions of the human brain. 
\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e\u003csup\u003e[1]\u003c/sup\u003e\u003c/sup\u003e https://www.gosh.nhs.uk/news/first-uk-trial-of-deep-brain-stimulation-for-children-with-epilepsy-begins-at-gosh/ \u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eCNV analysis in Epilepsy cohorts\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn our main discovery cohort, SANAD, the top hit for CNV genotypes was GNB1, both for the total number of seizures (chr1:1745726, p-value=2.89e-168, MAF=1.1%) and for seizure frequency (chr1:1745726, p-value=2.82e-95) (\u003cstrong\u003eFigure 1,\u003c/strong\u003e \u003cstrong\u003eSupplementary table 1a\u003c/strong\u003e). GNB1 was also the top hit in the joint model (chr1:1745726, p-value=6.3e-202) and in the joint model with variable selection using CNV genotypes (chr1:1745726, p-value=2.27e-207) (\u003cstrong\u003eSupplementary table 1a\u003c/strong\u003e). In the ClinVar annotations for GNB1, pathogenic CNVs were more numerous than pathogenic SNVs, which adds weight to our CNV results (\u003cstrong\u003eSupplementary table 2\u003c/strong\u003e). In the log R ratio (LRR)-based models, the strongest signal was found within the \u003cem\u003egrowth hormone receptor gene \u003c/em\u003e(GHR) (chr5:42569642, p-value \u0026lt; 6e-128) (\u003cstrong\u003eSupplementary table 1a\u003c/strong\u003e). In SANAD, the top gene for drug response was TRAPPC9 (chr8:140765991, p-value=1.7e-05, LRR univariate model, \u003cstrong\u003eSupplementary table 1a\u003c/strong\u003e). 
TRAPPC9 is used in the clinical diagnosis of a rare neuroendocrine disorder, \u003cem\u003eintellectual disability-obesity-brain malformations-facial dysmorphism syndrome\u003c/em\u003e\u003csup\u003e12\u003c/sup\u003e (MalaCards.org).\u003c/p\u003e\n\u003cp\u003eWe further noticed that in SANAD, GNB1 and several other seizure-associated genes, such as PRKCZ (chr1:2082566, p-value=1.8e-41, MAF=1.08%) and CDK11A (chr1:1645366, p-value=6.25e-8, MAF=2.4%), were clustered within a one-megabase window in the chromosome 1p36 region (\u003cstrong\u003eFigure 2a\u003c/strong\u003e). Of note, GNB1 has been shown to bind the growth hormone-releasing hormone receptor (GHRHR) and 5-hydroxytryptamine receptor 1B (HTR1B, a serotonin receptor) (\u003cstrong\u003eFigure 2b\u003c/strong\u003e). The 1p36 region is often examined by karyotyping to diagnose neurodevelopmental disorders related to \u003cem\u003e1p36 deletion syndrome\u003c/em\u003e\u003csup\u003e13,14\u003c/sup\u003e. This is consistent with our results in SANAD, where for seizure-related phenotypes we found numerous contiguous probes with MAF \u0026gt;1% and highly significant p-values (e.g. NBPF1, \u003cstrong\u003eSupplementary table 3\u003c/strong\u003e). \u003c/p\u003e\n\u003cp\u003eIn SANAD, outside the chromosome 1p36 region, we found several genes of interest, including six loci passing the threshold of p \u0026lt; 1e-25 with MAF \u0026gt;=1% (\u003cstrong\u003eFigure 1\u003c/strong\u003e). Notable genes among these included HEATR1, CNTNAP3 and GABRB3 (see Discussion for further details). In the independent analysis of the Australian cohort, the top signal for 12-month remission on anti-epileptic drugs was located within the PPFIA2 gene in the CNV genotype univariate analysis (chr12:82081470, p-value=6.21e-06, MAF=4.4%). 
PPFIA2 was also the top hit in the CNV genotype multivariate joint model with variable selection (chr12:82081470, p-value=2.21e-06) (\u003cstrong\u003eSupplementary table 1b\u003c/strong\u003e). PPFIA2 belongs to the LAR protein-tyrosine phosphatase-interacting protein (liprin) family\u003csup\u003e15\u003c/sup\u003e and is annotated to the neurotransmitter release cycle and transmission across chemical synapses pathways (PathCards, https://pathcards.genecards.org/). Previous work\u003csup\u003e16\u003c/sup\u003e has shown that, under Ca\u003csup\u003e2+\u003c/sup\u003e-dependent regulation, liprin-α proteins capture KIF1A-driven dense core vesicles (DCVs) in dendritic spines. \u003c/p\u003e\n\u003cp\u003eIn the joint meta-analysis of the SANAD and Australian cohorts, the top hit was a locus within the DAGLA gene (chr11:61462424, p-value=0.000301, MAF=1%). DAGLA, also known as neural stem cell-derived dendrite regulator, is involved in 2-arachidonoylglycerol (2-AG) signaling in the central nervous system (CNS)\u003csup\u003e17\u0026ndash;19\u003c/sup\u003e. It contributes to axonal growth during early neurodevelopment and to neuroinflammatory responses in the brain (\u003cstrong\u003eSupplementary table 1c\u003c/strong\u003e). Other notable meta-analysis results included ASIC2 (acid-sensing ion channel subunit 2; chr17:31454867, p-value=8.54e-19), the top hit in the LRR multivariate models, while GNB1 replicated in the LRR univariate model (chr1:1793111, meta p-value=9.49e-16). \u003c/p\u003e\n\u003cp\u003eAcross all analyses and cohorts, the most distinct common CNV-phenotype signal, with the largest number of contiguous probes with significant p-values, lay within the WWOX gene (MAF=47%; 48 contiguous probes for CNV genotypes and 66 in the LRR analysis) (\u003cstrong\u003eSupplementary table 4\u003c/strong\u003e). 
This CNV was associated exclusively with the drug-response phenotype in SANAD. Although WWOX is a known epilepsy candidate gene and could have genuine seizure-related effects in the brain\u003csup\u003e20\u0026ndash;22\u003c/sup\u003e, its association with drug response was less convincing in the Australian cohort and in the meta-analysis. Because WWOX harbours one of the most fragile sites in the human genome, these results are difficult to interpret biologically and warrant experimental validation. \u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eCNV dosage effects in the UKBEC and NABEC cohorts\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBriefly, in the UKBEC study we generated two sets of CNV calls from two genotyping platforms, the Illumina Omni 1M chip and a custom Immunochip. We analysed the two datasets independently in cnvHap\u003csup\u003e23\u003c/sup\u003e using Illumina platform-specific parameters (see Methods). On the Omni 1M chip we detected 9,242 homozygous deletions (type 0), 129,929 heterozygous deletions (type 1), 7,840 heterozygous duplications (type 3) and 546 homozygous duplications (type 4). Genome-wide CNV breakpoint information for all cohorts is available in the supplementary data. For every probe on both platforms, we then derived expected CNV genotypes from the cnvHap posterior probabilities and used them in univariate and multivariate association analyses with gene expression from the UKBEC brain regions, using the MultiPhen\u003csup\u003e24\u003c/sup\u003e method (see Methods). In total, the UKBEC analysis produced 96 sets of transcriptome-wide CNV QTL results across CNV genotypes, LRR, univariate and multivariate methods for eight brain regions. The top-ranked (rank 1) result of each analysis is reported in \u003cstrong\u003eSupplementary table 1d, 1e\u003c/strong\u003e. 
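\u003c/p\u003e\n\u003cp\u003eA minimal sketch of how an expected CNV genotype (dosage) can be obtained from posterior probabilities follows. For illustration only, it assumes a per-sample posterior over five copy-number states: 0 to 4 copies, matching the deletion and duplication types listed above, with 2 copies as normal. cnvHap's actual state space and output format are richer, and the function names are hypothetical.\u003c/p\u003e

```python
import math

# Copy number of each posterior entry (illustrative ordering):
# 0 = homozygous deletion, 1 = heterozygous deletion, 2 = normal,
# 3 = heterozygous duplication, 4 = homozygous duplication.
COPY_NUMBERS = (0, 1, 2, 3, 4)


def expected_dosage(posterior):
    '''Posterior-weighted mean copy number for one sample at one probe.'''
    if len(posterior) != len(COPY_NUMBERS):
        raise ValueError('need one probability per copy-number state')
    if not math.isclose(math.fsum(posterior), 1.0, abs_tol=1e-6):
        raise ValueError('posterior probabilities must sum to 1')
    return math.fsum(p * cn for p, cn in zip(posterior, COPY_NUMBERS))


def dosage_vector(posteriors):
    '''Expected dosage for every sample at one probe, used as the regressor.'''
    return [expected_dosage(p) for p in posteriors]


# Usage: three hypothetical samples at one probe.
samples = [
    [0.0, 0.0, 1.0, 0.0, 0.0],  # confidently two copies
    [0.0, 0.9, 0.1, 0.0, 0.0],  # mostly a heterozygous deletion
    [0.0, 0.0, 0.5, 0.5, 0.0],  # uncertain between two and three copies
]
dosages = dosage_vector(samples)  # approximately [2.0, 1.1, 2.5]
```

\u003cp\u003eUsing the expected value rather than the most likely state keeps genotype uncertainty in the regressor. Weakly supported calls therefore contribute intermediate dosages instead of being forced to a hard genotype.\u003c/p\u003e\n\u003cp\u003e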
The most notable feature of these results is a cluster of significant CNV QTLs within a 1 Mb window on chromosome 9 that consistently harboured the top hit across brain regions (\u003cstrong\u003eFigure 3a\u003c/strong\u003e, \u003cstrong\u003eSupplementary table 1d, 1e\u003c/strong\u003e). Two genes of interest in this cluster were TDRD7 for CNV genotypes (p-value \u0026lt; 1e-269, MAF=2.6%) and NANS for LRR-based models (p-value \u0026lt; 1e-269). NANS (N-acetylneuraminic acid synthase) synthesises sialic acid in humans, and sialic acid is most concentrated in the brain. Biallelic recessive mutations in NANS cause intellectual disability with short stature\u003csup\u003e25\u003c/sup\u003e, and sialic acid plays an important role in neural transmission and in the ganglioside structures involved in synaptogenesis\u003csup\u003e26\u003c/sup\u003e. In the UKBEC Immunochip dataset, the top hit, ANP32B (p-value \u0026lt; 1e-269, MAF=1.8%), lay within the same chromosome 9 gene cluster as TDRD7 and NANS. Like them, it was consistently the top hit in all brain regions (\u003cstrong\u003eFigure 3a, Supplementary table 1d, 1e\u003c/strong\u003e). ANP32B regulates gene expression\u003csup\u003e27\u003c/sup\u003e and represses transcription of the KLF5 gene\u003csup\u003e28\u003c/sup\u003e. NANS and ANP32B also lie near the GABBR2 gene. GABBR2 was not associated with any of our epilepsy phenotypes and harboured no CNV with MAF \u0026gt;1% in any cohort. Nevertheless, it is a known cause of developmental and epileptic encephalopathy 59 (DEE59)\u003csup\u003e29\u003c/sup\u003e (MalaCards) and has been shown to bind GNB1 (\u003cstrong\u003eFigure 3b\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003eOur analysis of the UKBEC and NABEC datasets identified many additional significant loci and regions of interest. 
One example we highlight here is a significant cis CNV QTL linking serotonin and dopamine receptor genes on chromosome 11. Probes in the HTR3B gene were significantly associated with DRD2 expression. For example, CNV genotypes at chr11:113802601 (MAF=1.1%) were associated with DRD2 probe ID 3391654 in white matter (p-value=5.88e-81). Reciprocally, probes in the DRD2 gene were significantly associated with HTR3B expression: LRR in DRD2 was associated with HTR3B probe ID 3349661 in the cerebellum (p-value \u0026lt; 1e-308, LRR joint model with variable selection) (\u003cstrong\u003eFigure 4\u003c/strong\u003e). Looking further into tryptophan and serotonin metabolism, we also found a significant cis CNV QTL in which CNVs in the corticotropin-releasing hormone receptor gene CRHR2 were associated with expression of the nearby INMT gene (p-value \u0026lt; 1e-20 in the putamen and frontal cortex, LRR joint model with variable selection). INMT N-methylates indoles such as tryptamine, which, like serotonin, is derived from tryptophan (\u003cstrong\u003eFigure 4\u003c/strong\u003e). \u003c/p\u003e\n\u003cp\u003eAll analyses reported so far were based on an overlapping moving window across the genome, i.e. CNV calling and association were performed on genomic segments spanning both genic and intergenic regions. For the UKBEC Omni chip dataset, we additionally performed CNV analysis on a gene-by-gene basis. The aim was to find CNVs lying within a gene and to uncover their local effects on exon-level expression, which could help identify the relative importance of exons in a given tissue. To this end, we ran the cnvHap HMM separately for every human gene, with a 5 kb window around the gene boundary. 
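\u003c/p\u003e\n\u003cp\u003eThe probe-selection step of this gene-by-gene set-up can be sketched as follows. The sketch makes three assumptions: the 5 kb window is applied on each side of the gene, probe positions for one chromosome are held in a sorted list, and gene coordinates are inclusive base-pair positions. The function names are hypothetical, and the per-gene cnvHap run itself is not reproduced.\u003c/p\u003e

```python
from bisect import bisect_left, bisect_right

FLANK_BP = 5_000  # assumed 5 kb flank on each side of the gene


def probes_in_gene_window(probe_positions, gene_start, gene_end, flank=FLANK_BP):
    '''Probes lying in [gene_start - flank, gene_end + flank].

    probe_positions must be sorted ascending and come from one chromosome.
    '''
    first = bisect_left(probe_positions, max(0, gene_start - flank))
    last = bisect_right(probe_positions, gene_end + flank)
    return probe_positions[first:last]


def gene_windows(probe_positions, genes, flank=FLANK_BP):
    '''Map each (name, start, end) gene tuple to the probes in its window.'''
    return {
        name: probes_in_gene_window(probe_positions, start, end, flank)
        for name, start, end in genes
    }


# Usage with hypothetical coordinates:
positions = [100, 4_000, 10_000, 20_000, 30_000]
windows = gene_windows(positions, [('GENE_A', 9_000, 12_000)])
# windows == {'GENE_A': [4000, 10000]}
```

\u003cp\u003eBinary search keeps each lookup logarithmic in the number of probes. This matters when the window is extracted for every human gene.\u003c/p\u003e\n\u003cp\u003e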
We then derived expected CNV genotypes and associated them with eight UKBEC gene expression matrices using our univariate and multivariate association models. Example results for genes with common CNVs are as follows. In the putamen, the CNV-dosage analysis using the MultiPhen joint model with variable selection identified the following most significant exons:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eexon 4 for HEATR1 (probe ID 2462530)\u003c/li\u003e\n\u003cli\u003eexon 10 for PRKCZ (probe ID 2316299)\u003c/li\u003e\n\u003cli\u003eexon 5, close to the last exon, for WWOX (probe ID 3700865)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e(\u003cstrong\u003eSupplementary figure 1\u003c/strong\u003e). A joint consensus analysis of the important exons identified by the CNV genotype, LRR, univariate and multivariate methods remains to be done. However, the complete set of CNV-dosage results for all brain regions and methods is provided in the supplementary data. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNMF analysis of the UKBEC Omni dataset \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eApplying non-negative matrix factorization (NMF) allowed us to deconvolve the UKBEC gene expression data from 10 brain regions into meta-exons or meta-genes (also referred to as hidden genes or patients in the literature). These factors can be interpreted biologically as exons and genes that are consistently over- or under-expressed across regions of the human brain (\u003cstrong\u003eFigure 5 a, b\u003c/strong\u003e). For the 1p36 region, we ran NMF on two expression matrices: one derived by averaging expression values across the 10 regions (aveALL) and one combining all brain regions (Full set). Analysing the relative frequency of individual exons in these results (NMF consensus clustering) revealed two gene clusters in the chromosome 1p36 region (\u003cstrong\u003eFigure 6, Supplementary table 5\u003c/strong\u003e). 
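\u003c/p\u003e\n\u003cp\u003eThe NMF deconvolution and consensus step described above can be illustrated with the simplified, dependency-free sketch below. It makes two assumptions: the Lee and Seung multiplicative-update rules for the squared Frobenius error, and a simple consensus in which, in each run, every row (exon or probe) is assigned to the program with its largest loading. The function names, the toy matrix and the run counts are hypothetical. They are not the settings used in this study, which ran NMF 50 times per rank (see Methods).\u003c/p\u003e

```python
import random


def matmul(a, b):
    '''Product of two matrices stored as lists of rows.'''
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=500, seed=0, eps=1e-12):
    '''Factor non-negative V (rows = exons/probes, columns = samples) as W H
    with Lee-Seung multiplicative updates for the squared Frobenius error.'''
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H *= (W'V) / (W'WH)
        h = [[x * a / (b + eps) for x, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W *= (VH') / (WHH')
        w = [[x * a / (b + eps) for x, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def consensus_matrix(v, rank, runs=10, iters=500):
    '''Fraction of NMF runs in which each pair of rows shares a program.'''
    n = len(v)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):
        w, _ = nmf(v, rank, iters=iters, seed=seed)
        # Assign each row to the program (column of W) with its largest loading.
        labels = [max(range(rank), key=row.__getitem__) for row in w]
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Usage: rows 0-1 and rows 2-3 behave as two separate expression programs.
v = [[5, 5, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 6, 6]]
consensus = consensus_matrix(v, rank=2)
```

\u003cp\u003eProgram numbering is arbitrary and can change from run to run. Counting co-assignments of pairs of rows, rather than program labels, sidesteps this problem.\u003c/p\u003e\n\u003cp\u003e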
The first cluster was centred on the CDC42 gene and included nearby genes (within ~1 Mb) such as RAP1GAP, USP48, HSPG2 and LUZP1. The second cluster was centred on CHD5 and included the nearby genes KCNAB2, CAMTA1 and ACOT7. Applying NMF to every gene individually (with a 5 kb window around the gene boundary) further showed which exons are consistently over- or under-expressed across brain regions. In summary, we first applied NMF to the chromosome 1p36 region in the UKBEC Omni dataset, with ranks 10 and 20, for the two expression matrices derived from all brain regions. We then applied it separately to all known human genes (ranks 2 to 6, for the two derived expression sets and for each brain region separately; see Methods). \u003c/p\u003e\n"},{"header":"Discussion","content":"\u003cp\u003eOur results complement recent large-scale Epilepsy GWAS\u003csup\u003e1,30\u003c/sup\u003e by filling the gap for small and intermediate-sized CNVs and their effects on seizure and drug-response phenotypes. We found that population-level analysis of CNVs gives more power both to detect previously reported genes, such as GNB1 in neurodevelopmental disorders\u003csup\u003e31\u003c/sup\u003e, and to uncover new disease-associated loci. Existing protein structures of GNB1 in complex with the HTR1B and growth hormone-releasing hormone (GHRHR) receptors further support our CNV results in the SANAD cohort. These observations also highlight the power of population-aware methods for CNV genotyping\u003csup\u003e32\u003c/sup\u003e. This is especially significant because, unlike cnvHap, current methods for CNV detection from bead array platforms usually apply a hidden Markov model (HMM) to one sample at a time. They therefore cannot use population-level information to model the dosage landscape of the human genome\u003csup\u003e33\u003c/sup\u003e. 
These aspects of our analyses, together with the univariate and multivariate models, distinguish our results from the current literature and led to new findings. Highlights among the new genes we report for Epilepsy phenotypes such as seizure counts and seizure frequency include PRKCZ, HEATR1, TRDN, CNTNAP3, AEBP2 and GABRB3.\u003c/p\u003e\n\u003cp\u003ePRKCZ encodes an atypical, calcium (Ca\u003csup\u003e2+\u003c/sup\u003e)-independent protein kinase C implicated in memory and long-term potentiation\u003csup\u003e34,35\u003c/sup\u003e. HEATR1, also known as BAP28, is required for pre-ribosomal RNA transcription by RNA polymerase I, and its disruption causes brain abnormalities in zebrafish\u003csup\u003e36\u003c/sup\u003e and Drosophila\u003csup\u003e37\u003c/sup\u003e. TRDN regulates Ca\u003csup\u003e2+\u003c/sup\u003e release during muscle contraction and is a known causal gene for \u003cem\u003ecardiac arrhythmia syndrome with or without skeletal muscle weakness\u003c/em\u003e (CARDAR)\u003csup\u003e38\u0026ndash;41\u003c/sup\u003e (MalaCards). The relevance of CNTNAP3\u003csup\u003e42\u0026ndash;44\u003c/sup\u003e and GABRB3\u003csup\u003e45\u0026ndash;47\u003c/sup\u003e to neurological diseases is well documented in the literature (\u003cstrong\u003eSupplementary figure 2\u003c/strong\u003e). Lastly, AEBP2 encodes a subunit of Polycomb repressive complex 2 (PRC2)\u003csup\u003e48,49\u003c/sup\u003e. PRC2 trimethylates histone H3 at lysine 27 (H3K27me3)\u003csup\u003e50\u003c/sup\u003e, leading to long-term epigenetic silencing, also referred to as cellular memory. Based on these observations, we suggest that these genes might contribute to secondary symptoms in Epilepsy or other neurodevelopmental disorders (e.g. memory loss or arrhythmia). These results also suggest that transcriptional regulation and heterogeneity are likely to be important for critical brain functions and homeostasis. 
\u003c/p\u003e\n\u003cp\u003eOur CNV results for the UKBEC and NABEC cohorts complement the SNP QTL results\u003csup\u003e8,51,52\u003c/sup\u003e in the current literature. To the best of our knowledge, ours is one of few studies to comprehensively characterise the dosage effect of small CNVs in different regions of the brain. Our replication model based on LRR signals also consistently uncovered genes relevant to neurology and brain function. For instance, we did not detect any CNVs in any of our cohorts in SCN1B, a known candidate gene for Lennox-Gastaut syndrome, possibly because of low probe density. However, the LRR-based model in the UKBEC dataset uncovered suggestive signals in particular brain regions such as the putamen, white matter and occipital cortex (\u003cstrong\u003eSupplementary table 6\u003c/strong\u003e). \u003c/p\u003e\n\u003cp\u003eThe top CNV QTL hits in the UKBEC Omni chip and Immunochip data converge on the chromosome 9 gene cluster, which also harbours the GABBR2 gene (G protein-coupled receptor family 3, GABA-B receptor subfamily). This convergence strengthens the clustering phenomenon reported in the original UKBEC SNP eQTL study. Experiments aimed at understanding the causal mechanisms behind such clustering could provide new information on human brain function. Based on our observations, we suggest that \u003cem\u003ein-tandem\u003c/em\u003e transcriptional regulation of co-located genes in the genome could be an important mechanism by which key molecular functions are carried out in the brain. In this regard, our NMF results from the UKBEC expression dataset hint that the transcriptional direction and relative position of exons (i.e. exons transcribed early by the RNA polymerase machinery) might be associated with higher or lower overall RNA concentration in the cell (\u003cstrong\u003eSupplementary table 7\u003c/strong\u003e). This differential expression of exons, which could affect splicing, might be interpreted as the \u003cem\u003eRNA amplitude\u003c/em\u003e of a given exon in a tissue. The clustering of CNV-phenotype signals in SANAD (e.g. the GNB1 region on chromosome 1p36) and of CNV QTLs on chromosome 9 in UKBEC provides suggestive evidence for such a phenomenon. However, the underlying mechanism remains to be elucidated and experimentally validated. Lastly, the reciprocal CNV dosage effect we observed between HTR3B and DRD2, and its link to the CRHR2-INMT pathway, is potentially a new genetic finding. This molecular axis could help explain how the brain maintains homeostasis. It may do so by balancing neurotransmitter and hormone signalling through the genomic regulation of serotonin, dopamine and cortisol. Based on these findings and the results for GNB1, we propose that CNVs, through their effects on protein receptor complexes, have an important role in neurological diseases and in maintaining homeostasis in the brain. \u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003eCohorts\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEpilepsy cohorts\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAs in our previous study\u003csup\u003e4\u003c/sup\u003e, our main CNV analysis in Epilepsy used a prospective cohort design rather than a case-control design. Our discovery cohort consisted of 916 subjects from the Standard and New AED (SANAD) clinical trial. A secondary replication cohort consisted of 380 subjects recruited from the Royal Melbourne and Austin Hospitals in Australia. 
All clinical variables analysed in these cohorts, including the phenotypes shared between cohorts for meta-analysis, are briefly described in the supplementary material. The phenotypes analysed fall broadly into two categories: seizure-related phenotypes (e.g. seizure frequency and total number of seizures) and 12-month remission on AED medication (responders vs non-responders). Epilepsy and seizure classification were based on the latest ILAE guidance. The main types of Epilepsy in our study were focal, generalized and unclassified Epilepsy. UK and Australian samples were genotyped on Illumina 660 bead chips at the Sanger Centre at different points in time. Quality control based on heterozygosity, sample call rate and relatedness was then applied to the raw intensity data. Additional information on genotyping and quality control for these cohorts can be found in our earlier reports\u003csup\u003e4\u003c/sup\u003e. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe UKBEC and NABEC studies \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe UKBEC study consists of 134 individuals of European descent who were confirmed to be neuropathologically normal and had a median age of 59 years\u003csup\u003e8\u003c/sup\u003e. Of these individuals, 74.5% were male, and the most common cause of death was heart attack. Tissue collection and genotyping have been described in detail in earlier studies\u003csup\u003e8\u003c/sup\u003e. Briefly, whole-transcriptome data were available for 10 brain regions, chosen for their relevance to human disease and reported to exhibit high expression profiles. RNA was extracted from post-mortem brain tissue in randomised order and checked for quality. Processed RNA was then analysed on the Affymetrix Exon 1.0 ST array. All arrays were processed by robust multi-array average (RMA) normalisation and log2 transformed, in two different ways. 
Genomic DNA was extracted from the post-mortem brain tissue using Qiagen’s DNeasy kit. It was genotyped on the Illumina Omni-Quad bead chip and on a custom Immunochip designed to fine-map autoimmune disorders. Intensity data were processed in GenomeStudio v.1.8.x, from which log R ratio (LRR) and B allele frequency (BAF) were derived and exported for CNV analysis. \u003c/p\u003e\n\u003cp\u003eThe NABEC study\u003csup\u003e9,10\u003c/sup\u003e consists of approximately 360 individuals of European descent who were free of neuropathological disorders. RNA was extracted, quality checked and hybridised onto Human HT12v3 expression beadchips. Raw gene expression data were normalised using cubic spline normalisation and log transformed. Expression probes were then re-mapped onto human genome build 19 using ReMOAT and annotated with genes. Only probes with reliable annotation that were free of common polymorphisms were retained. Genomic DNA for the NABEC study was extracted and genotyped on Illumina Infinium HumanHap550 chips. As in the UKBEC study, LRR and BAF intensity values were processed in GenomeStudio and exported for subsequent CNV analysis. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCNV calling: moving-window and gene-by-gene approaches \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAs in our earlier studies\u003csup\u003e32,33\u003c/sup\u003e, we used the cnvHap algorithm\u003csup\u003e23\u003c/sup\u003e as the main CNV discovery method. cnvHap is a multi-platform, CNV-SNP haplotype-based CNV detection algorithm that uses log R ratio and B allele frequency jointly to discover and genotype CNVs simultaneously. cnvHap has been shown to be more sensitive in detecting smaller common CNVs while maintaining high genotyping accuracy. Because its model is trained in a population-aware mode, it was a pragmatic choice for CNV-phenotype association in large cohorts originally genotyped on bead array chips for SNP GWAS or SNP eQTL studies. 
All cohorts analysed in our study were genotyped on different versions of Illumina bead array chips: the Epilepsy cohorts (UK and Australia), UKBEC and NABEC. They were therefore processed with a common CNV pre-processing pipeline. Briefly, for each cohort, two intensity measurements, log R ratio (LRR) and B allele frequency (BAF), were exported from Illumina GenomeStudio software as final reports. \u003c/p\u003e\n\u003cp\u003eThe exported intensity measurements were then normalised for GC content by regressing GC content out of the LRR values. Genomic wave effects were removed by fitting a local loess function. In the main cnvHap analysis, joint CNV-SNP haplotype structure was incorporated to refine CNV predictions. Expected CNV genotypes, informed by allele frequencies, were then calculated for subsequent association analysis. We first tuned our CNV analysis pipeline to reproduce known common CNVs in our cohorts. For this we chose the intronic WWOX deletions reported by the gnomAD database in the region chr16:78371638-78385000 (GRCh37/hg19). This region contains a deletion and a multi-allelic CNV with allele frequencies of 34% and 54%, respectively (\u003cstrong\u003eSupplementary figure 3, 4\u003c/strong\u003e). Missense mutations in WWOX exons have been associated with the highly pathogenic WOREE syndrome (WWOX-related epileptic encephalopathy), which usually occurs in young children and has a very poor prognosis. In SANAD we rediscovered this intronic WWOX CNV as a common deletion with an allele frequency of 47%, spanning chr16:78373644-78384121. Manual inspection of LRR and BAF cluster plots showed distinct homozygous deletions spanning more than 40 contiguous probes, the highest number among all CNVs in all our cohorts. This run of contiguous probes was also reflected in a significant association with the epilepsy drug-response phenotype in the LRR-based analysis of SANAD (without cnvHap calls). 
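\u003c/p\u003e\n\u003cp\u003eA minimal sketch of the two LRR clean-up steps named above follows. It assumes that per-probe GC content and genomic positions are available for one chromosome. GC content is regressed out of LRR by ordinary least squares. The genomic wave is then estimated by a basic loess (a local-linear fit with tricube weights over the nearest span fraction of probes) against genomic position and subtracted. The span, the function names and the choice of position as the smoothing variable are illustrative assumptions, not the study's settings. The quadratic-time smoother is only meant for small examples.\u003c/p\u003e

```python
import math


def regress_out_gc(lrr, gc):
    '''Residual LRR after an ordinary least-squares fit on probe GC content.'''
    n = len(lrr)
    mean_gc, mean_lrr = math.fsum(gc) / n, math.fsum(lrr) / n
    sxx = math.fsum((g - mean_gc) ** 2 for g in gc)
    sxy = math.fsum((g - mean_gc) * (y - mean_lrr) for g, y in zip(gc, lrr))
    slope = sxy / sxx if sxx else 0.0
    return [y - (mean_lrr + slope * (g - mean_gc)) for g, y in zip(gc, lrr)]


def loess(xs, ys, span=0.3):
    '''Basic loess: local-linear fit with tricube weights over the nearest span*n points.'''
    n = len(xs)
    k = min(n, max(2, round(span * n)))
    fitted = []
    for x0 in xs:
        dist = [abs(x - x0) for x in xs]
        h = sorted(dist)[k - 1] or 1.0  # bandwidth: k-th nearest distance, counting x0 itself
        w = [(1.0 - min(1.0, d / h) ** 3) ** 3 for d in dist]  # tricube weights
        sw = math.fsum(w)
        mx = math.fsum(wi * x for wi, x in zip(w, xs)) / sw
        my = math.fsum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = math.fsum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = math.fsum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def correct_lrr(lrr, gc, positions, span=0.3):
    '''GC-adjusted LRR with the smooth genomic wave (loess on position) removed.'''
    resid = regress_out_gc(lrr, gc)
    wave = loess(positions, resid, span)
    return [r - wv for r, wv in zip(resid, wave)]


# Usage with hypothetical probes on one chromosome:
gc = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
lrr = [0.02, 0.05, 0.08, 0.11, 0.16, 0.17]
pos = [1_000, 2_000, 3_000, 4_000, 5_000, 6_000]
corrected = correct_lrr(lrr, gc, pos, span=0.5)
```

\u003cp\u003eBecause loess fits a line locally, it can remove slowly varying intensity waves. Short, sharp LRR shifts, such as those produced by small CNVs, are left largely intact when the span is wide relative to the CNV.\u003c/p\u003e\n\u003cp\u003e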
However, cnvHap did not detect similar common CNVs in WWOX in our Australian replication cohort or in the joint meta-analysis (multi-platform integration) of the SANAD and Australian cohorts. Of note, the Australian replication cohort showed batch effects, and around 30% of its samples were excluded by our CNV analysis pipeline. Unlike SANAD, the resulting Australian sample (N=280) therefore had limited statistical power to detect and genotype common CNVs across a broad allele frequency spectrum at genome-wide significance. Nevertheless, results from the Australian cohort, both independently and in the meta-analysis with SANAD, replicated many of the primary findings and were enriched for genes related to neurological conditions, so we report them here. For the meta-analysis of the SANAD and Australian cohorts, we used the multi-platform integration feature of cnvHap. In this mode, intensity values for each genotyping probe from the UK and Australian cohorts were modelled jointly in the HMM for CNV genotype prediction. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAssociation models\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA main goal of SNP eQTL studies is to enable better understanding of GWAS loci. Our study design emulates this approach in two steps. We first sought new CNV loci for Epilepsy phenotypes in the SANAD and Australian cohorts. We then used CNV eQTL analysis of normal human brain regions from the UKBEC and NABEC studies to help interpret the Epilepsy CNV signals. The CNV eQTL results and gene NMF programs (described below) can also be used as an independent resource for replicating and validating a priori hypotheses for Epilepsy and other neurological diseases. Our association analysis for the Epilepsy cohorts, UKBEC and NABEC consisted of six different linear models implemented in the MultiPhen software\u003csup\u003e24\u003c/sup\u003e. 
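\u003c/p\u003e\n\u003cp\u003eThe standard univariate mode can be sketched in plain Python as below. In this mode, the phenotype is regressed on CNV dosage or LRR, with covariates such as gender and LRR principal components. This is an illustrative ordinary least-squares version, not the MultiPhen implementation. It residualises the phenotype and the dosage on the covariates (the Frisch-Waugh-Lovell theorem), which gives the same dosage effect and standard error as the full regression. It also uses a normal approximation for the two-sided p-value, which is reasonable only for large samples. The joint models with phenotypes as predictors and the backward variable selection are not shown, and all names are hypothetical.\u003c/p\u003e

```python
import math


def _solve(a, b):
    '''Solve the square linear system a x = b by Gauss-Jordan elimination.'''
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [val / p for val in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def residualise(y, covariates):
    '''Residuals of y after least squares on an intercept plus covariates.'''
    x = [[1.0] + [c[i] for c in covariates] for i in range(len(y))]
    k = len(x[0])
    xtx = [[math.fsum(r[a] * r[b] for r in x) for b in range(k)] for a in range(k)]
    xty = [math.fsum(r[a] * yi for r, yi in zip(x, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    return [yi - math.fsum(bj * xj for bj, xj in zip(beta, r)) for r, yi in zip(x, y)]


def univariate_test(phenotype, dosage, covariates=()):
    '''Covariate-adjusted effect of dosage on phenotype: (beta, se, p).'''
    covariates = list(covariates)
    y = residualise(phenotype, covariates)
    d = residualise(dosage, covariates)
    sdd = math.fsum(di * di for di in d)
    beta = math.fsum(di * yi for di, yi in zip(d, y)) / sdd
    dof = len(y) - len(covariates) - 2  # intercept + covariates + dosage
    rss = math.fsum((yi - beta * di) ** 2 for di, yi in zip(d, y))
    se = math.sqrt(rss / dof / sdd)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided, normal approximation
    return beta, se, p


# Usage with simulated data (hypothetical): dosage effect 2.0, gender effect 0.5.
n = 40
dosage = [float(i % 5) for i in range(n)]
gender = [float(i % 2) for i in range(n)]
noise = [((i * 37) % 11 - 5) / 100.0 for i in range(n)]
phenotype = [1.0 + 2.0 * d + 0.5 * g + e for d, g, e in zip(dosage, gender, noise)]
beta, se, p = univariate_test(phenotype, dosage, [gender])
```

\u003cp\u003eIn practice, the same test is repeated for every probe and phenotype, so the p-values need to be read against a multiple-testing threshold.\u003c/p\u003e\n\u003cp\u003e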
These six models combine two kinds of CNV signal with three linear models. The two signals are 1) expected CNV genotypes derived from cnvHap and 2) raw log R ratio intensity measurements (i.e. without cnvHap, used for secondary validation of CNV-phenotype signals). The three linear models are:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003ea) a standard univariate model (phenotype ~ CNV genotypes/LRR)\u003c/li\u003e\n\u003cli\u003eb) the MultiPhen joint model (CNV genotypes/LRR ~ phenotypes)\u003c/li\u003e\n\u003cli\u003ec) the MultiPhen joint model with backward variable selection (CNV genotypes/LRR ~ subset of phenotypes)\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eOf note, only the results of CNV association with epilepsy phenotypes in SANAD are considered main discovery results. All other reported results, from the Australian, UKBEC and NABEC cohorts, are intended as replication results for a priori hypotheses. We used five LRR principal components and gender as covariates for the SANAD cohort, and gender only for the Australian, UKBEC and NABEC studies. Parameters for cnvHap and covariates for MultiPhen were chosen mainly on two criteria. The first was reproducing the allele frequencies of common CNVs (e.g. matching the WWOX deletion frequency from gnomAD CNV results). The second was recovering known genes for neurological diseases.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNMF gene programs \u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNon-negative matrix factorization (NMF) is a popular method in data science for deconvolving complex data, including images and gene expression data. It has been used successfully to find cancer subtypes and, more recently, to find gene programs shared across multiple single-cell gene expression datasets. Here, we have expression measurements for all human genes and their corresponding exons in 10 brain tissues, plus an additional set derived from the average expression across all 10 brain tissues. 
To find exons or genes that are consistently over- or under-expressed across these 11 gene expression matrices, we used NMF to derive meta-exons/meta-genes and the corresponding meta-patients. We used two approaches. The first was a gene-by-gene approach. For all 20,000 human genes, we extracted the probes within a 5 kb window around each gene and derived 11 sets of brain tissue expression data. We then ran NMF 50 times on these matrices for each rank from 2 to 6. From the output of the multiple runs, we extracted meta-exons or meta-genes using consensus clustering as described previously\u003csup\u003e53\u003c/sup\u003e. From these results, we counted how many times each probe occurred in each gene program for ranks 2 to 6. These frequencies may be interpreted as the relative importance of exons, which might have deeper biological or disease relevance. In the second approach, we ran the NMF analysis over a genomic region (e.g. the chromosome 1p36 region) instead of within a gene boundary.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to thank the School of Public Health, Imperial College London and the Imperial College research computing team for their valuable help and support for this study. We thank Dr Doug Speed for statistical advice and input for the SANAD and Australian cohorts. We thank Dr Adaikalavan Ramasamy for providing access to the UKBEC and NABEC gene expression datasets and helping with data interpretation. 
\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAdditional information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAny additional information required to reanalyse the data reported in this paper is available from the lead contact upon request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of interests / Competing interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo external or financial interests to be declared.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNo organs and tissues were procured from prisoners.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorrespondence\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFurther information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Dr Tisham De (
[email protected] or
[email protected]).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSamples and further materials related to the SANAD study and the Australian Epilepsy cohort can be requested from the lead author or Dr Michael R Johnson, Department of Brain Sciences, Imperial College London, UK.\u003c/p\u003e\n\u003cp\u003eMaterials requests for the UKBEC and NABEC studies should be addressed to the respective authors of the original studies. \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUKBEC study data are available from the Gene Expression Omnibus as described by the authors of the original study. Microarray CEL files and processed data are available under accession number GSE46706. SNP data are available through dbGaP. All data used in this study are publicly available or open source and therefore did not require additional ethical approval.\u003c/p\u003e\n\u003cp\u003eAll association results are available on Zenodo at https://zenodo.org/records/14946834 \u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll code used to process and analyse the data has been published, and the source code is currently available at:\u003c/p\u003e\n\u003cp\u003eMultiPhen: https://github.com/lachlancoin/MultiPhen\u003c/p\u003e\n\u003cp\u003ecnvHap: https://www.imperial.ac.uk/people/l.coin\u003c/p\u003e\n\u003cp\u003eThis study did not generate new reagents or software code.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eT.D., L.J.M.C. and M.R.J. were involved in study design, performed analysis, and wrote the manuscript. L.J.M.C. and M.R.J. advised and supervised the CNV, transcriptomics and multi-phenotype association aspects of this work. T.D. conceived and performed the NMF analysis for the UKBEC transcriptomics dataset. 
All authors contributed to the overall interpretation of results and the discussion section of the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eInternational League Against Epilepsy Consortium on Complex Epilepsies. GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. \u003cem\u003eNat. Genet.\u003c/em\u003e \u003cstrong\u003e55\u003c/strong\u003e, 1471\u0026ndash;1482 (2023).\u003c/li\u003e\n\u003cli\u003eChen, Z., Brodie, M., Liew, D. \u0026amp; Kwan, P. Treatment outcomes in patients with newly diagnosed epilepsy treated with established and new antiepileptic drugs: A 30-year longitudinal cohort study. \u003cem\u003eJAMA Neurol.\u003c/em\u003e \u003cstrong\u003e75\u003c/strong\u003e, 279\u0026ndash;286 (2017).\u003c/li\u003e\n\u003cli\u003eAnnegers, J. F., Hauser, W. A. \u0026amp; Elveback, L. R. Remission of seizures and relapse in patients with epilepsy. \u003cem\u003eEpilepsia\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 729\u0026ndash;737 (1979).\u003c/li\u003e\n\u003cli\u003eSpeed, D. \u003cem\u003eet al.\u003c/em\u003e A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy. \u003cem\u003eHum. Mol. Genet.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 247\u0026ndash;258 (2014).\u003c/li\u003e\n\u003cli\u003eSaggi, S. \u003cem\u003eet al.\u003c/em\u003e Surgical outcomes following resection in patients with language dominant posterior quadrant epilepsy. \u003cem\u003eEpilepsy Behav. Rep.\u003c/em\u003e \u003cstrong\u003e27\u003c/strong\u003e, 100695 (2024).\u003c/li\u003e\n\u003cli\u003eTandon, N., Alexopoulos, A. V., Warbel, A., Najm, I. M. \u0026amp; Bingaman, W. E. Occipital epilepsy: spatial categorization and surgical management: Clinical article. \u003cem\u003eJ.
Neurosurg.\u003c/em\u003e \u003cstrong\u003e110\u003c/strong\u003e, 306\u0026ndash;318 (2009).\u003c/li\u003e\n\u003cli\u003eLund, C., Brodtkorb, E., \u0026Oslash;ye, A.-M., R\u0026oslash;sby, O. \u0026amp; Selmer, K. K. CHD2 mutations in Lennox-Gastaut syndrome. \u003cem\u003eEpilepsy Behav.\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 18\u0026ndash;21 (2014).\u003c/li\u003e\n\u003cli\u003eRamasamy, A. \u003cem\u003eet al.\u003c/em\u003e Genetic variability in the regulation of gene expression in ten regions of the human brain. \u003cem\u003eNat. Neurosci.\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 1418\u0026ndash;1428 (2014).\u003c/li\u003e\n\u003cli\u003eHernandez, D. G. \u003cem\u003eet al.\u003c/em\u003e Integration of GWAS SNPs and tissue specific expression profiling reveal discrete eQTLs for human traits in blood and brain. \u003cem\u003eNeurobiol. Dis.\u003c/em\u003e \u003cstrong\u003e47\u003c/strong\u003e, 20\u0026ndash;28 (2012).\u003c/li\u003e\n\u003cli\u003eGibbs, J. R. \u003cem\u003eet al.\u003c/em\u003e Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. \u003cem\u003ePLoS Genet.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, e1000952 (2010).\u003c/li\u003e\n\u003cli\u003eKeller, M. F., Saad, M., Bras, J., Bettella, F. \u0026amp; Nicolaou, N. International Parkinson\u0026rsquo;s disease genomics consortium (IPDGC) wellcome trust case control consortium 2 (WTCCC2), 2012. Using genome-wide \u0026hellip;. \u003cem\u003eHum. Mol. Genet.\u003c/em\u003e\u003c/li\u003e\n\u003cli\u003eMarangi, G. \u003cem\u003eet al.\u003c/em\u003e TRAPPC9-related autosomal recessive intellectual disability: report of a new mutation and clinical phenotype. \u003cem\u003eEur. J. Hum. Genet.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 229\u0026ndash;232 (2013).\u003c/li\u003e\n\u003cli\u003eShapira, S. K. 
\u003cem\u003eet al.\u003c/em\u003e Chromosome 1p36 deletions: the clinical phenotype and molecular characterization of a common newly delineated syndrome. \u003cem\u003eThe American Journal of Human Genetics\u003c/em\u003e \u003cstrong\u003e61\u003c/strong\u003e, 642\u0026ndash;650 (1997).\u003c/li\u003e\n\u003cli\u003eJordan, V. K., Zaveri, H. P. \u0026amp; Scott, D. A. 1p36 deletion syndrome: an update. \u003cem\u003eAppl. Clin. Genet.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 189\u0026ndash;200 (2015).\u003c/li\u003e\n\u003cli\u003eSerra-Pages, C., Medley, Q. G., Tang, M., Hart, A. \u0026amp; Streuli, M. Liprins, a family of LAR transmembrane protein-tyrosine phosphatase-interacting proteins. \u003cem\u003eJournal of Biological Chemistry\u003c/em\u003e \u003cstrong\u003e273\u003c/strong\u003e, 15611\u0026ndash;15620 (1998).\u003c/li\u003e\n\u003cli\u003eStucchi, R. \u003cem\u003eet al.\u003c/em\u003e Regulation of KIF1A-driven dense core vesicle transport: Ca2+/CaM controls DCV binding and liprin-\u0026alpha;/TANC2 recruits DCVs to postsynaptic sites. \u003cem\u003eCell reports\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 685\u0026ndash;700 (2018).\u003c/li\u003e\n\u003cli\u003eBisogno, T. \u003cem\u003eet al.\u003c/em\u003e Cloning of the first sn1-DAG lipases points to the spatial and temporal regulation of endocannabinoid signaling in the brain. \u003cem\u003eJ. Cell Biol.\u003c/em\u003e \u003cstrong\u003e163\u003c/strong\u003e, 463\u0026ndash;468 (2003).\u003c/li\u003e\n\u003cli\u003eShonesy, B. C. \u003cem\u003eet al.\u003c/em\u003e CaMKII regulates diacylglycerol lipase-\u0026alpha; and striatal endocannabinoid signaling. \u003cem\u003eNat. Neurosci.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 456\u0026ndash;463 (2013).\u003c/li\u003e\n\u003cli\u003eOgasawara, D. \u003cem\u003eet al.\u003c/em\u003e Rapid and profound rewiring of brain lipid signaling networks by acute diacylglycerol lipase inhibition. 
\u003cem\u003eProc. Natl. Acad. Sci. U. S. A.\u003c/em\u003e \u003cstrong\u003e113\u003c/strong\u003e, 26\u0026ndash;33 (2016).\u003c/li\u003e\n\u003cli\u003eBanne, E. \u003cem\u003eet al.\u003c/em\u003e Neurological disorders associated with WWOX germline mutations\u0026mdash;A comprehensive overview. \u003cem\u003eCells\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eSuzuki, H. \u003cem\u003eet al.\u003c/em\u003e A spontaneous mutation of the Wwox gene and audiogenic seizures in rats with lethal dwarfism and epilepsy. \u003cem\u003eGenes Brain Behav.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 650\u0026ndash;660 (2009).\u003c/li\u003e\n\u003cli\u003eRepudi, S. \u003cem\u003eet al.\u003c/em\u003e Neuronal deletion of Wwox, associated with WOREE syndrome, causes epilepsy and myelin defects. \u003cem\u003eBrain : a journal of neurology\u003c/em\u003e (2021) doi:10.1093/brain/awab174.\u003c/li\u003e\n\u003cli\u003eCoin, L. J. M., Asher, J. E., Walters, R. G. \u0026amp; Moustafa, J. cnvHap: an integrative population and haplotype\u0026ndash;based multiplatform model of SNPs and CNVs. \u003cem\u003eNature\u003c/em\u003e (2010).\u003c/li\u003e\n\u003cli\u003eO\u0026rsquo;Reilly, P. F. \u003cem\u003eet al.\u003c/em\u003e MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, e34861 (2012).\u003c/li\u003e\n\u003cli\u003evan Karnebeek, C. D. M. \u003cem\u003eet al.\u003c/em\u003e NANS-mediated synthesis of sialic acid is required for brain and skeletal development. \u003cem\u003eNat. Genet.\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 777\u0026ndash;784 (2016).\u003c/li\u003e\n\u003cli\u003eWang, B. \u0026amp; Brand-Miller, J. The role and potential of sialic acid in human nutrition. \u003cem\u003eEur. J. Clin. 
Nutr.\u003c/em\u003e \u003cstrong\u003e57\u003c/strong\u003e, 1351\u0026ndash;1369 (2003).\u003c/li\u003e\n\u003cli\u003eShen, S.-M. \u003cem\u003eet al.\u003c/em\u003e Downregulation of ANP32B, a novel substrate of caspase-3, enhances caspase-3 activation and apoptosis induction in myeloid leukemic cells. \u003cem\u003eCarcinogenesis\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 419\u0026ndash;426 (2010).\u003c/li\u003e\n\u003cli\u003eMunemasa, Y. \u003cem\u003eet al.\u003c/em\u003e Promoter region-specific histone incorporation by the novel histone chaperone ANP32B and DNA-binding factor KLF5. \u003cem\u003eMol. Cell. Biol.\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 1171\u0026ndash;1181 (2008).\u003c/li\u003e\n\u003cli\u003eEuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project \u0026amp; Epi4K Consortium. De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. \u003cem\u003eAm. J. Hum. Genet.\u003c/em\u003e \u003cstrong\u003e95\u003c/strong\u003e, 360\u0026ndash;370 (2014).\u003c/li\u003e\n\u003cli\u003eMontanucci, L. \u003cem\u003eet al.\u003c/em\u003e Genome-wide identification and phenotypic characterization of seizure-associated copy number variations in 741,075 individuals. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 4392 (2023).\u003c/li\u003e\n\u003cli\u003ePetrovski, S. \u003cem\u003eet al.\u003c/em\u003e Germline DE Novo mutations in GNB1 cause severe neurodevelopmental disability, hypotonia, and seizures. \u003cem\u003eAm. J. Hum. Genet.\u003c/em\u003e \u003cstrong\u003e98\u003c/strong\u003e, 1001\u0026ndash;1010 (2016).\u003c/li\u003e\n\u003cli\u003eDe, T. \u003cem\u003eet al.\u003c/em\u003e Signatures of TSPAN8 variants associated with human metabolic regulation and diseases. \u003cem\u003eiScience\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 102893 (2021).\u003c/li\u003e\n\u003cli\u003eDe, T., Coin, L., Herberg, J., Johnson, M. 
\u0026amp; Jarvelin, M.-R. Plasma metabolomic signatures for copy number variants and COVID-19 risk loci in Northern Finland Populations. (2024).\u003c/li\u003e\n\u003cli\u003eTsokas, P. \u003cem\u003eet al.\u003c/em\u003e Compensation for PKM\u0026zeta; in long-term potentiation and spatial long-term memory in mutant mice. \u003cem\u003eElife\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, (2016).\u003c/li\u003e\n\u003cli\u003eSacktor, T. PKMzeta, LTP maintenance, and the dynamic molecular biology of memory storage. \u003cem\u003eProg. Brain Res.\u003c/em\u003e \u003cstrong\u003e169\u003c/strong\u003e, 27\u0026ndash;40 (2008).\u003c/li\u003e\n\u003cli\u003eAzuma, M., Toyama, R., Laver, E. \u0026amp; Dawid, I. B. Perturbation of rRNA synthesis in the bap28 mutation leads to apoptosis mediated by p53 in the zebrafish central nervous system. \u003cem\u003eJ. Biol. Chem.\u003c/em\u003e \u003cstrong\u003e281\u003c/strong\u003e, 13309\u0026ndash;13316 (2006).\u003c/li\u003e\n\u003cli\u003eDiaz, L. R. \u003cem\u003eet al.\u003c/em\u003e Ribogenesis boosts controlled by HEATR1-MYC interplay promote transition into brain tumour growth. \u003cem\u003eEMBO Rep.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 168\u0026ndash;197 (2024).\u003c/li\u003e\n\u003cli\u003eAltmann, H. M. \u003cem\u003eet al.\u003c/em\u003e Homozygous/compound heterozygous triadin mutations associated with autosomal-recessive long-QT syndrome and pediatric sudden cardiac arrest: Elucidation of the triadin knockout syndrome: Elucidation of the triadin knockout syndrome. \u003cem\u003eCirculation\u003c/em\u003e \u003cstrong\u003e131\u003c/strong\u003e, 2051\u0026ndash;2060 (2015).\u003c/li\u003e\n\u003cli\u003eRossi, D. \u003cem\u003eet al.\u003c/em\u003e A novel homozygous mutation in the TRDN gene causes a severe form of pediatric malignant ventricular arrhythmia. 
\u003cem\u003eHeart Rhythm\u003c/em\u003e \u003cstrong\u003e17\u003c/strong\u003e, 296\u0026ndash;304 (2020).\u003c/li\u003e\n\u003cli\u003eRooryck, C. \u003cem\u003eet al.\u003c/em\u003e New family with catecholaminergic polymorphic ventricular tachycardia linked to the Triadin gene: Sudden death linked to the triadin gene. \u003cem\u003eJ. Cardiovasc. Electrophysiol.\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 1146\u0026ndash;1150 (2015).\u003c/li\u003e\n\u003cli\u003eRoux-Buisson, N. \u003cem\u003eet al.\u003c/em\u003e Absence of triadin, a protein of the calcium release complex, is responsible for cardiac arrhythmia with sudden death in human. \u003cem\u003eHum. Mol. Genet.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 2759\u0026ndash;2767 (2012).\u003c/li\u003e\n\u003cli\u003eTong, D. \u003cem\u003eet al.\u003c/em\u003e The critical role of ASD-related gene CNTNAP3 in regulating synaptic. \u003cem\u003escholar.archive.org\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eAgarwala, S. \u0026amp; Ramachandra, N. B. Role of CNTNAP2 in autism manifestation outlines the regulation of signaling between neurons at the synapse. \u003cem\u003eEgypt. J. Med. Hum. Genet.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eTong, D.-L. \u003cem\u003eet al.\u003c/em\u003e The critical role of ASD-related gene CNTNAP3 in regulating synaptic development and social behavior in mice. \u003cem\u003ebioRxiv\u003c/em\u003e (2018) doi:10.1101/260083.\u003c/li\u003e\n\u003cli\u003eM\u0026oslash;ller, R. S. \u003cem\u003eet al.\u003c/em\u003e Mutations in GABRB3: From febrile seizures to epileptic encephalopathies. \u003cem\u003eNeurology\u003c/em\u003e \u003cstrong\u003e88\u003c/strong\u003e, 483\u0026ndash;492 (2017).\u003c/li\u003e\n\u003cli\u003eChen, C.-H. \u003cem\u003eet al.\u003c/em\u003e Genetic analysis of GABRB3 as a candidate gene of autism spectrum disorders. \u003cem\u003eMol. 
Autism\u003c/em\u003e \u003cstrong\u003e5\u003c/strong\u003e, 36 (2014).\u003c/li\u003e\n\u003cli\u003eTanaka, M., DeLorey, T. M., Delgado-Escueta, A. \u0026amp; Olsen, R. W. GABRB3, epilepsy, and neurodevelopment. (2012).\u003c/li\u003e\n\u003cli\u003eKim, H., Kang, K. \u0026amp; Kim, J. AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e (2009).\u003c/li\u003e\n\u003cli\u003eKim, H., Kang, K., Ekram, M. B., Roh, T.-Y. \u0026amp; Kim, J. Aebp2 as an epigenetic regulator for neural crest cells. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, e25174 (2011).\u003c/li\u003e\n\u003cli\u003eChen, S., Jiao, L., Liu, X., Yang, X. \u0026amp; Liu, X. A dimeric structural scaffold for PRC2-PCL targeting to CpG island chromatin. \u003cem\u003eMol. Cell\u003c/em\u003e \u003cstrong\u003e77\u003c/strong\u003e, 1265\u0026ndash;1278.e7 (2020).\u003c/li\u003e\n\u003cli\u003eKang, H. J. \u003cem\u003eet al.\u003c/em\u003e Spatio-temporal transcriptome of the human brain. \u003cem\u003eNature\u003c/em\u003e \u003cstrong\u003e478\u003c/strong\u003e, 483\u0026ndash;489 (2011).\u003c/li\u003e\n\u003cli\u003eMehta, D. \u003cem\u003eet al.\u003c/em\u003e Comprehensive survey of CNVs influencing gene expression in the human brain and its implications for pathophysiology. \u003cem\u003eNeurosci. Res.\u003c/em\u003e \u003cstrong\u003e79\u003c/strong\u003e, 22\u0026ndash;33 (2014).\u003c/li\u003e\n\u003cli\u003eKim, H. \u0026amp; Park, H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. 
\u003cem\u003eBioinformatics\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1495\u0026ndash;1502 (2007).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6130694/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6130694/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEpilepsy and seizures are one of the most common neurological conditions which often manifest with complex symptoms. Several studies including large scale GWAS and exome studies have reported a comprehensive catalog of genes related to Epilepsy. Similarly, there exists several successful studies elucidating the role of SNP QTLs in the normal human brain. Here, as one of few studies in current literature we have explored and reported the dosage effect of small to intermediate length CNVs in two Epilepsy cohorts characterized for phenotypes such as seizure counts, seizure frequency and remission to anti-epilepsy drugs. In addition, we have performed comprehensive CNV QTL analysis in ten regions of the human brain (normal) from the UKBEC study. We leveraged all analyses to decipher new genes for Epilepsy phenotypes such as seizure frequency and further uncovered genetic controls of neurotransmitters such as serotonin, dopamine and signaling molecules like GPCRs. Importantly we observed and have reported clustering of CNV QTL signals in specific regions of the genome such as the chromosome 1p36 proband containing the GNB1 gene or the chromosome 9q22 proband containing NANS. 
This observed phenomenon of clustering of association signals was further corroborated by our non-negative matrix factorization (NMF) analysis of UKBEC gene expression data. To conclude our results here successfully describe in detail the dosage effect of CNVs for Epilepsy seizures and further elucidates its role in the genomic architecture of gene expression in various regions of the human brain.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e","manuscriptTitle":"Dosage effect of Copy Number Variation in Epilepsy and ten regions of the human brain ","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-15 12:47:52","doi":"10.21203/rs.3.rs-6130694/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-24T10:35:42+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-06-16T03:14:09+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"317730465414891514976660643563004237783","date":"2025-06-05T20:16:08+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-13T03:19:59+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"28224303121028865638274872013008285719","date":"2025-04-08T14:46:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"299161620849846950898440802451465686386","date":"2025-03-30T00:45:48+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-03-27T21:07:24+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-03-24T13:07:38+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-03-24T06:38:36+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-03-20T11:30:56+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific 
Reports","date":"2025-02-28T17:42:30+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a5e3ab35-6a5d-4b7e-9846-e7b0b7bfdf0c","owner":[],"postedDate":"April 15th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":47078017,"name":"Biological sciences/Genetics"},{"id":47078018,"name":"Biological sciences/Neuroscience"}],"tags":[],"updatedAt":"2025-12-08T16:01:43+00:00","versionOfRecord":{"articleIdentity":"rs-6130694","link":"https://doi.org/10.1038/s41598-025-28338-2","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-12-04 15:57:47","publishedOnDateReadable":"December 4th, 2025"},"versionCreatedAt":"2025-04-15 12:47:52","video":"","vorDoi":"10.1038/s41598-025-28338-2","vorDoiUrl":"https://doi.org/10.1038/s41598-025-28338-2","workflowStages":[]},"version":"v1","identity":"rs-6130694","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6130694","identity":"rs-6130694","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}