{"paper_id":"f2f8e1b4-06cf-4fcb-b6b5-61a3053b46aa","body_text":"1 \nDissecting the contribution of common variants to risk of rare \nneurodevelopmental conditions \n \nQin Qin Huang†,1, Emilie M Wigdor†,1, Patrick Campbell1,2, Daniel S Malawsky1, Kaitlin E \nSamocha3,4, V Kartik Chundru1,5, Petr Danecek1, Sarah Lindsay1, Thomas Marchant1, \nMahmoud Koko Musa1, Sana Amanat1, Davide Bonifanti1, Eamonn Sheridan1,6,7, Elizabeth J \nRadford1,8, Jeffrey C Barrett1, Caroline F Wright5, Helen V Firth1,9, Varun Warrier10,11, \nAlexander Strudwick Young12,13, Matthew E Hurles1, Hilary C Martin*,1 \n  \n1. Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK \n2. Department of Medical and Molecular Genetics, King's College London, London, UK \n3. Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA \n4. Broad Institute of MIT and Harvard, Cambridge, USA \n5. Institute of Biomedical and Clinical Science, University of Exeter, Exeter, UK \n6. Leeds Institute of Medical Research, University of Leeds, St. James’s University Hospital, UK \n7. Yorkshire Regional Genetics Service, Chapel Allerton Hospital, Leeds, UK \n8. Department of Paediatrics, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK \n9. Cambridge University Hospitals Foundation Trust, Addenbrooke’s Hospital, Cambridge, UK \n10. Department of Psychiatry, University of Cambridge, Cambridge, UK \n11. Department of Psychology, University of Cambridge, Cambridge, UK \n12. University of California Los Angeles (UCLA) Anderson School of Management, Los Angeles, CA, \nUSA \n13. Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA \n \n✝ These authors contributed equally to this work and their names are listed alphabetically. \nBoth authors are entitled to list their name first on their CV. \n*Correspondence to hcm@sanger.ac.uk. 
\nAbstract \nAlthough rare neurodevelopmental conditions have a large Mendelian component, common genetic variants also contribute to risk. However, little is known about how this polygenic risk is distributed among patients with these conditions and their parents, its interplay with rare variants, and whether parents’ polygenic background contributes to their children’s risk beyond the direct effect of variants transmitted to the child (i.e. via indirect genetic effects potentially mediated through the prenatal environment or ‘genetic nurture’). Here, we addressed these questions using genetic data from 11,573 patients with rare neurodevelopmental conditions, 9,128 of their parents and 26,869 controls. Common variants explained ~10% of variance in overall risk. Patients with a monogenic diagnosis had significantly less polygenic risk than those without, supporting a liability threshold model, while both genetically undiagnosed patients and diagnosed patients with affected parents had significantly more risk than controls. In a trio-based model, using a polygenic score for neurodevelopmental conditions, the transmitted but not the non-transmitted parental alleles were associated with risk, indicating a direct genetic effect. In contrast, we observed no direct genetic effect of polygenic scores for educational attainment and cognitive performance, but saw a significant correlation between the child’s risk and non-transmitted alleles in the parents, potentially due to indirect genetic effects and/or parental assortment for these traits. Indeed, as expected under parental assortment, we show that common variant predisposition for neurodevelopmental conditions is correlated with the rare variant component of risk. Our findings thus suggest that future studies should investigate the possible role and nature of indirect genetic effects on rare neurodevelopmental conditions, and consider the contribution of common and rare variants simultaneously when studying cognition-related phenotypes. \n \nmedRxiv preprint doi: https://doi.org/10.1101/2024.03.05.24303772; this version posted March 6, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. \nNOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. \n \nMain \nRare conditions affect 3.5%-6% of the global population1 and of these, the majority involve the central nervous system2. While genomic sequencing has revolutionized the diagnosis of rare neurodevelopmental conditions, which typically include intellectual disability and/or developmental delay, a monogenic diagnosis is only identified for about 30-40% of patients3–5. Common variants also contribute to risk for rare neurodevelopmental conditions6,7. In particular, this common variant contribution overlaps with polygenic risk for schizophrenia and for predisposition to reduced educational attainment and cognitive performance6. Accordingly, rare damaging variants in constrained genes, which play a major role in risk of rare neurodevelopmental conditions, are also associated with reduced educational attainment and cognitive performance and increased risk of mental health conditions in UK Biobank8–12. In this work, we seek to better understand the nature of common variant risk for rare neurodevelopmental conditions, its interplay with rare variants and its distribution amongst different patients and their parents.  
\n \nWe begin by leveraging new, larger genome-wide association studies (GWASs) than were previously available6 to explore the extent to which common variant effects on rare neurodevelopmental conditions are correlated with their effects on a broad range of mental health conditions. This is motivated by findings that some psychiatric conditions have a partial neurodevelopmental origin13–15, and that people with rare neurodevelopmental conditions16, as well as their relatives17–19, are more likely to have psychiatric conditions. Furthermore, some of this overlap appears to be driven by certain rare copy number variants with variable expressivity20–22, suggesting some shared etiology between psychiatric and rare neurodevelopmental conditions. Here we explore whether shared common variant effects may also contribute, and whether this is independent of the genetic overlap between these conditions and cognitive traits. \n \nLittle is known about the interplay between rare and common variants in the context of rare neurodevelopmental conditions, and dissecting this will be key to fully understanding their genetic architecture and improving genetic diagnosis and risk prediction. Here we address two hypotheses in this space, testing the liability threshold model and whether common variants modify the penetrance of rare variants. The liability threshold model predicts that an individual will develop a condition once the sum of independent genetic and environmental risk factors exceeds some threshold23–26. Under this model, one might expect that patients with neurodevelopmental conditions who have a highly penetrant damaging variant would require, on average, less polygenic load to cross a diagnostic threshold than those without such variants (Extended Data Figure 1). 
We previously saw no significant difference in polygenic scores between patients with versus without a monogenic diagnosis6, but in this work, we anticipated that increased sample size and improved diagnostic rate5,27 might improve power. Since rare variants associated with neurodevelopmental conditions appear to act additively with polygenic scores in affecting cognitive ability in UK Biobank9,10, we hypothesized that polygenic background would modify the penetrance of these inherited rare variants in families with neurodevelopmental conditions, as it does, for example, in the context of BRCA1/2 variants predisposing to breast cancer28.  \n \nFinally, we explore the extent to which common variants predisposing to rare neurodevelopmental conditions act directly on the affected individuals carrying them (“direct genetic effects”). Many studies have shown that genetic associations between common genetic variants and educational and cognitive phenotypes shrink when estimated within families29–33. One possible explanation for this is that variants associated with these traits have indirect genetic effects, i.e. they have some effect on the parents, and this then affects the offspring through the family environment29,33–35. These indirect genetic effects are under-explored in the context of rare diseases, but we hypothesized that they may play a role in rare neurodevelopmental conditions given the genetic overlap with educational attainment. 
\n \nWe address these questions using two large UK-based cohorts of individuals with rare neurodevelopmental conditions, the Deciphering Developmental Disorders study (DDD; N=7,955 patients with genotype array and exome sequence data) and the Genomics England 100,000 Genomes project (GEL; N=3,618 patients with genome sequence data), combined with several control cohorts (Supplementary Table 1). We have included a “Frequently Asked Questions” document in less technical language to explain the study, and to address some possible misunderstandings. \nResults \nGWAS meta-analysis for neurodevelopmental conditions reveals novel genetic correlations with other brain-related traits and conditions \nWe first sought to replicate the key findings from our previous GWAS for neurodevelopmental conditions6 in a large independent cohort. We identified a subset of GEL rare disease families with neurodevelopmental conditions and removed families overlapping with the DDD study (Methods). Almost all probands with neurodevelopmental conditions in GEL (97%) had intellectual disability or global developmental delay, versus 88% of those in DDD. The cohorts were broadly phenotypically similar (Extended Data Figure 2; Supplementary Note 1). To avoid spurious results due to population stratification, all genetic analyses were conducted in a genetically homogeneous subset of individuals with genetic similarity to British individuals from the 1000 Genomes Project36, henceforth referred to as having GBR ancestry. \n \nWhen comparing 3,618 unrelated patients with neurodevelopmental conditions to 13,667 unrelated controls within GEL, polygenic scores (PGSs) for educational attainment37, cognitive performance37, and schizophrenia38 each explained a significant but small amount of variance on the liability scale (<1%; logistic regression p<3.9x10^-4). 
This was similar to that observed when comparing 6,397 unrelated patients from DDD with 9,270 independent unrelated controls (Supplementary Table 2). The polygenic score for neurodevelopmental conditions derived from our GWAS in DDD6 was also associated with neurodevelopmental conditions within GEL (p=1.1x10^-6, R²=0.11% on the liability scale; Supplementary Table 2). \n \nThese results suggested that the polygenic contribution to rare neurodevelopmental conditions was similar between these two cohorts. Thus, to increase power to study common variant effects on these conditions, we conducted a GWAS in GEL, then meta-analyzed the results with the DDD GWAS (Extended Data Figure 3; Supplementary Data 1, 2 and 3). No single nucleotide polymorphism (SNP) passed genome-wide significance (p<5x10^-8) in either DDD or GEL alone, but in the meta-analysis, six SNPs were significant in two independent loci on chromosomes 15 and 22, respectively (lead SNPs: rs113446150, p=4.0x10^-8; rs2284084, p=1.7x10^-8; Supplementary Note 2). Variants at one of these loci are associated with cognitive traits39,40. The fraction of phenotypic variance explained by genome-wide common variants (the SNP heritability) was estimated at between 3.7% (95% CI: 1.7–5.7%) and 11.2% (8.5–13.8%), depending on the method used (Supplementary Table 3).  \n \nTo test for possible shared genetic contributors to rare neurodevelopmental conditions and other brain-related traits and conditions, we calculated genetic correlations (r_g) between them using our own and published GWAS meta-analyses. 
We observed the expected negative genetic correlations between neurodevelopmental conditions and educational attainment37 (EA; r_g=-0.65 [-0.84, -0.47], p=4.9x10^-12) and cognitive performance37 (CP; r_g=-0.56 [-0.73, -0.39], p=1.6x10^-10), stronger in magnitude than those observed with the DDD GWAS alone, and a positive genetic correlation with schizophrenia38 (SCZ; r_g=0.27 [0.13, 0.40], p=9.7x10^-5) (Figure 1A; Supplementary Table 4). Additionally, we detected significant genetic correlations (p<0.0038=0.05/13 traits) with several other mental health conditions including Attention-Deficit/Hyperactivity Disorder (ADHD)41 (r_g=0.46 [0.28, 0.64], p=5.2x10^-7), and with the non-cognitive component of educational attainment derived from GWAS-by-subtraction (NonCogEA)42 (r_g=-0.37 [-0.52, -0.22], p=1.2x10^-6) (Figure 1A). We hypothesized that the genetic correlations with mental health conditions could be explained at least in part by their relationship with educational attainment42,43, given the strong negative genetic correlation between educational attainment and neurodevelopmental conditions. To explore this, we used Genomic Structural Equation Modelling44 (GenomicSEM) to re-estimate the genetic correlations while conditioning on the educational attainment GWAS summary statistics (Figure 1B). This significantly attenuated the genetic correlation with ADHD (r_g=0.14 [-0.06, 0.34], p=0.18; two-sided z-test p=0.021 compared to the unconditional r_g), but the genetic correlations with the other conditions did not significantly change.  \n \nThese results confirmed that common variants collectively associate with rare neurodevelopmental conditions in two independent cohorts, and that these common variant effects are shared with other brain-related conditions and cognitive traits. 
To further explore the contribution of polygenic background, below we used polygenic scores for neurodevelopmental conditions from the DDD-derived GWAS6 (PGS_NDC,DDD) and for the most significantly genetically correlated traits (PGS_EA, PGS_CP, PGS_NonCogEA, PGS_SCZ), for which much larger GWASs and thus more powerful polygenic scores are available. All polygenic scores were corrected for principal components and standardized such that the controls from GEL and the UK Household Longitudinal Study have mean 0 and variance 1, except PGS_NDC,DDD which was standardized in the GEL controls alone. We note that several of these polygenic scores are significantly correlated with each other (Supplementary Figure 1), so, in the analyses below, our correction for multiples of five tests, which treats the scores as independent, is conservative.  \n \nFigure 1 \nGenetic correlations between neurodevelopmental conditions (NDCs) and other brain-related traits and conditions. A) shows the estimates from Linkage Disequilibrium Score Regression for the DDD GWAS (orange) and the meta-analysis of neurodevelopmental conditions between DDD and GEL (blue). B) shows the estimates for the meta-analysis after conditioning on the GWAS summary statistics for educational attainment (green) or cognitive performance (purple) using GenomicSEM. Error bars show 95% confidence intervals.  
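
The standardization described in the text above (scores scaled so that a control group has mean 0 and variance 1, with a Bonferroni threshold of 0.05/5 for the five polygenic scores) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `standardize_to_controls`, the toy scores and the control labels are all hypothetical, and the principal-component correction mentioned in the text is deliberately omitted.

```python
from statistics import mean, stdev

def standardize_to_controls(scores, is_control):
    """Rescale all scores so the control subset has mean 0 and variance 1.

    Hypothetical helper for illustration only; the principal-component
    adjustment described in the paper is not performed here.
    """
    controls = [s for s, c in zip(scores, is_control) if c]
    mu = mean(controls)
    sd = stdev(controls)  # sample SD (n - 1 denominator)
    return [(s - mu) / sd for s in scores]

# Toy raw scores (made up): first five individuals are controls, last two are cases.
scores = [0.2, -1.1, 0.7, 1.5, -0.4, 2.0, 1.2]
is_control = [True, True, True, True, True, False, False]

z = standardize_to_controls(scores, is_control)

# Bonferroni threshold for five polygenic scores, as used in the text (0.05/5).
bonferroni_alpha = 0.05 / 5
```

After this rescaling, a case-group mean of, say, 0.1 can be read directly as 0.1 control standard deviations, which is how the Δ values in the following sections are reported.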
\nProbands with monogenic diagnoses have less polygenic risk \nSince 36% of patients in these cohorts have a molecular monogenic diagnosis (including de novo, recessive, X-linked or inherited dominant diagnoses), we next tested whether these diagnosed patients differed from undiagnosed patients in terms of their polygenic risk. Consistent with the liability threshold model (Extended Data Figure 1), we observed significantly higher PGS_EA (DDD and GEL combined; average difference Δ=0.12 SD, two-sided t-test p=3.0x10^-9), PGS_CP (Δ=0.068 SD, p=1.2x10^-3), and PGS_NonCogEA (Δ=0.085 SD, p=3.7x10^-5) in probands with versus without a monogenic diagnosis (all passing Bonferroni significance, i.e. p<0.05/5; Figure 2A). Despite this, we observed that for all polygenic scores except for PGS_NonCogEA, the diagnosed probands still had significantly more polygenic risk than the controls (p<0.05/5; Figure 2A; Supplementary Table 5). Sensitivity analyses suggest that this observation is not driven by ascertainment bias in the controls, although the effect size is sensitive to the choice of control cohort, particularly for PGS_EA (Supplementary Note 3, Extended Data Figure 4, Extended Data Figure 5, Supplementary Table 6). The difference between the diagnosed probands and controls is driven by those with affected parents (i.e. those reported by clinicians to show a similar phenotype to their child), who had significantly more polygenic risk for several traits than those with unaffected parents (e.g. PGS_EA Δ=0.26 SD, p=3.4x10^-3) (Extended Data Figure 4; Supplementary Table 5). 
However, amongst undiagnosed probands, both those with affected parents and those with unaffected parents showed significantly more polygenic risk than controls (Extended Data Figure 4, Supplementary Table 7). \n \nWe next explored whether the difference in polygenic risk between diagnosed and undiagnosed probands was related to various technical, clinical and prenatal factors that are associated with receiving a monogenic diagnosis in DDD5 (Figure 2B). For example, diagnosed probands were more likely to be in a trio (probably due to the ability to distinguish de novo from inherited variants) and to have severe intellectual disability, and less likely to have been born prematurely (a known epidemiological risk factor for neurodevelopmental conditions45–47) (Supplementary Table 8). We hypothesized that some of these associations might confound, or be confounded by, the association between PGS_EA and diagnostic status, since, for example, single-parent households and premature birth are associated with higher levels of deprivation/lower parental educational attainment48,49. Indeed, we observed that the probands’ PGS_EA was significantly associated with several of these factors (Figure 2C): a higher PGS_EA was associated with a higher chance of being in a trio and having more severe intellectual disability, and a lower chance of being born prematurely and having any affected first-degree relatives (Extended Data Figure 6). However, it was not associated with sex (Supplementary Note 4; Extended Data Figure 7) or maternal diabetes (Figure 2C; Supplementary Table 8). Controlling for PGS_EA minimally altered the association between these factors and diagnostic status (Figure 2B). Only a small part of the association between PGS_EA and diagnostic status was mediated by the effects of trio status (11%, 95% CI: 6.2–20.8%) and prematurity (3.1%, 95% CI: 0.4–7.3%). 
Thus, the observation that diagnosed patients tend to have lower polygenic risk than undiagnosed patients probably largely reflects the liability threshold model, under which both common and rare variants contribute to risk (Extended Data Figure 1). \n \nFigure 2 \nDisentangling polygenic score associations with diagnostic status. A) Average polygenic scores in probands with (“diagnosed”; N=3,821; dark blue) versus without (“undiagnosed”; N=6,345; red) a monogenic diagnosis, from DDD and GEL combined. Subsets of diagnosed probands from trios are in light blue. The polygenic scores have been standardized such that the controls (UK Household Longitudinal Study+GEL for all polygenic scores except PGS_NDC,DDD, for which only GEL controls were used) have mean 0 and variance 1. See Supplementary Table 5 for results of statistical tests comparing the various groups. See also Extended Data Figure 4. B) Associations between various factors and diagnostic status within the full DDD cohort5, with or without correcting for the proband’s PGS_EA, calculated within GBR-ancestry probands with neurodevelopmental conditions from DDD using logistic regression (see Supplementary Table 8). C) Associations between these factors and DDD probands’, mothers’ or fathers’ PGS_EA, assessed via linear regression. Two asterisks indicate that the association passed Bonferroni correction for seven factors. Error bars show 95% confidence intervals. F_ROH: the fraction of the genome in runs of homozygosity; the expected value is 0.0625 for individuals whose parents are first cousins. 
EA: educational attainment; CP: cognitive performance; NonCogEA: the non-cognitive component of EA42; SCZ: schizophrenia; NDC,DDD: neurodevelopmental conditions, with the GWAS conducted in DDD versus the UK Household Longitudinal Study, and the polygenic score tested only in samples excluded from the GWAS (GEL and DDD Omni chip).  \nLimited evidence for over-transmission of polygenic risk from unaffected parents to probands \nCommon variants are inherited from parents, and most of the parents in our sample are reported by clinicians to be clinically unaffected (89.2% in DDD and 95.4% in GEL, although the clinical annotation of parental affected status may be imperfect). Given this, and results in autism50, we hypothesized that probands without monogenic diagnoses inherit higher common variant risk for neurodevelopmental conditions from unaffected parents than one would expect given their parents’ mean risk. Applying the polygenic transmission disequilibrium test (pTDT)50 to undiagnosed trios with unaffected parents (Figure 3A) (1,343 in GEL plus 1,523 in DDD), we saw nominally significant over-transmission of PGS_NDC,DDD in 1,567 families not included in the original GWAS (pTDT deviation = 0.062; paired t-test p=0.014). This over-transmission was significant in females (pTDT deviation = 0.10, p=0.0078 in 589 trios) but not in males (pTDT deviation = 0.036, p=0.27 in 978 trios) (Extended Data Figure 7C; Supplementary Note 4). However, we saw no significant transmission disequilibrium for the other polygenic scores (paired t-test p>0.05), in either sex or in both sexes combined. 
Given the known over-transmission of PGS_EA to autistic individuals50, we excluded autistic individuals from our sample and repeated the pTDT, but conclusions were unchanged (Supplementary Figure 2). When focusing on probands with a monogenic genetic diagnosis, we saw no significant transmission distortion for any polygenic score tested (Supplementary Figure 3). \n \nTo put the pTDT results in context, we compared average polygenic scores between unaffected parents of undiagnosed patients and controls (Figure 3B; Supplementary Table 7). For all five polygenic scores tested, the parents had more polygenic risk than controls (two-sided t-test p<5.2x10^-6 for all polygenic scores except PGS_SCZ, which had p=0.0093). Given this observation and the results from the pTDT, we conclude that risk for neurodevelopmental conditions is affected both by familial polygenic background, or factors correlated with it, and by polygenic risk (specifically, PGS_NDC,DDD) that is over-transmitted from unaffected parents to affected children. \nFigure 3 \nPolygenic background in parents of patients with neurodevelopmental conditions. A) Polygenic transmission disequilibrium test (pTDT) in undiagnosed probands with unaffected parents. We tested if probands’ polygenic score deviated from mean parental polygenic score in trios from GEL (N=1,343) and DDD (N=1,523, or N=224 for testing PGS_NDC,DDD). Plotted is the mean pTDT deviation (difference between the child’s polygenic score and the mean parental polygenic score, in units of the SD of the latter), with error bars showing 95% confidence intervals. B) Mean polygenic score difference from control samples (GEL+UK Household Longitudinal Study, or GEL alone when testing PGS_NDC,DDD). This includes only the samples in the trios used in the pTDT analysis. Error bars indicate 95% confidence intervals estimated from two-sided t-tests. See also Extended Data Figure 4 and Supplementary Table 7. 
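
The pTDT deviation defined in the Figure 3 caption (child’s score minus mean parental score, in units of the SD of the mean parental score) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `ptdt` and the six toy trios are hypothetical, and only the t statistic is computed. A p-value would come from a t distribution with n − 1 degrees of freedom. Testing the deviations against zero gives the same t statistic as a paired t-test of child versus mid-parent scores, because dividing by a constant SD does not change it.

```python
from math import sqrt
from statistics import mean, stdev

def ptdt(child, mother, father):
    """Return (mean pTDT deviation, t statistic, number of trios).

    Deviation per trio = (child PGS - mid-parent PGS) / SD(mid-parent PGS),
    following the definition in the Figure 3 caption. Hypothetical helper.
    """
    mid_parent = [(m + f) / 2 for m, f in zip(mother, father)]
    sd_mid = stdev(mid_parent)  # SD of the mid-parent scores across trios
    deviations = [(c - mp) / sd_mid for c, mp in zip(child, mid_parent)]
    n = len(deviations)
    # One-sample t statistic for H0: mean deviation = 0
    t_stat = mean(deviations) / (stdev(deviations) / sqrt(n))
    return mean(deviations), t_stat, n

# Toy standardized scores for six trios (made-up numbers, not study data).
mother = [0.1, -0.5, 1.2, 0.3, -0.8, 0.6]
father = [-0.3, 0.4, 0.8, -0.1, 0.2, -0.6]
child = [0.1, 0.0, 1.3, 0.0, -0.1, 0.2]

mean_dev, t_stat, n_trios = ptdt(child, mother, father)
```

A positive mean deviation indicates that children carry, on average, a higher score than expected from their parents, i.e. over-transmission of that polygenic score.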
\nNon-transmitted common alleles in unaffected parents are associated with their children’s risk \nWe next explored one way in which familial polygenic background might affect children's risk of neurodevelopmental conditions, namely indirect genetic effects, i.e. effects of alleles in parents on parental phenotypes that affect their offspring’s risk through the family environment. Indirect genetic effects have been argued to explain around 30–45% of the association between polygenic predictors of educational attainment and school grades33,51 and educational attainment29,40,52, although these inferences have been contested as confounded by parental assortment and population stratification51,53. To investigate the possible role of indirect genetic effects in risk of neurodevelopmental conditions, we compared 2,866 affected trio probands whose parents are unaffected with 4,804 control trios from two UK birth cohorts (N=3,932 trios) and from GEL (N=872 trios without neurodevelopmental conditions). Using logistic regression, we tested whether the children’s polygenic scores for traits related to neurodevelopmental conditions were significantly associated with case status (“proband only” model), and whether this held after conditioning on the parents’ polygenic scores (i.e. including all three trio members’ polygenic scores as covariates in the “trio model”)54 (Figure 4). The idea of this model is to isolate the environmentally-mediated portion of polygenic risk in the parents from the direct effects of alleles transmitted to their children. 
Following Young et al.31, we refer to the coefficients on the parental polygenic scores in the trio model as the “non-transmitted coefficients”, since they are mathematically equivalent to the coefficients on the polygenic score constructed from the non-transmitted alleles in a joint regression with the proband polygenic score (Methods).  \nFigure 4 \nRegressions comparing undiagnosed probands with neurodevelopmental conditions to controls, with and without controlling for parental PGSs. The plot shows effect sizes of PGSs on case/control status, testing either the child’s PGS alone (“proband only”) amongst trio probands, or while additionally controlling for the parents’ PGSs (“trio model”). These were obtained from a logistic regression comparing undiagnosed probands with neurodevelopmental conditions from 2,866 trios in which parents are unaffected with 4,804 control trios from GEL (N=872), the Avon Longitudinal Study of Parents and Children (N=1,434) and the Millennium Cohort Study (N=2,498). Error bars indicate 95% confidence intervals. \n \nFor PGS_EA, PGS_CP, and PGS_NonCogEA, we found that undiagnosed probands’ polygenic scores were no longer significantly associated with having a neurodevelopmental condition after conditioning on their parents’ polygenic scores in the trio model (implying limited or no direct genetic effects), whereas the non-transmitted coefficients were highly significant (Figure 4). 
\nThis result held for PGS_EA and PGS_NonCogEA when analyzing trios with versus without neurodevelopmental conditions from GEL alone, and when using different combinations of control cohorts (Supplementary Figure 4); for PGS_CP, results from these sensitivity analyses were more equivocal, but no evidence of direct genetic effects was seen. This finding could imply that there are aspects of the environment, including the prenatal environment and genetically influenced parental phenotypes, that are correlated with these non-transmitted alleles and that affect risk of neurodevelopmental conditions. However, our observations could also be due to the effects of parental assortment (i.e. phenotypic correlation between partners), which we discuss further below.  \n \nFor PGS_NDC,DDD, we found that the probands’ polygenic scores were still nominally significantly associated with having a neurodevelopmental condition after controlling for their parents’ polygenic scores in the trio model (Figure 4). This implies that there is a direct genetic effect of PGS_NDC,DDD on the probands’ risk of neurodevelopmental conditions, consistent with the over-transmission observed in Figure 3A. For schizophrenia, we saw no significant effect of the probands’ PGS_SCZ (p=0.089) in the trio model, whereas the mothers’ PGS_SCZ was significant (p=8.6x10^-3). Thus, in summary, there is evidence for direct genetic effects of the polygenic score for rare neurodevelopmental conditions, but not for polygenic scores for related traits. \nExploring the role of prenatal factors \nWe explored whether prenatal factors might mediate the effects of non-transmitted parental alleles on risk of neurodevelopmental conditions (Supplementary Note 5). Preterm delivery (i.e. 
giving birth prematurely)55, which is a risk factor for neurodevelopmental conditions in the \noffspring45–47, showed significant genetic correlations with lower educational attainment \n(rg=-0.30 [-0.39, -0.21], p=2.3x10^-10), mirroring the epidemiological association56, and with \nneurodevelopmental conditions (rg=0.58 [0.18, 0.97], p=0.004) (Extended Data Figure 8A, \nSupplementary Table 9). Premature birth was also associated with lower PGSEA in DDD \n(Extended Data Figure 8B). In theory, the genetic correlation between educational attainment \nand premature delivery could reflect a causal effect of lower educational attainment on \npremature birth, and/or a causal effect of premature birth on lower educational attainment. Using \nMendelian randomization, we found some evidence that lower educational attainment causally \nincreases the risk of giving birth prematurely and of neurodevelopmental conditions \n(p=1.5x10^-5 and p=6.5x10^-19 respectively, using the inverse variance-weighted method; Extended Data \nFigure 9; Supplementary Note 5). The fact that the neurodevelopmental conditions of the kind \nstudied in this paper are, by definition, childhood-onset, implies that individuals’ own educational \nattainment is unlikely to causally influence their risk of developing a condition; instead, our \nfinding of a causal effect of educational attainment on these conditions is more likely to reflect \na causal effect of parents’ educational attainment on their children’s risk, consistent with the \npresence of indirect genetic effects. However, we did not find significant evidence that \nprematurity explained the association between neurodevelopmental conditions and \nnon-transmitted common variants in the parents that are associated with educational attainment \n(Supplementary Note 5; Supplementary Figure 5). \nParental assortment obscures the true nature of common variant effects \nAnother factor that may contribute to the significant correlation between non-transmitted alleles \nin parents and neurodevelopmental conditions in their children is parental assortment, the \nphenomenon whereby people are more likely to choose partners with similar traits to \nthemselves. Parental assortment is known to be particularly strong for educational attainment \nand cognitive ability, with estimates of phenotypic correlation between spouses ranging from \n0.25 to 0.6 57–64. It is also observed for psychiatric conditions63,65–67, including in parents of autistic \nindividuals and of individuals with neurodevelopmental conditions due to the 16p12.1 deletion68. \nOne consequence of parental assortment is that it induces a correlation between alleles that act \nin the same direction on a trait, both between parents and, in their descendants, within and \nbetween loci57. Thus, parental assortment on cognitive ability or correlated traits (e.g. \neducational attainment) would be expected to lead to individuals with inherited rare variants \nassociated with reduced cognitive ability8,9,12,69 also having a polygenic background of common \nvariants associated with reduced cognitive ability57,68. In the context of our polygenic score \nanalyses in Figure 4, in the proband-only model, the proband’s polygenic score would \nstatistically capture (‘tag’) the correlated effects of these rare variants (which causally impact \nneurodevelopmental conditions69,70). 
However, in the trio model, the proband’s polygenic score \nwould no longer be correlated with the rare variant component after conditioning on the parents’ \npolygenic scores, because the rare and common variant components segregate approximately \nindependently within-family; instead, this correlation with the rare variant component would be \nreflected by the non-transmitted coefficients on the parents’ polygenic scores53.  \n \nTo explore this potential genetic consequence of parental assortment in our cohorts, we tested \nwhether the common and rare variant components contributing to risk of neurodevelopmental \nconditions are indeed correlated. From the sequencing data in DDD and GEL, we extracted rare \n(MAF<1x10^-4) protein-truncating variants (PTVs) and damaging missense variants in genes \nintolerant of loss-of-function (LoF) variation (“constrained genes”), which are associated with \nreduced cognitive ability9 and risk of neurodevelopmental conditions69,70. Consistent with the \neffects of parental assortment, amongst unaffected parents of probands with \nneurodevelopmental conditions, we observed that the number of rare damaging coding variants \nin constrained genes (the “rare variant burden score”, RVBS) in one parent was significantly \nnegatively correlated with the other parent’s PGSEA (r=-0.065, p=5.5x10^-9), PGSCP (r=-0.036, \np=1.4x10^-3) and PGSNonCogEA (r=-0.046, p=4.3x10^-5) (Figure 5), after correcting for genetic \nprincipal components. As expected, a similar correlation was seen within the probands \nthemselves (Figure 5), regardless of whether we included all probands, undiagnosed probands, or \nprobands with de novo diagnoses, and when restricting the RVBS to haploinsufficient genes associated \nwith developmental disorders (Supplementary Figure 6). 
We also saw a similar result amongst \ncontrol children from the Millennium Cohort Study, including after applying weights to adjust for \nnon-random sampling and attrition in that cohort, indicating that this correlation is not only \nobserved in patients with neurodevelopmental conditions (Supplementary Figure 7). We saw \nno significant correlation between any of the polygenic scores and the burden of rare \nsynonymous variants in constrained genes or dominant genes associated with developmental \nconditions (Figure 5, Supplementary Figure 6), confirming that the result observed for \ndeleterious variants is unlikely to be due to population structure artifacts. The correlations \nbetween polygenic scores and rare damaging variants may explain why we saw very limited \nevidence that these polygenic scores modify the penetrance of such variants in families with \nneurodevelopmental conditions (Supplementary Note 6, Supplementary Figure 8). \n \nFigure 5 \n \nCorrelation between rare variant burden scores and polygenic scores in patients with \nneurodevelopmental conditions and their parents. Correlation coefficients between the number of \ninherited rare damaging coding (left) or synonymous variants (right; negative control) in constrained \ngenes and polygenic scores within/between different sets of individuals. In blue are the correlations within \nprobands with neurodevelopmental conditions whose parents are unaffected (i.e. the child’s rare variant \nburden score, RVBS, with their own polygenic score, PGS), and in purple are the correlations within their \nparents. 
In orange is the cross-parental correlation, i.e. one parent’s RVBS correlated with the other \nparent’s PGS. We calculated the correlations in a combined sample of trios with neurodevelopmental \nconditions from DDD and GEL (N=3,999, or N=2,553 for PGSNDC,DDD after excluding samples from the original \nGWAS6). Note that both the RVBSs and PGSs have been corrected for 20 genetic principal components. \nError bars represent 95% confidence intervals.  \n \nTo explore whether the correlation between common and rare variants associated with \nneurodevelopmental conditions could be driving the association between non-transmitted \ncommon alleles and children’s risk shown in Figure 4, we extended the trio model to control for \nthe probands’, mothers’ and fathers’ rare variant burden scores as well as polygenic scores \n(Extended Data Figure 10). This did not change our original conclusion from the trio regression, \nnamely that the risk of neurodevelopmental conditions is correlated with non-transmitted \ncommon alleles in the parents that are associated with cognitive performance, educational \nattainment and the non-cognitive component thereof, but not with the transmitted common \nalleles. However, we cannot rule out that this association with non-transmitted common alleles \nis primarily driven by the assortment-induced correlation between common and rare variants, \nsince the rare variant burden score we have used likely only captures a small proportion of the \ntotal rare variant component (just as the polygenic score only captures a small fraction of SNP \nheritability).  
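
The reparameterization behind the term non-transmitted coefficient can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the data are simulated, ordinary least squares stands in for the logistic regression used in the trio model, the two parental scores are collapsed into a single summed score, and every variable name and effect size is hypothetical. Because the summed parental score equals the proband score plus the non-transmitted score, the coefficient on the parental score should match the coefficient on the non-transmitted score in a joint regression with the proband score.

```python
import numpy as np

def fit_ols(columns, y):
    # Least-squares fit with an intercept; returns the slope estimates only.
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)
n = 5000

# Hypothetical scores: each parent's PGS is split into a transmitted (t)
# and a non-transmitted (nt) part, drawn independently (no assortment).
t_m, nt_m, t_f, nt_f = rng.normal(size=(4, n))
child = t_m + t_f                      # proband PGS
parents = (t_m + nt_m) + (t_f + nt_f)  # summed parental PGS (simplification)
non_transmitted = nt_m + nt_f          # PGS built from non-transmitted alleles

# Simulated outcome with arbitrary direct (0.3) and parental (0.2) effects.
y = 0.3 * child + 0.2 * parents + rng.normal(size=n)

# Trio-style model: proband score plus parental score.
direct, parental = fit_ols([child, parents], y)

# Joint model: proband score plus non-transmitted score.
child_joint, nt_coef = fit_ols([child, non_transmitted], y)

print('parental coefficient:', parental)
print('non-transmitted coefficient:', nt_coef)
print('direct effect recovered as difference:', child_joint - nt_coef)
```

In this sketch the two models span the same predictors, so the fits are identical and only the labelling of the coefficients changes; the same reparameterization of the linear predictor carries over to a logistic model. The sketch does not include parental assortment or rare variants, so it says nothing about how those would bias the estimates.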
\n \nIn summary, we find that parents’ non-transmitted alleles at common variants ascertained for \ntheir association with educational attainment and cognitive performance are correlated with their \nchildren’s risk of neurodevelopmental conditions, but we do not see evidence for direct genetic \neffects from transmitted alleles. Further work is needed to confirm whether this association with \nthe non-transmitted alleles is due to true indirect genetic effects and/or parental assortment.   \nDiscussion \nHere we combined two large cohorts of patients with rare neurodevelopmental conditions to \nexplore the contribution of common variants to risk. After first demonstrating that polygenic \nscores for neurodevelopmental conditions and several related traits were significantly \nassociated with case/control status within both DDD and GEL (Supplementary Table 2), we \nconducted a GWAS meta-analysis of patients with neurodevelopmental conditions from the two \ncohorts and revealed significant genetic correlations with several psychiatric conditions which \nhad not been previously reported6 (Figure 1A). Conditional genetic correlations (Figure 1B) \nshow that several of these (e.g. schizophrenia, Tourette’s) are not simply driven by the \ncomponent of polygenic risk for neurodevelopmental conditions that is shared with educational \nattainment. This suggests that these mental health conditions share underlying biology with \nneurodevelopmental conditions that is independent of that captured by effects of common \nvariants on educational attainment, although we acknowledge that estimates of genetic \ncorrelations can be biased by cross-trait parental assortment and other confounding factors71. \n \nWe showed that polygenic scores for several traits that are genetically correlated with \nneurodevelopmental conditions were significantly associated with having a monogenic \ndiagnosis, with the strongest effect observed for educational attainment (Figure 2A). 
Our \nprevious work had found no such difference in polygenic background between diagnosed and \nundiagnosed probands in DDD6, and it is likely that power has been improved here by our larger \nsample size and better definition of which probands truly have a monogenic diagnosis5,27. Our \nresult is consistent with a liability threshold model for rare neurodevelopmental conditions; \nchildren without a large-effect monogenic variant may require higher polygenic load (or a major \nenvironmental contribution such as a teratogenic infection, e.g. Zika virus) to move their \nphenotype over the threshold required to be clinically diagnosed with a neurodevelopmental \ncondition (Extended Data Figure 1). Of potential relevance in clinical settings, \nwe find that probands with more affected first-degree relatives had both a lower PGSEA and a lower \nchance of getting a monogenic diagnosis in DDD (Extended Data Figure 6), emphasizing that \nif there are multiple first-degree relatives with neurodevelopmental conditions in a family, this \nmay not necessarily be due to a monogenic cause. Our observation that diagnosed patients \nwith affected parents (most of whom have inherited dominant diagnoses), and their parents, \nhave lower average PGSEA than those with unaffected parents (Extended Data Figure 4) is \nconsistent with the effects of parental assortment (Figure 5).  \n \nSince most parents of the patients we studied are annotated as clinically unaffected, we \nhypothesized that they might be over-transmitting polygenic risk to their affected offspring. We \n
saw nominally significant over-transmission of PGSNDC,DDD from unaffected parents to \nundiagnosed probands, but saw no significant transmission distortion for PGSEA or PGSCP \n(Figure 3A), despite these polygenic scores explaining much more variance in risk than \nPGSNDC,DDD (Supplementary Table 2). Consistent with this, in a two-generation model (Figure \n4), we found evidence for a direct genetic effect of PGSNDC,DDD on risk of neurodevelopmental \nconditions, but no evidence for direct genetic effects of the other polygenic scores tested. \nInstead, we observed that the parents’ PGSEA, PGSCP and PGSNonCogEA were significantly \nassociated with their children’s risk even after controlling for the children’s PGS, indicating a \ncorrelation between non-transmitted alleles and the children’s phenotype. This may be due to \nindirect genetic effects and/or the consequences of parental assortment.  \n \nPrevious papers have shown that non-transmitted alleles in the parents are associated with \nchildren’s educational attainment and school grades, explaining a third to a half of the overall \nassociation between educational outcomes and PGSEA that is seen in population-based \nsamples29,33,34,51. However, the interpretation of this finding is still a matter of debate, since most \nof these papers use models that can give spurious or inflated indirect genetic effect estimates \ndue to population stratification and/or parental assortment51,53,72. Parental assortment induces a \ncorrelation between the polygenic score associated with the trait under assortment and the \nremaining genetic component of the phenotype with which the polygenic score would be \nuncorrelated under random assortment. This includes the component due to rare variants, which \ncould have a much stronger effect on risk of neurodevelopmental conditions than the common \nvariant component. 
We demonstrated (to our knowledge, for the first time) a correlation between \nthe rare and common variant components affecting cognitive and educational outcomes, both \nbetween parents and within both offspring and parents (Figure 5 and Supplementary Figure 7). \nThis supports the hypothesis that the association of PGSEA with lower risk of \nneurodevelopmental conditions is at least partly due to the assortment-induced correlation of \nPGSEA with rare variants affecting both neurodevelopmental conditions and educational \nattainment. Although these observed correlations are small in magnitude (|r|<0.1), it is likely that \nthe correlation between the total common and rare variant components of educational outcomes \nand neurodevelopmental conditions is substantially higher than this53, since only small fractions \nof these components are likely to be captured by the polygenic scores and our rare variant \nburden score, respectively. Very large whole-genome sequenced datasets will be required to \nbetter characterize the total rare variant component of these traits and estimate this correlation \nmore accurately. \n \nWith the current study design, we were unable to demonstrate the presence of indirect genetic \neffects on risk of neurodevelopmental conditions unambiguously, nor could we test whether, \nif present, these are mediated by parenting behaviors. However, we did explore whether \ncommon genetic variants might influence risk by affecting prenatal risk factors (a form of indirect \ngenetic effects). We found that educational attainment showed a significant negative genetic \ncorrelation with preterm delivery, whereas neurodevelopmental conditions showed a significant \npositive genetic correlation with it even after conditioning on educational attainment with \nGenomicSEM (rg=0.47) (Extended Data Figure 8A). 
This is consistent with epidemiological \nstudies that found an association between prematurity and poorer neurocognitive outcomes \neven after controlling for socioeconomic confounders45,73–79. We found some evidence from \nMendelian randomization that lower educational attainment is causally associated with preterm \ndelivery (Extended Data Figure 9B; Supplementary Note 5); this may be because lower \neducational attainment is associated with several factors that increase the risk of preterm \ndelivery in the mother (such as a short inter-pregnancy interval80, exposure to tobacco smoke \nduring pregnancy81,82, and pre-eclampsia56). We acknowledge that causal estimates from \nMendelian randomization analyses may be biased when using population-based GWASs, as \nwe have done, so these findings should be considered tentative until confirmed using sufficiently \nwell-powered within-family GWASs32. Although we did not find evidence for a causal effect of \nprematurity on neurodevelopmental conditions (Extended Data Figure 9C), several factors \nmay have reduced the power of this analysis (Supplementary Note 5). We also saw no \nsignificant evidence that prematurity mediates indirect genetic effects of common alleles \nassociated with educational attainment (Supplementary Note 5). However, it may be that our \nanalysis was simply underpowered at this sample size, since we did see some attenuation \n(albeit not significant) of the non-transmitted coefficients for PGSEA when removing premature \nprobands (Supplementary Figure 5). 
Nonetheless, our results emphasize how genetics may \nconfound epidemiological associations between risk factors and neurodevelopmental \nconditions83,84, and also suggest that studies seeking to characterize the nature of indirect \ngenetic effects on educational outcomes should consider the contribution of prenatal factors.  \n \nOur study has several limitations. Firstly, the overall variance in risk of neurodevelopmental \nconditions explained by common variants is low (~10%) and the polygenic scores tested here \nexplain only a fraction of this. Having said that, these polygenic scores are statistically significant \npredictors of neurodevelopmental conditions (Supplementary Table 2) and are likely to explain \nmore variance as GWAS sample sizes grow. Secondly, the reported significance of detected \nPGS effects does not simply reflect the strength of the real associations, but also the power of \nthe original GWAS from which SNP effect sizes were derived. Thus, one must be cautious when \ncomparing effects between polygenic scores for different traits. Thirdly, the phenotypic \nheterogeneity of the cohorts likely limits our power and may confound results. For example, \nmissed diagnoses of autism amongst DDD and GEL participants with neurodevelopmental \nconditions (perhaps due to the young average age; Supplementary Note 1) could be \nconfounding our result of there being no apparent under-transmission of PGSEA (Figure 3A; \nSupplementary Figure 2), since PGSEA may be over-transmitted to autistic individuals26,50 but \nunder-transmitted to patients with intellectual disability who are not autistic. 
Fourthly, the fact \nthat probands in trios tend to have higher polygenic scores for educational attainment than those \nnot in trios (Extended Data Figure 5B) suggests that the trio probands are a non-random \nsample, which could potentially induce biases in trio-based analyses; for example, the \nundiagnosed trio probands may be enriched for monogenic causes in as-yet-undiscovered \ngenes, which could reduce power when assessing over-transmission of polygenic risk. \nAdditionally, many of our analyses are predicated on the assumption that the “unaffected \nparents” (i.e. those reported by the clinician not to have a similar phenotype to the proband) do \nnot have phenotypes related to neurodevelopmental conditions. It may be that some fraction of \nthem do have (or did have, earlier in life) relevant phenotypic features (e.g. learning difficulty, \nspeech delay), but that these were not detected and recorded by the clinicians. The inclusion of \nthese parents could be reducing power or confounding results in several analyses. Another \ncaveat is that our estimates of effect size when comparing to controls are sensitive to the choice \nof control cohort, likely reflecting differences in education-related ascertainment bias between \nthem85 (Supplementary Note 3; Extended Data Figure 5). Despite this, sensitivity analyses \nsuggest that our main conclusions are robust and not driven by ascertainment bias of a \nparticular control cohort (Supplementary Note 3; Extended Data Figure 4; Supplementary \nFigure 4; Supplementary Table 6). Finally, the correlation between the rare and common \n
variant components of neurodevelopmental conditions (Figure 5), which is likely due to parental \nassortment, may have confounded several of these analyses.  \n \nIn future, as GWAS discovery cohorts for both rare neurodevelopmental conditions and related \ntraits increase in size, we will have more power to explore these common variant effects on risk, \npenetrance and phenotypic expressivity of these conditions. These studies should seek to \nconfirm whether there really are no direct genetic effects of common variants influencing \neducational attainment and cognitive performance on risk of neurodevelopmental conditions, or \nwhether these are just small. To disentangle the contribution of indirect genetic effects and \nparental assortment to common variant associations with neurodevelopmental conditions, future \nstudies will need to use extended genealogies and/or more sophisticated modeling of the \ninfluence of parental assortment on common and rare variants than is currently possible51,53,72. \nIf these studies also had measures of epidemiological and prenatal risk factors such as \nprematurity, and of parental phenotypes and nurturing behaviors, one could explore how indirect \ngenetic effects (if present) are mediated, which has potential implications for assessing the \nmodifiability of risk. Finally, it will be important for future studies to explore the role of polygenic \nbackground in neurodevelopmental conditions in families with non-European genetic ancestries. 
\nAbbreviations \nDDD: Deciphering Developmental Disorders study; GEL: Genomics England 100,000 Genomes \nProject; GWAS: genome-wide association study; GenomicSEM: Genomic Structural Equation \nModelling; PGS: polygenic score; NDCs: neurodevelopmental conditions; EA: educational \nattainment; CP: cognitive performance; SCZ: schizophrenia; NonCogEA: the non-cognitive \ncomponent of educational attainment; pTDT: polygenic transmission disequilibrium test; RVBS: \nrare variant burden score; UKHLS: United Kingdom Household Longitudinal Study; ALSPAC: \nAvon Longitudinal Study of Parents and Children; MCS: Millennium Cohort Study; QC: quality \ncontrol; MAF: minor allele frequency; GSA: Global Screening Array from Illumina; SNP: single \nnucleotide polymorphism; HWE: Hardy-Weinberg Equilibrium. \nAcknowledgements \nWe are extremely grateful to families for their participation and engagement in the DDD study \nand 100,000 Genomes projects; without them, this research would not be possible. We also \nthank their clinicians and our colleagues (including the Sanger Human Genetics Informatics \nteam, particularly Iaroslav Popov and Ruth Eberhardt) who assisted in the generation and \nprocessing of data. We are very grateful to Jillian Hastings-Ward, Hannah Podd and Hannah \nHumphrey from the Participant Panel for the 100,000 Genomes project, Ana Lisa Taylor Tavares \nfrom Genomics England, and Sarah Wynn from the patient organization Unique, for their \nassistance with writing the FAQ. We thank Athanasios Kousathanas and Loukas Moutsianas \nfrom Genomics England Bioinformatics Research Services for help with data quality control, \nHilary Wong for useful discussions on prematurity, Michel Nivard for advice on use of \nGenomicSEM, and Angelica Ronald, Naomi Wray, and Nick Martin for helpful discussions.  \n \nDDD: The DDD study presents independent research commissioned by the Health Innovation \nChallenge Fund (grant no. HICF-1009-003). 
The full acknowledgements can be found at \nwww.ddduk.org/access.html. This study makes use of DECIPHER, which is funded by the \nWellcome Trust.   \n \nGEL: This research was made possible through access to data in the National Genomic \nResearch Library, which is managed by Genomics England Limited (a wholly owned company \nof the Department of Health and Social Care). The National Genomic Research Library holds \ndata provided by patients and collected by the NHS as part of their care and data collected as \npart of their participation in research. The National Genomic Research Library is funded by the \nNational Institute for Health Research and NHS England. The Wellcome Trust, Cancer \nResearch UK and the Medical Research Council have also funded research infrastructure. \n \nUK Household Longitudinal Study: We used data from ‘Understanding Society: The UK \nHousehold Longitudinal Study’, which is led by the Institute for Social and Economic Research \nat the University of Essex and funded by the Economic and Social Research Council (grant \nnumber ES/M008592/1). The data were collected by NatCen and the genome-wide scan data \nwere analysed by the Wellcome Trust Sanger Institute. Data governance was provided by the \nMETADAC data access committee, funded by ESRC, Wellcome and MRC (grant number \nMR/N01104X/1).  
\n \nALSPAC: We are extremely grateful to all the families who took part in ALSPAC, the midwives \nfor their help in recruiting them, and the whole ALSPAC team, which includes interviewers, \ncomputer and laboratory technicians, clerical workers, research scientists, volunteers, \nmanagers, receptionists and nurses. The UK Medical Research Council and Wellcome (Grant \nref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This \npublication is the work of the authors and Hilary Martin will serve as a guarantor for the contents \nof this paper. Genome-wide genotyping data was generated by Sample Logistics and \nGenotyping Facilities at the Wellcome Sanger Institute and LabCorp (Laboratory Corporation of \nAmerica) using support from 23andMe. \n \nMCS: We are grateful to the Centre for Longitudinal Studies (CLS), UCL Social Research \nInstitute, for the use of these data and to the UK Data Service for making them available. \nHowever, neither CLS nor the UK Data Service bear any responsibility for the analysis or \ninterpretation of these data. \n \nThis research was funded in part by Wellcome (grant no. 220540/Z/20/A, “Wellcome Sanger \nInstitute Quinquennial Review 2021–2026”). For the purpose of open access, the authors have \napplied a CC-BY public copyright license to any author accepted manuscript version arising \nfrom this submission. DB thanks the University of Cambridge Amgen Scholar Program for \nsupport. \nAuthor contributions \nQQH and EMW conducted most of the analyses, with the remainder being conducted by PC \nand DSM. QQH and EMW carried out data preparation and quality control, with assistance from \nKES, VKC, PD, SL, TM, MKM, SA and DB. ES, CFW and HVF helped supervise the DDD study, \ntogether with MEH. EJR, VW, ASY and MEH provided key intellectual input. HCM supervised \nthe analyses and directed the study. 
QQH, EMW, and HCM wrote the first draft of the \nmanuscript, with input from PC, DSM, JCB, VW, ASY and MEH. All authors read and \ncommented on the final manuscript.  \nOnline Methods \nCohort Descriptions and phenotypes \nDeciphering Developmental Disorders (DDD) \nThe aim of the DDD study is to find molecular diagnoses for families and patients affected by \npreviously genetically undiagnosed, severe developmental conditions. Recruitment was \nconducted from 2011 to 2015 across twenty-four clinical genetics services in the United \nKingdom (UK) and Ireland86. The DDD study has UK Research Ethics Committee approval \n(10/H0305/83, granted by the Cambridge South Research Ethics Committee and GEN/284/12, \ngranted by the Republic of Ireland Research Ethics Committee). The clinical inclusion criteria \nincluded neurodevelopmental conditions, congenital, growth or behavioral abnormalities and \ndysmorphic features. Probands were systematically phenotyped via DECIPHER87 using Human \nPhenotype Ontology (HPO)88 terms and a bespoke online questionnaire that collected \ninformation on developmental milestones, growth measurements, number of affected relatives, \nprematurity, maternal diabetes, and other clinically-relevant parameters. The cohort has been \ndescribed extensively5,70,86,89.  \n \nWe focused on probands in the DDD cohort who had neurodevelopmental conditions (NDCs), \nwhich were defined previously by Niemi et al.6. 
Briefly, these were probands who had at least \none of the following neurodevelopmental HPO terms or their descendant terms: abnormality of \nhigher mental function (HP:0011446), neurodevelopmental abnormality (HP:0012759), \nabnormality of the nervous system morphology (HP:0012639), behavioural abnormality \n(HP:0000708), seizures (HP:0001250), encephalopathy (HP:0001298), abnormal synaptic \ntransmission (HP:0012535), or abnormal nervous system electrophysiology (HP:0001311).   \nGenomics England (GEL) 100,000 Genomes Project \nThe 100,000 Genomes project is an initiative by the UK Department of Health and Social Care \nto whole-genome sequence individuals with rare conditions and cancer in the National Health \nService90,91. The 100,000 Genomes project was approved by the East of England - Cambridge \nCentral Research Ethics Committee (REF 20/EE/0035). The rare disease branch of the project \nconsists of sequencing data from ~72,000 patients with rare conditions and their relatives, in \n~34,000 families with a variety of structures. There are over 190 rare conditions represented in \nthe cohort, and about 23% of the patients have NDCs. The cohort was sequenced at around \n35x coverage, and variant calling and quality control (QC) were performed by Genomics \nEngland91,92.  \n \nGEL NDC patients were defined as those recruited under the “Neurodevelopmental disorders” \ndisease sub-category, or with more than one HPO term that was a descendant of \n“Neurodevelopmental Abnormality” (HP:0012759). We removed probands whose age of onset \nwas >16 years or who had neurodegenerative conditions.  \n \nThe set of unrelated GEL controls included cancer patients over 30 years old (N=10,469) and \nunaffected relatives (N=3,198) of probands with rare conditions who were not in the NDC set \nand did not have phenotypes similar to probands from DDD (“DDD-like”). The “DDD-like” \nprobands were defined as those who: \n
1) were recruited into a disease model which was also used to recruit probands who had \npreviously been recruited into DDD (see section below on identifying probands \noverlapping between the two cohorts), or  \n2) had one of the top five HPO terms used in DDD or their descendants, namely \nHP:0000729 (autistic behaviour), HP:0001250 (seizure), HP:0000252 (microcephaly), \nHP:0000750 (delayed speech and language development) and HP:0001263 (global \ndevelopmental delay).  \nProbands recruited into the neurodegenerative disorders subcategory or with an age of \nonset >16 years were removed from the DDD-like set, as were probands recruited into a disease \nsubcategory for which the average age of probands was >16 years. \n \nTo define relatedness, we used a file generated by GEL consisting of a pairwise kinship matrix \nproduced using the PLINK2 93,94 implementation of the KING robust algorithm 95 and a \n--king-cutoff of 0.0442 (i.e. 1/2^(9/2)).  \nControl cohorts \nThe UK Household Longitudinal Study (UKHLS) cohort consists of a continuation of the British \nHousehold Panel Survey (BHPS) of individuals living in the UK96,97. The Avon Longitudinal Study \nof Parents and Children (ALSPAC) is a birth cohort study of children born in Avon, England with \nexpected dates of delivery between 1st April 1991 and 31st December 1992 98. Eligible pregnant \nwomen (N=13,761) were recruited and their children have been phenotyped extensively over \nthe last 30 years. Ethical approval for the study was obtained from the ALSPAC Ethics and Law \nCommittee and the Local Research Ethics Committees. 
Please note that the study website \n(http://www.bristol.ac.uk/alspac/researchers/our-data/) contains details of all the data that is \navailable through a fully searchable data dictionary and variable search tool. The Millennium \nCohort Study (MCS) is a birth cohort study of children born across the UK during 2000 and 2001 \nfrom 18,552 families 99,100. Further information about recruitment of these cohorts is given in \nSupplementary Note 3. \nPreparation of genetic data \nIndividuals from DDD, UKHLS, ALSPAC, and MCS were genotyped on various arrays, whereas \nGEL individuals were whole-genome sequenced. The available data are summarized here \nbriefly: \n● A subset of the DDD cohort (all children and several thousand parents) was genotyped \non three genotype array chips: the Illumina HumanCoreExome chip (CoreExome), the \nIllumina OmniChipExpress (OmniChip), and the Illumina Infinium Global Screening \nArray (GSA). Some probands were genotyped on more than one chip, as shown in \nSupplementary Figure 9. In downstream analysis, we used the CoreExome and \nOmniChip data for analyses of probands, and the GSA and OmniChip data for analyses \nof trios. QC of the CoreExome data (including DDD patients and 9,270 UKHLS controls \ngenotyped on the same chip) and of the OmniChip data was performed by Niemi et al. 6, and \nwe performed QC of the GSA data specifically for this paper (Supplementary Tables \n10 and 11). The DDD cohort was also exome sequenced, and those data were used for \nthe analyses involving rare variants. QC and processing of the exome data are described \nbelow in the section “Extraction and quality control on rare variants”. \n
● GEL individuals were whole genome sequenced with 150bp paired-end reads using \nIllumina HiSeqX. Variant calling and QC were performed by Genomics England. We \nused 78,195 post-QC germline genomes from the Aggregated Variant Calls (aggV2) \nprepared by the GEL team. We kept variants that passed the QC filters shown in \nSupplementary Table 12.  \n● Data we received from ALSPAC were processed in two batches97. In the first batch, we \nreceived post-QC array data for G0 mothers (N=8,884) who were genotyped on the \nIllumina Human 660W chip and G1 children (N=8,932) genotyped on the HumanHap550 \nquad chip. In the second batch, we received another 2,198 parents (G0 mothers and G0 \npartners101) who were genotyped on the CoreExome array.  \n● We received data for 21,181 MCS samples who were genotyped using the GSA array \nchip102.  \nWe applied standard QC filters in each dataset separately, described further in Supplementary \nMethods. \nGenetically predicted ancestry \nThe Supplementary Methods provide detailed information on ancestry inference, but we \nsummarize it briefly here. The identification of GBR-ancestry samples from the DDD \nCoreExome and OmniChip data was described previously6. To identify individuals of genetically \ninferred GBR ancestry in DDD GSA samples, we first projected post-QC samples onto 1,000 \nGenomes phase 3 individuals 36 (Supplementary Figure 10). We then performed another \nprincipal component analysis (PCA) within the loosely defined European ancestry subset and \nidentified a homogeneous subgroup (Supplementary Figure 11) using Uniform Manifold \nApproximation and Projection (UMAP)103. Since we merged parent-offspring trios genotyped on \nGSA and Omnichip array chips in downstream analysis, we kept GSA individuals who were \nsimilar to Omnichip individuals in terms of genetic ancestry in PCA space (Supplementary \nFigure 12). 
In GEL, we used individuals with genetically inferred European ancestry, who were \nidentified by the GEL bioinformatics team. We further restricted to a homogeneous subset \n(N=56,249) that represents white British individuals (Supplementary Figure 13). All array data \nreceived from ALSPAC had genetically predicted European ancestry, so we did not \nperform any filtering based on genetic ancestry. We performed similar PCA and UMAP \nclustering to identify GBR-ancestry individuals in MCS (Supplementary Figure 14; \nSupplementary Figure 15), and further filtered to individuals who self-reported as being of \nWhite ethnicity.  \nIdentifying and removing relatives within and across cohorts \nWithin each dataset, we identified up to third-degree relatives (kinship coefficient > 0.0442 by \nKING v2.2.4 95) using post-QC genotyped array data or WGS data. We always used a subset \nof unrelated individuals (i.e. more distant than third-degree relatives) in downstream analysis. \nIn analyses using trios, we made sure probands in trios were unrelated and parents were \nunrelated to parents from other families.  \n \nIn analyses combining DDD and GEL, we removed from GEL any participants who were also \nrecruited into DDD and/or who were related to DDD participants, and also removed Scottish \nsamples from DDD since we were unable to check whether GEL samples were related to \nthem (Supplementary Methods). We removed individuals from the two birth cohorts who \nwere related to each other or to DDD participants, which left 1,434 and 2,498 trios from \nALSPAC and MCS, respectively (Supplementary Methods).  
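
The kinship cutoff of 0.0442 quoted above corresponds to 2^(-9/2), the usual KING lower bound for third-degree relatives. The following minimal Python sketch only illustrates that arithmetic and a simple pairwise filter; the function name and the example kinship values are hypothetical and are not taken from the study pipeline.

```python
# Illustrative only: the third-degree cutoff is a power of two.
THIRD_DEGREE_CUTOFF = 2 ** (-9 / 2)  # approximately 0.0442

def related_pairs(kinship_pairs, cutoff=THIRD_DEGREE_CUTOFF):
    '''Return the sample pairs whose kinship coefficient exceeds the cutoff.'''
    return [(a, b) for a, b, phi in kinship_pairs if phi > cutoff]

# Hypothetical kinship estimates: (sample_1, sample_2, kinship coefficient)
pairs = [('s1', 's2', 0.25), ('s1', 's3', 0.03), ('s2', 's4', 0.06)]
flagged = related_pairs(pairs)
```

In this toy input, only the pairs above the cutoff are flagged; choosing which member of each flagged pair to drop is a separate step that is not shown here.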
\nImputation and post-imputation QC \nImputation of array data was performed in each genotyped cohort separately using the \nmaximum number of variants available after QC. Prior to imputation, we removed palindromic \nSNPs, SNPs that were not in the imputation reference panel, and SNPs with mismatched alleles. \nDDD samples and UKHLS controls who were genotyped on the CoreExome array were imputed \nwith the HRC r1.1 reference panel by Niemi et al. 6. DDD GSA and Omnichip samples and \nALSPAC samples were imputed to the TOPMed r2 reference panel using the TOPMed \nimputation server, and the MCS samples to the HRC r1.1 reference panel 104–106. We kept \nwell-imputed common variants with Minimac4 R2 > 0.8 and minor allele frequency (MAF) >1%. For \npolygenic score analyses, we subsequently restricted to common variants that passed these \nQC filters in all genotyped cohorts and also passed QC in the GEL WGS data. \nDefining patients with versus without monogenic diagnoses  \nDDD \nThe DDD study identified clinically relevant rare variants from exome sequencing and \nchromosome microarray data using a filtering procedure described in Wright et al. 86. The \nprocedure focuses on identifying rare damaging variants that fit an appropriate inheritance mode \nin a set of genes that cause developmental disorders (DDG2P, \nhttps://www.deciphergenomics.org/ddd/ddgenes). Variants that pass clinical filtering are \nuploaded to DECIPHER87, where the patients’ clinicians are asked to classify them as definitely \npathogenic, likely pathogenic, uncertain, likely benign or benign. We defined “diagnosed” \nprobands as those with one or more variants either annotated as pathogenic/likely pathogenic \nin DECIPHER by their referring clinician, or predicted as pathogenic/likely pathogenic using \nautocoded ACMG diagnoses as described previously 5. All remaining probands were classed as \n“undiagnosed”. 
Probands with a de novo diagnosis are those with a de novo mutation in a \nmonoallelic or X-linked DDG2P gene that was either annotated or predicted as pathogenic/likely \npathogenic. \nGEL \nThe probands assigned diagnostic status were those included in the Genomic Medicine Service \nexit questionnaire, in which a clinician evaluated the pathogenicity of variants of interest \nidentified through GEL’s custom pipeline. We defined “diagnosed” probands as those that had \na pathogenic or likely pathogenic variant that is annotated as partially or fully explaining their \nphenotype in this exit questionnaire. Probands with a de novo diagnosis are those whose \npathogenic/likely pathogenic variants from the exit questionnaire were annotated as de novo \nprotein truncating or missense variants in DDG2P monoallelic or X-linked genes. We defined \n“undiagnosed” probands as those that were present in the exit questionnaire but not annotated \nas having a pathogenic or likely pathogenic variant and not annotated as “yes” or “partially” in \nthe “case_solved_family” column. We further removed from this undiagnosed set any probands \nwho have potential diagnoses in the Diagnostic Discovery data in GEL, which is a list of variants \nsubmitted by researchers that are thought likely to be pathogenic by the GEL clinical team.  \nExtraction and quality control on rare variants \nQuality control of DDD exome sequencing data and extraction of rare single nucleotide variants \n(SNVs) and insertions and deletions (indels) are summarized in Supplementary Table 13. 
Indels \nin the same gene and sample were removed (4% of indels with MAF < 1%), since these were \noften part of complex mutational events that would require haplotype-aware annotation.  \n \nFor GEL, details of the QC of SNVs and indels in the WGS data are provided by the GEL \nteam91,92 and variant QC is summarized in Supplementary Table 12. We used a custom \nPython script to extract rare variants from GEL aggregated WGS variant call format files \n(aggV2). We filtered genotypes to those with genotype quality (GQ) ≥ 20 and read depth (DP) \n≥ 10. We removed heterozygous genotypes that failed a binomial test of balanced REF \nand ALT alleles (p < 1x10^-3) or for which ALT/(REF+ALT) (AB ratio) was not between 0.2 and \n0.8. We further removed variants with missing high-quality genotypes in more than 5% of all \nsamples in aggV2. We removed indels in the same gene and sample for the same reason \ndescribed above for DDD. \n \nFor MCS, details of the QC of exome sequencing data are in Supplementary Methods. \n \nDefining trio sample sets in DDD and GEL \nThe procedure for filtering the trios used in DDD and GEL is shown in Supplementary \nFigure 16. Briefly, in DDD, we combined data across GSA and OmniChip arrays and kept \ntrios in which all three members had GBR ancestry and the proband had an NDC. We \nexcluded trios recruited from Scottish centres and kept unrelated trios. We then split trios into \nthose with both parents unaffected and those with one or both parents affected. These were \nthen categorized as genetically diagnosed or undiagnosed. We applied similar filtering in GEL \ntrios. See Supplementary Methods for more information. \nGWAS of neurodevelopmental conditions in GEL and meta-analysis with \nDDD \nWe used PLINK v1.9 to conduct a GWAS comparing individuals with NDCs (N=3,618) to \ncontrols (N=13,667) in GEL, controlling for 20 genetic principal components (PCs), age, and \nsex. 
Prior to running the GWAS, we removed variants with MAF < 1%, missingness > 2% or \nHardy-Weinberg equilibrium (HWE) P-value < 1x10^-5. We also performed a differential missingness \ntest between the NDC patients and controls and removed variants with P-value < 1x10^-5. We \nrepeated the GWAS comparing DDD patients with neurodevelopmental conditions on the \nCoreExome array (N=6,397) to UKHLS controls (N=9,270) using PLINK v1.9, after excluding \nDDD patients recruited from Scottish centres.  \n \nWe used METAL107 to conduct an inverse variance-weighted GWAS meta-analysis between the \nDDD-UKHLS and GEL NDC GWASs. We removed palindromic SNPs with MAF > 0.4 since the \nstrand could not be easily inferred using MAF. We also excluded SNPs with discordant AF \n(difference > 0.05) between the two cohorts. This left 5,451,801 overlapping SNPs in the \nmeta-analysis.  \nHeritability and genetic correlations \nWe used Linkage Disequilibrium Score Regression (LDSC)108 to estimate SNP heritability using \nsummary statistics from the GWAS of NDCs in DDD, in GEL, and a meta-analysis of the two \ncohorts. We used ~1 million common SNPs from HapMap3 with precomputed LD scores. SNP \nheritability on the liability scale was estimated assuming a cumulative population prevalence of \n1% for rare neurodevelopmental conditions 6. We used the effective sample size (4/(1/N_cases + \n1/N_controls)), or the sum of the two effective sample sizes for the meta-analysis, and a sample \nprevalence of 50% in LDSC, as recommended previously 109. 
In addition, we applied two \nmethods to estimate SNP heritability using individual-level data in DDD and GEL separately. \nWe performed GREML-LDMS110 stratified by LD (two bins of equal size) and MAF (three bins: \n1%–5%, 5%–10%, >10%). We also ran phenotype correlation–genotype correlation (PCGC) \nregression111, using the LDAK-Thin Model to compute the kinship matrix using the direct \nmethod. We included sex and ten genetic principal components as covariates in both \nmethods. We then meta-analyzed the SNP heritability estimates from DDD and GEL using an \ninverse-variance weighted method.  \n \nWe used LDSC to estimate genetic correlations between the DDD NDC GWAS or the \nmeta-analyzed NDC GWAS and various brain-related traits and conditions listed in Supplementary \nTable 14. We did not use the GEL NDC GWAS to calculate genetic correlations as the SNP \nheritability was not significantly different from zero according to LDSC. \nEstimating conditional genetic correlations with GenomicSEM \nTo estimate the genetic correlations between various traits/conditions (Supplementary Table \n14) and NDCs independent of cognitive performance or educational attainment signals, we used \ngenomic structural equation modelling (GenomicSEM)42,44. We estimated the genetic correlation \nbetween the target trait and a latent variable representing the non-cognitive component of \nNDCs, that is, the genetic influences on NDCs that were not explained by cognitive skills. We \napplied the GenomicSEM model without SNP effects. We also estimated genetic correlation \nwith the “non-educational attainment” latent variable, which represented genetic influences on \nNDCs that were not accounted for by the educational attainment latent variable. \nCalculating polygenic scores \nFor calculating PGSs, we used the set of SNPs that were well-imputed in all array cohorts \n(Minimac4 R2 > 0.8), passed QC in GEL aggV2 samples, and had MAF >1% in all cohorts. 
We \nused LDpred112 to estimate weights for calculating PGSs, with an LD reference panel composed \nof HapMap3 113 common variants based on 5,000 unrelated individuals of white British \ngenetically-inferred ancestry from the UK Biobank 114 (Supplementary Methods). GWAS \nsummary statistics for years of schooling (a measure for EA)37, the non-cognitive component of \neducational attainment (NonCogEA) 42, cognitive performance (CP) 37, schizophrenia (SCZ) 38, \nand NDCs 6 were matched with the list of overlapping SNPs (Supplementary Table 14). \nPGS_NDC,DDD was evaluated in the DDD Omnichip samples and the GEL samples which were not \nin the DDD GWAS. To make PGSs comparable across cohorts (DDD, GEL, UKHLS, MCS and \nALSPAC), we performed a joint PCA across all cohorts and adjusted the raw scores for 20 PCs. \nFor all analyses, residuals were scaled so that the combined set of unrelated control samples \nfrom GEL and UKHLS (or GEL controls only for PGS_NDC,DDD) had mean = 0 and SD = 1.  \nAnalyses of polygenic scores \nEvaluating variance explained by the PGSs \nWe evaluated how much variance in risk of NDCs was explained by the PGS on the liability \nscale111,115,116. We compared 6,397 NDC probands from DDD to 9,270 controls from UKHLS, \nand 3,618 NDC probands from GEL to 13,667 GEL controls defined as described above. We \nassumed the population prevalence of NDCs to be 1% 6.  
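
As an illustration of what expressing variance explained on the liability scale involves, the sketch below implements one widely used transformation of an observed-scale R2 from a case-control sample (the form given by Lee et al., 2012). This is an assumption about the general approach, and it may differ from the exact procedure used in this study; the function name and the observed-scale R2 value are hypothetical, while the cohort sizes and the 1% prevalence come from the text above.

```python
from statistics import NormalDist

def liability_r2(r2_observed, prevalence, case_fraction):
    '''Convert an observed-scale R2 to the liability scale (Lee et al. 2012 form).'''
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold t
    height = std_normal.pdf(threshold)              # normal density z at t
    k, p = prevalence, case_fraction
    # Scaling factor C accounts for prevalence and case-control ascertainment.
    c = (k * (1 - k)) ** 2 / (height ** 2 * p * (1 - p))
    shift = (height / k) * (p - k) / (1 - k)        # m * (P - K) / (1 - K), with m = z / K
    theta = shift * (shift - threshold)
    return r2_observed * c / (1 + r2_observed * theta * c)

# Cohort sizes are from the text; the observed-scale R2 of 0.02 is a made-up value.
n_cases, n_controls = 6397, 9270
example = liability_r2(0.02, prevalence=0.01,
                       case_fraction=n_cases / (n_cases + n_controls))
```

When the case fraction in the sample equals the population prevalence, the ascertainment term theta is zero and the conversion reduces to multiplying by K(1-K)/z^2.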
\nComparison of PGS between different subsets of probands and parents \nWe used two-sided t-tests to compare PGSs between different groups of probands, parents and \ncontrols shown in Figure 2A, Figure 3B, Extended Data Figure 4, Extended Data Figure 5, \nand Supplementary Tables 5, 6 and 7. We report the mean difference in PC-corrected PGS \nbetween groups. The groups compared with each other include: \n● Combined set of controls from GEL and UKHLS \n● Control individuals from UK birth cohorts, ALSPAC and MCS \n● Undiagnosed NDC probands regardless of trio status \n● Diagnosed NDC probands regardless of trio status \n● Undiagnosed NDC probands for whom both parents are unaffected \n● Unaffected parents of undiagnosed NDC probands \n● Undiagnosed NDC probands with one or both parents affected \n● Affected parents of undiagnosed NDC probands \n● Diagnosed NDC probands for whom both parents are unaffected \n● NDC probands with de novo diagnoses for whom both parents are unaffected  \n● Unaffected parents of diagnosed NDC probands \n● Diagnosed NDC probands with one or both parents affected \n● Affected parents of diagnosed NDC probands \nThe sample size of each subset is listed in Supplementary Table 1. We excluded controls from \nUKHLS as well as DDD CoreExome and GSA probands when testing the DDD-derived NDC \nPGS (since these had been included in the original NDC GWAS, whereas the individuals \ngenotyped on the Omnichip had not). All the t-tests involving NDC probands were performed in \nsamples from DDD and GEL combined. \n \nWe also compared female probands versus male probands without a monogenic diagnosis \nregardless of trio status (2,427 and 1,574 male probands from DDD and GEL, and 1,426 and \n918 female probands from DDD and GEL), and unaffected mothers versus unaffected fathers \n(1,523 trios from DDD and 1,343 trios from GEL) using two-sided t-tests (Extended Data Figure \n7AB). 
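
The control-based scaling and the group comparisons described in this section can be sketched as follows. This is a minimal illustration using only the Python standard library: the scores are made-up values, the function names are hypothetical, and the use of the Welch (unequal variance) form of the t statistic is an assumption, since the text does not state which variant was used. The p-value step is omitted.

```python
from statistics import mean, stdev

def standardise_to_controls(scores, control_scores):
    '''Scale scores so that the control sample has mean 0 and SD 1.'''
    mu, sd = mean(control_scores), stdev(control_scores)
    return [(s - mu) / sd for s in scores]

def mean_difference_and_t(group_a, group_b):
    '''Mean difference (a minus b) and Welch t statistic (assumed variant).'''
    diff = mean(group_a) - mean(group_b)
    se = (stdev(group_a) ** 2 / len(group_a)
          + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    return diff, diff / se

# Made-up PC-adjusted scores for two hypothetical groups.
raw_controls = [0.3, -0.1, 0.4, 0.0, -0.6]
raw_probands = [-0.5, -0.2, -0.9, 0.1, -0.4]
controls = standardise_to_controls(raw_controls, raw_controls)
probands = standardise_to_controls(raw_probands, raw_controls)
difference, t_statistic = mean_difference_and_t(probands, controls)
```

Because the controls are scaled to mean 0 and SD 1, the reported mean difference is in units of control standard deviations.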
\nAssociations between PGS and diagnostic status \nWe compared average PGSs in NDC probands with and without a monogenic diagnosis using \ntwo-sided t-tests, combining NDC probands from DDD and GEL regardless of whether they \nwere in a trio or not. We compared NDC subgroups to the combined control set from UKHLS \nand GEL, as well as to unrelated children from the MCS cohort who were reweighted using \navailable sociodemographic data to make them more representative of the general UK \npopulation (Supplementary Note 3).  \n \nWithin DDD (N=7,549 without excluding Scottish samples or samples who were related to GEL \nparticipants), we tested whether the proband’s PGS_EA was associated with factors affecting \nthe chance of receiving a diagnosis in linear regression models: \nPGS_EA ~ factor \nWe investigated the following binary factors: trio status (N=5,507 with both parents exome \nsequenced but not necessarily genotyped), proband sex (N=4,421 male probands), whether the \nproband had any affected first-degree relatives (N=1,623), whether the proband was born \npreterm (N=1,098 with gestation <37 weeks), whether the mother had diabetes (N=242), and \nwhether the proband had severe intellectual disability or developmental delay (ID/DD; N=941) \nversus mild or moderate ID/DD (N=1,887). We compared probands with the above-mentioned \ncharacteristics to all other probands, except when comparing probands with severe versus mild \nor moderate ID/DD for which we excluded probands without ID/DD or with ID/DD of unknown \nseverity. 
We also investigated a continuous factor, the degree of consanguinity, quantified by \nthe fraction of the genome in runs of homozygosity (F_ROH) divided by 0.0625, which is the \nexpected fraction given a first-cousin marriage.   \n \nWe also tested whether the mother’s or father’s PGS_EA was associated with the above factors, \nin a total of 2,497 samples; we did not test for association with trio status since parental genotype \ndata were only available for full trios.  \n \nSee the Supplementary Methods for a description of the estimation of the odds ratio of diagnosis \nfor different configurations of affected relatives shown in Extended Data Figure 6, and of the \nmediation analysis to explore whether trio status and prematurity were mediating the \nassociation between PGS_EA and diagnostic status. \nEvaluating over-transmission of PGS: the polygenic transmission disequilibrium test  \nWe conducted polygenic transmission disequilibrium tests (pTDT) in undiagnosed and \ndiagnosed probands from DDD (N=1,523 undiagnosed, 443 diagnosed) and GEL (N=1,343 \nundiagnosed, 507 diagnosed) combined. We also conducted pTDT in these trios excluding \nautistic probands.  \n \nThe pTDT is a two-sided one-sample t-test of the probands’ PGS deviation from expectation, \nwhich is their parents’ mean PGS. The pTDT deviation is defined as: \npTDT deviation = PGS_child − (PGS_mother + PGS_father) / 2 \n \nTo evaluate whether the pTDT deviation is significantly different from 0, the pTDT test statistic \n(t_pTDT) is defined as:   \nt_pTDT = mean(pTDT deviation) / (SD(pTDT deviation) / sqrt(n)) \nwhere n is the number of trios. \nAnalyses of non-transmitted coefficients  \nWe evaluated direct genetic effects and effects of non-transmitted common alleles on NDC case \nstatus using logistic regression on PGSs:  \n1_{NDC status} ~ child PGS + mother PGS + father PGS \n
where 1_{NDC status} is an indicator variable for whether the individual is an NDC case (1) or control \n(0). \n \nSince a child’s PGS is calculated using transmitted alleles and the difference between the sum \nof parents’ PGS and the child’s PGS is equivalent to a PGS derived from non-transmitted alleles, \nthis model can be rewritten as: \n1_{NDC status} ~ (β_T − β_NT) × child PGS + β_NT × (mother PGS + father PGS) \nwhere β_NT indicates the non-transmitted coefficient and β_T indicates the coefficient on \ntransmitted alleles. The regression coefficient on child PGS in this trio model represents an \nunbiased estimate of direct genetic effect (difference between β_T and β_NT).  \n \nNDC probands were from DDD and GEL trios where the proband was undiagnosed and both \nparents were unaffected (N=2,866 trios). Control samples were trios from the two birth cohorts \n(ALSPAC and MCS, N=1,434 and N=2,498, respectively) as well as trios from GEL where the \nproband did not have DDD-like developmental disorders or NDCs (N=872).  \n \nWe verified that the PGSs in the trio model did not exhibit excessive collinearity (see \nSupplementary Methods). \n \nWe performed various sensitivity analyses in the following subsets (Supplementary Figure \n4): NDC probands versus controls from GEL trios only, and NDC patients from GEL and DDD \nversus each of the three control cohorts separately (GEL, MCS or ALSPAC). We also \nconducted the analysis while controlling for the rare variant burden score (RVBS) in GEL trios \n(Extended Data Figure 10; see the section below on “Analyses of PGSs and rare \nprotein-coding variants”). 
\n1_{NDC status} ~ child PGS + child RVBS + mother PGS + mother RVBS + \nfather PGS + father RVBS \nWe restricted this latter analysis to GEL trios to minimize artifactual differences in rare variant \ncalling and QC between cases and controls, which could otherwise create spurious \nassociations.  \n \nSee the Supplementary Methods for a description of how we modified the running of this trio \nmodel to investigate the hypothesis that the effects of non-transmitted alleles associated with \neducational attainment and cognition might be mediated by prematurity. \nAnalyses of PGSs and rare protein-coding variants \nSequence data from DDD, GEL, and MCS were annotated with the Variant Effect Predictor \n(VEP) 117. We kept the ‘worst consequence’ annotation across transcripts. From parents and \nprobands, we extracted autosomal heterozygous protein-truncating variants (transcript ablation, \nframeshift, splice acceptor, splice donor and stop gained) annotated as high-confidence by \nLOFTEE118 (HC PTVs), as well as variants in the following classes which we grouped as \n“missense”: missense, stop lost, start lost, inframe insertion, inframe deletion, and \nloss-of-function variants annotated as low-confidence by LOFTEE 118. We retained rare variants with \nMAF < 1 x 10^-5 in each gnomAD super-population and MAF < 1 x 10^-4 in the respective cohorts.  \n \nWe considered four (non-mutually-exclusive) groups of damaging rare variants:  \ni) HC PTVs in constrained genes (pLI > 0.9 119)  \n
ii) HC PTVs and missense variants (MPC ≥ 2 120) in constrained genes (pLI > 0.9)  \niii) HC PTVs in monoallelic DDG2P genes with a loss-of-function mechanism (i.e. “absent gene \nproduct”) \niv) HC PTVs and missense variants (MPC ≥ 2) in monoallelic DDG2P genes with a \nloss-of-function mechanism.  \nWe retained probands and parents who were heterozygous for these variants. We required the \nvariants in the children to have been inherited from a parent. \n \nTo investigate whether parental assortment leads to correlated rare and common variant \nburden, we calculated rare variant burden scores (RVBSs) as the number of rare variants in the \nclasses described above, then calculated the Pearson’s correlation coefficients between RVBSs \nand PGSs using the “cor” function in R. We used trios in which both parents were unaffected in \nthis analysis. RVBSs were corrected for 20 genetic principal components using linear regression \nmodels. We then calculated the correlation coefficients between the PC-adjusted RVBSs in \nparents and the PC-adjusted PGSs in their partners. We also assessed the correlation within \nthe same person amongst either children or parents. We repeated the analysis in subsets of \ntrios where the proband was undiagnosed as well as in trios where the proband had a \nmonogenic de novo diagnosis (Supplementary Figure 6). The main analysis in Figure 5 and \nthe sensitivity analysis in Extended Data Figure 10 are based on group (ii) above, whereas \nSupplementary Figures 6, 7 and 8 show the results for all four groups of variants. To investigate \nwhether the results were affected by uncorrected population structure, we also calculated \nRVBSs using rare synonymous variants in either monoallelic DDG2P genes with a \nloss-of-function mechanism or constrained genes, and assessed their correlation with PGS. 
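
The cross-partner correlation described above (adjust each score for ancestry components, then correlate one parent's adjusted RVBS with the partner's adjusted PGS) can be sketched as below. This is a simplified illustration in Python rather than the R code used in the study: it adjusts for a single principal component instead of 20, and all values, variable names and helper functions are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def residualize(y, covariate):
    '''Residuals of y after simple linear regression on one covariate.'''
    mx, my = mean(covariate), mean(y)
    slope = (sum((c - mx) * (v - my) for c, v in zip(covariate, y))
             / sum((c - mx) ** 2 for c in covariate))
    intercept = my - slope * mx
    return [v - (intercept + slope * c) for c, v in zip(covariate, y)]

def pearson(x, y):
    '''Pearson correlation coefficient between two equal-length lists.'''
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up values for five hypothetical couples (one entry per trio).
mother_pc1 = [-1.0, 0.0, 1.0, 2.0, -2.0]
mother_rvbs = [1, 0, 2, 3, 0]
father_pc1 = [0.5, -0.5, 1.5, 1.0, -2.5]
father_pgs = [0.2, -0.1, 0.4, -0.3, 0.6]

mother_rvbs_adj = residualize(mother_rvbs, mother_pc1)
father_pgs_adj = residualize(father_pgs, father_pc1)
cross_partner_r = pearson(mother_rvbs_adj, father_pgs_adj)
```

The same two helpers could be applied within a person (for example a child's adjusted RVBS against the child's own adjusted PGS) to mirror the within-individual comparison mentioned in the text.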
\n \nTo assess whether PGSs modify the penetrance of rare inherited variants, we conducted one-sided \npaired t-tests comparing the PGS of unaffected parents transmitting a damaging variant \nwith that of their affected offspring who inherited the variant (Supplementary Figure 8). We \nhypothesized that the unaffected parents would have a more protective polygenic background \nthan their affected offspring (i.e. higher PGS_EA, PGS_CP, PGS_NonCogEA and lower PGS_SCZ, \nPGS_NDC,DDD). If both parents transmitted a variant to a proband, one parent-child pair \nwas chosen at random from the trio. We used trios where the proband was undiagnosed and \nboth parents were unaffected in this analysis.  \nConstruction and incorporation of weights for MCS \nWe were concerned that control cohorts might not be random samples of the population with \nrespect to educational attainment, and that this might bias our effect sizes for the difference in \nPGSs between cases and controls (Supplementary Note 3). We decided to use MCS, for which \nextensive sociodemographic data are available, to calculate a mean PGS that would be \nrepresentative of the general population, using inverse-probability weighting. MCS deliberately \noversampled minority ethnic and disadvantaged individuals 121 (sampling bias), and they provide \nsampling weights to account for this. Additionally, missingness in each wave of data collection, \nincluding the collection of DNA for genotyping, was nonrandom (non-response bias). To correct \nfor non-response bias, we produced non-response weights per individual using the inverse of \nthe probability of being genotyped estimated from a logistic regression, considering covariates \ncollected at the first study sweep, as previously described121,122 (Supplementary Methods). We \nfitted the model to predict who was within the sample of unrelated GBR-ancestry children with \ngenotype data (N=5,884 of 6,036 children who had complete data for these covariates), and \n
separately to predict who was within the subset of these that additionally had genotype data on \nboth parents (N=2,445 of 2,498 trio children who had no missingness). To produce weights that \naccount for both sampling bias and non-response bias, we multiplied the non-response weights \nfrom the regression models by the sampling weights provided by MCS. These weights were then \nused to calculate the adjusted PGSs shown in Extended Data Figure 4 and Extended Data Figure \n5C and the adjusted correlation between PGS and RVBS shown in Supplementary Figure 7.  \n \nData availability \nThe raw and post-quality control genotype array data and exome sequence data from DDD \nare available through the European Genome-phenome Archive, under EGAS00001000775. \nWhole-genome sequence data and phenotypic data from the 100,000 Genomes Project can \nbe accessed by application to Genomics England \n(https://www.genomicsengland.co.uk/research/academic/join-gecip). GWAS summary \nstatistics of neurodevelopmental conditions generated in this study are available in \nSupplementary Data. Researchers can apply to access genotype array data from ALSPAC \n(https://www.bristol.ac.uk/alspac/researchers/access/) and MCS \n(https://cls.ucl.ac.uk/data-access-training/data-access/). Publicly available GWAS summary statistics can be accessed \nat various resources: http://www.thessgac.org/data, \nhttps://pgc.unc.edu/for-researchers/download-results/, and \nhttps://egg-consortium.org/Gestational-duration-2023.html. 
\n \nCode availability \nWe used publicly available software: LDpred (https://github.com/bvilhjal/ldpred), LDSC \n(https://github.com/bulik/ldsc), GCTA-LDMS \n(https://yanglab.westlake.edu.cn/software/gcta/#GREMLinWGSorimputeddata), PCGC \nregression (https://dougspeed.com/pcgc-regression/), GenomicSEM \n(https://github.com/PerlineDemange/non-cognitive/blob/master/GenomicSEM/Genetic%20correlations/Without%20using%20SNP%20effects/function_rG_woSNP.R), \nand LHC-MR (https://github.com/LizaDarrous/lhcMR).  \n \nExtended Data Figures \nExtended Data Figure 1. Schematic of the liability threshold model for rare \nneurodevelopmental disorders, illustrating why one might expect patients with a monogenic \ndiagnosis to have less polygenic (common variant) risk than those without a monogenic \ndiagnosis. The normal distribution represents the underlying distribution of liability in the \npopulation, which is assumed to be Gaussian. Both genetic and environmental factors of \ndifferent effects contribute to this total liability. Each panel represents a hypothetical example of \none individual, either unaffected (A), affected and diagnosed with a monogenic cause (B), or \naffected and without a monogenic diagnosis (C). The red line indicates a threshold for being \ndiagnosed with neurodevelopmental conditions. Circles represent different genetic factors, and \ndiamonds represent environmental factors. The size of circles and diamonds represents their \nimpact on disease risk. The undiagnosed patient (C) has more green circles (i.e. 
risk-increasing \ncommon variants) than the patient with a monogenic diagnosis (B), in whom the orange circle \n(i.e. diagnostic large-effect variant) is sufficient on its own to push the patient over the diagnostic \nthreshold. \n \nExtended Data Figure 2. Distribution of age at assessment (A) and number of HPO terms (B) \nin both DDD and GEL probands with neurodevelopmental conditions who have GBR ancestry. \nThe vertical lines indicate the means. A small number of probands in each program were aged \nover 50 and had more than 30 HPO terms, and these have been omitted from the plot due to data \nsharing restrictions. C) Proportion of probands from each cohort with at least one HPO term \nwithin the indicated chapter (black text) or specific phenotype (green text), ordered by the \nprevalence in DDD. The asterisks indicate results from a logistic regression testing whether \nthere was a significant difference in phenotype prevalence between cohorts after controlling for \nsex and age (** indicates p-value < 0.05/43; * indicates p-value < 0.05). D) Proportion of \nprobands recruited to both DDD and GEL (N=789) with at least one HPO term within the \nindicated chapter (black text) or specific phenotype (green text) from the phenotype information from \neach program, ordered by the prevalence in DDD. The same logistic regression was used as in \nC).  \n
\nExtended Data Figure 3. Manhattan plot (A) and quantile-quantile plot (B) of GWAS meta-analysis \nof neurodevelopmental conditions. We meta-analyzed the GWASs derived from DDD-\nUKHLS (6,397 cases with neurodevelopmental conditions and 9,270 controls from UKHLS) and \nGEL (3,618 cases and 13,667 controls). We used overlapping SNPs with MAF >1% in both \ncohorts. The red line indicates the genome-wide significance threshold (5 × 10^-8).  \n \nExtended Data Figure 4. Average polygenic scores in undiagnosed (red) and diagnosed (blue) \nprobands with neurodevelopmental conditions from DDD and GEL combined, as well as in MCS \nchildren reweighted to adjust for sampling bias and non-response bias (yellow). Subsets of \nprobands with neurodevelopmental conditions and their parents from trios are shown in light red \n(undiagnosed subsets) and light blue (diagnosed subsets). 
The polygenic scores have been \nstandardized such that the UKHLS+GEL controls have mean = 0 and standard deviation = 1 \n(except for PGSNDC,DDD for which only GEL controls were used to standardize). Yellow horizontal \nlines indicate weighted average polygenic scores in MCS children, which should reflect an \nunbiased estimate for the background population. PGSNDC,DDD was tested in a held-out set of \npatients in DDD. Error bars show 95% confidence intervals. See also Supplementary Tables 5 \nand 7 for results of statistical tests of differences between groups.  \n \nExtended Data Figure 5. A) Average polygenic score for educational attainment (PGS EA) in \ndifferent control cohorts and subsets thereof, subsets of probands with neurodevelopmental \nconditions, and their unaffected parents. B) Comparing average PGS EA in trio probands and \nprobands who did not have genetic data on both parents in ALSPAC, MCS, and affected \npatients from DDD and GEL. Note that in the case of DDD, “in trios” refers to those who had \nexome sequence data on both parents (only a subset of which also had genotype array data, \nsince we prioritized genotyping full trios for which the child was undiagnosed), whereas in the \nrest of the manuscript (except for Figure 2B which uses the same definition as here), “trio \nproband” refers to those who had genotype data on both parents. C) Average polygenic scores \nfor all five traits in MCS before and after reweighting to adjust for sampling bias and attrition. 
\nNote that the PGS are corrected for 20 PCs and then normalized so that a combined set of \nunrelated controls from UKHLS and GEL have mean = 0 and standard deviation = 1. Error bars \nshow 95% confidence intervals.  \n \nExtended Data Figure 6. Association between different configurations of affected relatives and \nthe child’s PGS EA (left) or average diagnostic rate (right). Left: Average proband PGS EA in \nsubgroups with different configurations of affected relatives based on the number of affected \nparents, siblings, and more distant relatives. Right: Odds ratio for having a monogenic \ndiagnosis, compared to probands with no affected relatives. See Supplementary Methods for \na description of how this was calculated. \n \nExtended Data Figure 7. A) Comparison of polygenic scores between undiagnosed male and \nfemale probands in DDD and GEL combined. We used all undiagnosed probands with \nneurodevelopmental conditions regardless of trio status in this analysis (N=1,426 females and \nN=2,427 males in DDD; N=112 females and N=146 males in DDD excluding GWAS samples; \nN=918 females and N=1,574 males in GEL). A positive difference indicates that female \nprobands have higher PGS than male probands.  
B) Comparison of polygenic scores between \nunaffected mothers and fathers of undiagnosed probands from a combined sample of 1,523 \ntrios and 1,343 trios from DDD and GEL, respectively. A positive difference indicates that \nmothers have higher PGS than fathers. C) pTDT results in undiagnosed female and male \nprobands with unaffected parents (N=586 females and N=937 males in DDD; N=99 females \nand N=125 males in DDD excluding GWAS samples; N=490 females and N=853 males in GEL). \nError bars show 95% confidence intervals. The significant result that passes Bonferroni \ncorrection for five tests is highlighted by two asterisks.  \n \nExtended Data Figure 8. Exploring prenatal factors that may influence risk of \nneurodevelopmental disorders. (A) Genetic correlations between neurodevelopmental \nconditions and prenatal risk factors, before and after conditioning on educational attainment or \ncognitive performance. Genetic correlations with our GWAS meta-analysis for \nneurodevelopmental conditions were estimated using Linkage Disequilibrium Score Regression. \nThose conditioned on the GWAS summary statistics for educational attainment or cognitive \nperformance were estimated using GenomicSEM. (B) Association between PGSs and \nprematurity, a risk factor for neurodevelopmental conditions, estimated in DDD. See \nSupplementary Table 8 for sample sizes. Note that for PGSNDC,DDD, probands who were \nincluded in the GWAS were not tested, which left 703 probands, of which 83 were born \nprematurely. 
A negative estimate indicates that probands who were born prematurely or their \nparents had a lower polygenic score. Associations that pass Bonferroni correction for five traits \nin (A) or five polygenic scores in (B) are indicated by two asterisks and nominally significant \nresults by one asterisk. Error bars show 95% confidence intervals.  \n \nExtended Data Figure 9. Causal effect estimates between educational attainment, \nneurodevelopmental conditions, and preterm birth from Mendelian randomization. The top \npanels show bi-directional relationships between educational attainment and giving birth \nprematurely, and between educational attainment and neurodevelopmental conditions, inferred \nby the Latent Heritable Confounder-Mendelian randomization method (LHC-MR), which uses \nall genome-wide SNPs. αX->Y indicates the causal effect of the exposure (X) on the outcome (Y) \nand αY->X indicates the reverse causal effect. The causal effects of the heritable confounder on \nthe exposure and the outcome are annotated as tX and tY, respectively. The forest plots on the \nbottom show the causal effects inferred using the standard Mendelian randomization methods. \nUp to four different methods were used, as indicated in the legend, but not all were used to test \neach hypothesis, depending on the number of instruments available (see Supplementary \nMethods). The dots show point estimates and the lines are 95% confidence intervals calculated \nusing standard errors. Estimates that are significant are highlighted with an asterisk and exact \np-values are annotated.  \n \n
Extended Data Figure 10. Association coefficients of polygenic scores (PGSs) and rare variant \nburden scores (RVBS) in the ‘proband only’ and ‘trio’ models, from logistic regressions of \ncase/control status within GEL (N=1,343 trios in which the proband with a neurodevelopmental \ncondition is undiagnosed and parents are unaffected and 872 trios without neurodevelopmental \nconditions). Case/control status was regressed on either the child’s PGS, the child’s PGS and \nchild’s RVBS, all three trio members’ polygenic scores (trio model), or all three trio members’ \npolygenic scores and RVBSs (trio model+RVBS). The RVBS was defined as the number of rare \ndamaging PTVs and missense variants in constrained genes (requiring these to be inherited in \nthe child), corrected for genetic principal components.  \n \nSupplementary Data \nSupplementary Data 1. Summary statistics from the GWAS of neurodevelopmental \nconditions comparing cases to controls within the Genomics England (GEL) 100,000 \nGenomes Project. \n \nSupplementary Data 2. Summary statistics from the GWAS of neurodevelopmental \nconditions comparing DDD cases to UKHLS controls, excluding the Scottish samples from \nDDD. \n \nSupplementary Data 3. Summary statistics from the GWAS meta-analysis of \nneurodevelopmental conditions combining the DDD and GEL GWASs. \n
\nSupplementary Tables \nSupplementary Methods \nSupplementary Figures \nSupplementary Notes  \nSupplementary Note 1: Phenotypic comparisons of the cohorts \nSupplementary Note 2: Genome-wide significant hits from the GWAS meta-analysis of \nneurodevelopmental conditions  \nSupplementary Note 3: Potential ascertainment biases in control cohorts and their effects \nSupplementary Note 4: Examining sex differences in polygenic risk \nSupplementary Note 5: Exploring the role of prenatal risk factors in mediating common variant \nrisk \nSupplementary Note 6: Role of PGS in modifying the penetrance of rare variants  \n \nReferences \n1. Nguengang Wakap, S. et al. Estimating cumulative point prevalence of rare diseases: \nanalysis of the Orphanet database. Eur. J. Hum. Genet. 28, 165–173 (2020). \n2. Sanders, S. J. et al. A framework for the investigation of rare genetic disorders in \nneuropsychiatry. Nat. Med. 25, 1477–1487 (2019). \n3. Manickam, K. et al. Exome and genome sequencing for pediatric patients with congenital \nanomalies or intellectual disability: an evidence-based clinical guideline of the American \nCollege of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 2029–2037 (2021). \n4. Srivastava, S. et al. Correction: Meta-analysis and multidisciplinary consensus statement: \nexome sequencing is a first-tier clinical diagnostic test for individuals with \nneurodevelopmental disorders. Genet. Med. 22, 1731–1732 (2020). \n5. Wright, C. F. et al. Genomic Diagnosis of Rare Pediatric Disease in the United Kingdom \nand Ireland. N. Engl. J. Med. 388, 1559–1571 (2023). \n6. Niemi, M. E. K. et al. Common genetic variants contribute to risk of rare severe \nneurodevelopmental disorders. Nature 562, 268–271 (2018). \n7. Kurki, M. I. et al. Contribution of rare and common variants to intellectual disability in a \n
sub-isolate of Northern Finland. Nat. Commun. 10, 410 (2019). \n8. Gardner, E. J. et al. Reduced reproductive success is associated with selective constraint \non human genes. Nature 603, 858–863 (2022). \n9. Chen, C.-Y. et al. The impact of rare protein coding genetic variation on adult cognitive \nfunction. Nat. Genet. 55, 927–938 (2023). \n10. Kingdom, R., Beaumont, R. N., Wood, A. R., Weedon, M. N. & Wright, C. F. Genetic \nmodifiers of rare variants in monogenic developmental disorder loci. medRxiv (2022) \ndoi:10.1101/2022.12.15.22283523. \n11. Rolland, T. et al. Phenotypic effects of genetic variants associated with autism. Nat. Med. \n29, 1671–1680 (2023). \n12. Fenner, E. et al. Rare coding variants in schizophrenia-associated genes affect \ngeneralised cognition in the UK Biobank. bioRxiv (2023) \ndoi:10.1101/2023.08.14.23294074. \n13. Murray, R. M., Bhavsar, V., Tripoli, G. & Howes, O. 30 Years on: How the \nNeurodevelopmental Hypothesis of Schizophrenia Morphed Into the Developmental Risk \nFactor Model of Psychosis. Schizophr. Bull. 43, 1190–1196 (2017). \n14. O’Brien, H. E. et al. Expression quantitative trait loci in the developing human brain and \ntheir enrichment in neuropsychiatric disorders. Genome Biol. 19, 194 (2018). \n15. Mallard, T. T. et al. Multivariate GWAS of psychiatric disorders and their cardinal \nsymptoms reveal two dimensions of cross-cutting genetic liabilities. Cell Genom 2, (2022). \n16. Wolstencroft, J. et al. Neuropsychiatric risk in children with intellectual disability of genetic \norigin: IMAGINE, a UK national cohort study. Lancet Psychiatry 9, 715–724 (2022). \n17. 
Marquis, S. M., McGrail, K. & Hayes, M. V. A population-level study of the mental health \nof siblings of children who have a developmental disability. SSM Popul Health 8, 100441 \n(2019). \n18. Sullivan, P. F. et al. Family history of schizophrenia and bipolar disorder as risk factors for \nautism. Arch. Gen. Psychiatry 69, 1099–1103 (2012). \n19. Baker, K. et al. Childhood intellectual disability and parents’ mental health: integrating \nsocial, psychological and genetic influences. Br. J. Psychiatry 218, 315–322 (2021). \n20. Zarrei, M. et al. Gene copy number variation and pediatric mental \nhealth/neurodevelopment in a general population. Hum. Mol. Genet. 32, 2411–2421 \n(2023). \n21. Alexander-Bloch, A. et al. Copy Number Variant Risk Scores Associated With Cognition, \nPsychopathology, and Brain Structure in Youths in the Philadelphia Neurodevelopmental \nCohort. JAMA Psychiatry 79, 699–709 (2022). \n22. Chawner, S. J. R. A. et al. Genotype-phenotype associations in children with copy \nnumber variants associated with high neuropsychiatric risk in the UK (IMAGINE-ID): a \ncase-control cohort study. Lancet Psychiatry 6, 493–505 (2019). \n23. Falconer, D. S. The inheritance of liability to certain diseases, estimated from the \nincidence among relatives. Ann. Hum. Genet. 29, 51–76 (1965). \n24. The genetics of neurodevelopmental disorders: Mitchell/the genetics of \nneurodevelopmental disorders. (John Wiley & Sons, 2015). \n25. Bergen, S. E. et al. Joint Contributions of Rare Copy Number Variants and Common \nSNPs to Risk for Schizophrenia. Am. J. Psychiatry 176, 29–35 (2019). \n26. Antaki, D. et al. 
A phenotypic spectrum of autism is attributable to the combined effects of \nrare variants, polygenic risk and sex. Nat. Genet. 1–9 (2022). \n27. Wright, C. F. et al. Making new genetic diagnoses with old data: iterative reanalysis and \nreporting from genome-wide data in 1,133 families with developmental disorders. Genet. \nMed. 20, 1216–1223 (2018). \n28. Kuchenbaecker, K. B. et al. Evaluation of Polygenic Risk Scores for Breast and Ovarian \nCancer Risk Prediction in BRCA1 and BRCA2 Mutation Carriers. J. Natl. Cancer Inst. \n109, (2017). \n29. Kong, A. et al. The nature of nurture: Effects of parental genotypes. Science 359, 424–\n428 (2018). \n30. Okbay, A. et al. Polygenic prediction of educational attainment within and between \nfamilies from genome-wide association analyses in 3 million individuals. Nat. Genet. 54, \n437–449 (2022). \n31. Young, A. I. et al. Mendelian imputation of parental genotypes improves estimates of \ndirect genetic effects. Nat. Genet. 54, 897–905 (2022). \n32. Howe, L. J. et al. Within-sibship genome-wide association analyses decrease bias in \nestimates of direct genetic effects. Nat. Genet. 54, 581–592 (2022). \n33. Demange, P. A. et al. Estimating effects of parents’ cognitive and non-cognitive skills on \noffspring education using polygenic scores. Nat. Commun. 13, 4801 (2022). \n34. Bates, T. C. et al. Social Competence in Parents Increases Children’s Educational \nAttainment: Replicable Genetically-Mediated Effects of Parenting Revealed by Non-\nTransmitted DNA. Twin Res. Hum. Genet. 22, 1–3 (2019). \n35. Wang, B. et al. 
Robust genetic nurture effects on education: A systematic review and \nmeta-analysis based on 38,654 families across 8 cohorts. Am. J. Hum. Genet. 108, \n1780–1791 (2021). \n36. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. \nNature 526, 68–74 (2015). \n37. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association \nstudy of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 \n(2018). \n38. Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in \nschizophrenia. Nature 604, 502–508 (2022). \n39. Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci \ninfluencing general cognitive function. Nat. Commun. 9, 2098 (2018). \n40. Lee, J. J. et al. Gene discovery and polygenic prediction from a 1.1-million-person GWAS \nof educational attainment. Nat. Genet. 50, 1112 (2018). \n41. Demontis, D. et al. Genome-wide analyses of ADHD identify 27 risk loci, refine the \ngenetic architecture and implicate several cognitive domains. Nat. Genet. 55, 198–208 \n(2023). \n42. Demange, P. A. et al. Investigating the genetic architecture of noncognitive skills using \nGWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021). \n43. Brainstorm Consortium et al. Analysis of shared heritability in common disorders of the \nbrain. Science 360, (2018). \n44. Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the \nmultivariate genetic architecture of complex traits. Nat Hum Behav 3, 513–525 (2019). \n45. Joseph, R. M. et al. 
Neurocognitive and Academic Outcomes at Age 10 Years of \nExtremely Preterm Newborns. Pediatrics 137, (2016). \n46. Aarnoudse-Moens, C. S. H., Weisglas-Kuperus, N., van Goudoever, J. B. & Oosterlaan, \nJ. Meta-analysis of neurobehavioral outcomes in very preterm and/or very low birth \nweight children. Pediatrics 124, 717–728 (2009). \n47. Huang, J., Zhu, T., Qu, Y. & Mu, D. Prenatal, Perinatal and Neonatal Risk Factors for \nIntellectual Disability: A Systemic Review and Meta-Analysis. PLoS One 11, e0153655 \n(2016). \n48. Crequit, S. et al. Association between social vulnerability profiles, prenatal care use and \npregnancy outcomes. BMC Pregnancy Childbirth 23, 465 (2023). \n49. Morelli, S., Nolan, B., Palomino, J. C. & Van Kerm, P. The Wealth (Disadvantage) of \nSingle-Parent Households. Ann. Am. Acad. Pol. Soc. Sci. 702, 188–204 (2022). \n50. Weiner, D. J. et al. Polygenic transmission disequilibrium confirms that common and rare \nvariation act additively to create risk for autism spectrum disorders. Nat. Genet. 49, 978–\n985 (2017). \n51. Nivard, M. G. et al. Neither nature nor nurture: Using extended pedigree data to elucidate \nthe origins of indirect genetic effects on offspring educational outcomes. \nhttps://psyarxiv.com/bhpm5/download?format=pdf (2022). \n52. Young, A. I. et al. Relatedness disequilibrium regression estimates heritability without \nenvironmental bias. Nat. Genet. 50, 1304–1310 (2018). \n53. Young, A. S. Estimation of indirect genetic effects and heritability under assortative \nmating. bioRxiv 2023.07.10.548458 (2023) doi:10.1101/2023.07.10.548458. \n54. Nivard, M. G. et al. Neither nature nor nurture: Using extended pedigree data to \n
understand indirect genetic effects on offspring educational outcomes. (2022) \ndoi:10.31234/osf.io/bhpm5. \n55. Solé-Navais, P. et al. Genetic effects on the timing of parturition and links to fetal birth \nweight. Nat. Genet. 55, 559–567 (2023). \n56. Granés, L., Torà-Rocamora, I., Palacio, M., De la Torre, L. & Llupià, A. Maternal \neducational level and preterm birth: Exploring inequalities in a hospital-based cohort \nstudy. PLoS One 18, e0283901 (2023). \n57. Yengo, L. et al. Imprint of assortative mating on the human genome. Nat Hum Behav 2, \n948–954 (2018). \n58. Reynolds, C. A., Baker, L. A. & Pedersen, N. L. Multivariate models of mixed assortment: \nphenotypic assortment and social homogamy for education and fluid ability. Behav. \nGenet. 30, 455–476 (2000). \n59. van Leeuwen, M., van den Berg, S. M. & Boomsma, D. I. A twin-family study of general \nIQ. Learn. Individ. Differ. 18, 76–88 (2008). \n60. Mascie-Taylor, C. G. Spouse similarity for IQ and personality and convergence. Behav. \nGenet. 19, 223–227 (1989). \n61. Jencks, C. Inequality: A Reassessment of the Effect of Family and Schooling in America. \n(Allen Lane, 1973). \n62. Loehlin, J. C. Heredity-environment analyses of Jencks’s IQ correlations. Behav. Genet. \n8, 415–436 (1978). \n63. Horwitz, T. B., Balbona, J. V., Paulich, K. N. & Keller, M. C. Evidence of correlations \nbetween human partners based on systematic reviews and meta-analyses of 22 traits and \nUK Biobank analysis of 133 traits. Nat Hum Behav (2023) doi:10.1038/s41562-023-\n01672-z. \n64. Sunde, H. F. et al. Genetic similarity between relatives provides evidence on the \npresence and history of assortative mating. bioRxiv 2023.06.27.546663 (2023) \ndoi:10.1101/2023.06.27.546663. \n65. Nordsletten, A. E. et al. Patterns of Nonrandom Mating Within and Across 11 Major \n
Psychiatric Disorders. JAMA Psychiatry 73, 354–361 (2016). \n66. Greve, A. N. et al. A Nationwide Cohort Study of Nonrandom Mating in Schizophrenia and \nBipolar Disorder. Schizophr. Bull. 47, 1342–1350 (2021). \n67. Cabrera-Mendoza, B., Wendt, F. R., Pathak, G. A., Yengo, L. & Polimanti, R. The impact \nof assortative mating, participation bias, and socioeconomic status on the polygenic risk \nof behavioral and psychiatric traits. bioRxiv (2022) doi:10.1101/2022.11.29.22282912. \n68. Smolen, C. et al. Assortative mating and parental genetic relatedness drive the \npathogenicity of variably expressive variants. medRxiv (2023) \ndoi:10.1101/2023.05.18.23290169. \n69. Kingdom, R. et al. Rare genetic variants in genes and loci linked to dominant monogenic \ndevelopmental disorders cause milder related phenotypes in the general population. The \nAmerican Journal of Human Genetics Preprint at \nhttps://doi.org/10.1016/j.ajhg.2022.05.011 (2022). \n70. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo \nmutations in developmental disorders. Nature 542, 433–438 (2017). \n71. Border, R. et al. Cross-trait assortative mating is widespread and inflates genetic \ncorrelation estimates. Science 378, 754–761 (2022). \n72. Balbona, J. V., Kim, Y. & Keller, M. C. Estimation of Parental Effects Using Polygenic \nScores. Behav. Genet. 51, 264–278 (2021). \n73. Potharst, E. S. et al. High incidence of multi-domain disabilities in very preterm children at \nfive years of age. J. Pediatr. 159, 79–85 (2011). \n74. Cheong, J. L. Y. et al. 
Changing Neurodevelopment at 8 Years in Children Born \nExtremely Preterm Since the 1990s. Pediatrics 139, (2017). \n75. Beauregard, J. L., Drews-Botsch, C., Sales, J. M., Flanders, W. D. & Kramer, M. R. Does \nSocioeconomic Status Modify the Association Between Preterm Birth and Children’s Early \nCognitive Ability and Kindergarten Academic Achievement in the United States? Am. J. \nEpidemiol. 187, 1704–1713 (2018). \n76. Lacalle, L., Martínez-Shaw, M. L., Marín, Y. & Sánchez-Sandoval, Y. Intelligence Quotient \n(IQ) in school-aged preterm infants: A systematic review. Front. Psychol. 14, 1216825 \n(2023). \n77. Wong, H. S. & Edwards, P. Nature or nurture: a systematic review of the effect of \nsocio-economic status on the developmental and cognitive outcomes of children born preterm. \nMatern. Child Health J. 17, 1689–1700 (2013). \n78. Crump, C., Sundquist, J. & Sundquist, K. Preterm or early term birth and risk of \nattention-deficit/hyperactivity disorder: a national cohort and co-sibling study. Ann. Epidemiol. 86, \n119–125.e4 (2023). \n79. Husby, A., Wohlfahrt, J. & Melbye, M. Gestational age at birth and cognitive outcomes in \nadolescence: population based full sibling cohort study. BMJ 380, e072779 (2023). \n80. Thoma, M. E., Copen, C. E. & Kirmeyer, S. E. Short Interpregnancy Intervals in 2014: \nDifferences by Maternal Demographic Characteristics. NCHS Data Brief 1–8 (2016). \n81. Kandel, D. B., Griesler, P. C. & Schaffran, C. Educational attainment and smoking among \nwomen: risk factors and consequences for offspring. Drug Alcohol Depend. 104 Suppl 1, \nS24–33 (2009). \n82. Goldenberg, R. 
L., Culhane, J. F., Iams, J. D. & Romero, R. Epidemiology and causes of \npreterm birth. Lancet 371, 75–84 (2008). \n83. Madley-Dowd, P. et al. Maternal smoking during pregnancy and offspring intellectual \ndisability: sibling analysis in an intergenerational Danish cohort. Psychol. Med. 52, \n1847–1856 (2022). \n84. Havdahl, A. et al. Associations Between Pregnancy-Related Predisposing Factors for \nOffspring Neurodevelopmental Conditions and Parental Genetic Liability to \nAttention-Deficit/Hyperactivity Disorder, Autism, and Schizophrenia: The Norwegian Mother, Father \nand Child Cohort Study (MoBa). JAMA Psychiatry 79, 799–810 (2022). \n85. van Alten, S., Domingue, B. W., Galama, T. & Marees, A. T. Reweighting the UK Biobank \nto reflect its underlying sampling population substantially reduces pervasive selection bias \ndue to volunteering. bioRxiv (2022) doi:10.1101/2022.05.16.22275048. \n86. Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a \nscalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). \n87. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using \nEnsembl Resources. Am. J. Hum. Genet. 84, 524–533 (2009). \n88. Köhler, S. et al. The Human Phenotype Ontology project: linking molecular biology and \ndisease through phenotype data. Nucleic Acids Res. 42, D966–74 (2014). \n89. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic \ncauses of developmental disorders. Nature 519, 223–228 (2015). \n90. Turnbull, C. et al. The 100 000 Genomes Project: bringing whole genome sequencing to \nthe NHS. 
BMJ 361, k1687 (2018). \n91. Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national \nhealth system. Nature 583, 96–102 (2020). \n92. Aggregated variant calls - Genomics England trusted research environment user guide. \nhttps://re-docs.genomicsengland.co.uk/aggv2/. \n93. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer \ndatasets. GigaScience vol. 4 Preprint at https://doi.org/10.1186/s13742-015-0047-8 \n(2015). \n94. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based \nlinkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007). \n95. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. \nBioinformatics 26, 2867–2873 (2010). \n96. McFall, S., Petersen, J., Kaminska, O. & Lynn, P. Understanding Society—The UK \nHousehold Longitudinal Study: Waves 2 and 3 Nurse Health Assessment, 2010–2012 \nGuide to Nurse Health …. Colchester: University of Essex. \n97. Boyd, A. et al. Cohort Profile: the ‘children of the 90s’--the index offspring of the Avon \nLongitudinal Study of Parents and Children. Int. J. Epidemiol. 42, 111–127 (2013). \n98. Fraser, A. et al. Cohort Profile: the Avon Longitudinal Study of Parents and Children: \nALSPAC mothers cohort. Int. J. Epidemiol. 42, 97–110 (2013). \n99. Connelly, R. & Platt, L. Cohort profile: UK Millennium Cohort Study (MCS). Int. J. \nEpidemiol. 43, 1719–1725 (2014). \n100. Joshi, H. & Fitzsimons, E. The Millennium Cohort Study: the making of a multi-purpose \nresource for social science and policy. Longit. Life Course Stud. 7, 409–430 (2016). 
\n101. Northstone, K., Ben Shlomo, Y., Teyhan, A. et al. The Avon Longitudinal Study of Parents \nand Children ALSPAC G0 Partners: A cohort profile. \nhttps://wellcomeopenresearch.org/articles/8-37/v1. \n102. Fitzsimons, E. et al. Collection of genetic data at scale for a nationally representative \npopulation: the UK Millennium Cohort Study. Longit. Life Course Stud. 13, 169–187 \n(2021). \n103. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and \nProjection for Dimension Reduction. arXiv [stat.ML] (2018). \n104. Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed \nProgram. Nature 590, 290–299 (2021). \n105. Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, \n1284–1287 (2016). \n106. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. \nGenet. 48, 1279–1283 (2016). \n107. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of \ngenomewide association scans. Bioinformatics 26, 2190–2191 (2010). \n108. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from \npolygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015). \n109. Grotzinger, A. D., Fuente, J. de la, Privé, F., Nivard, M. G. & Tucker-Drob, E. M. \nPervasive Downward Bias in Estimates of Liability-Scale Heritability in Genome-wide \nAssociation Study Meta-analysis: A Simple Solution. Biol. Psychiatry 93, 29–36 (2023). \n110. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing \nheritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015). \n111. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the \ncontribution of common variants. Proc. Natl. Acad. Sci. U. S. A. 111, E5272–81 (2014). \n
112. Vilhjálmsson, B. J. et al. Modeling Linkage Disequilibrium Increases Accuracy of \nPolygenic Risk Scores. Am. J. Hum. Genet. 97, 576–592 (2015). \n113. International HapMap 3 Consortium et al. Integrating common and rare genetic variation \nin diverse human populations. Nature 467, 52–58 (2010). \n114. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. \nNature 562, 203–209 (2018). \n115. Lee, S. H., Goddard, M. E., Wray, N. R. & Visscher, P. M. A better coefficient of \ndetermination for genetic profile analysis. Genet. Epidemiol. 36, 214–224 (2012). \n116. Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability \nfor disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 \n(2011). \n117. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 1–14 (2016). \n118. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in \n141,456 humans. Nature 581, 434–443 (2020). \n119. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, \n285–291 (2016). \n120. Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness \nprediction. bioRxiv 148353 (2017) doi:10.1101/148353. \n121. Plewis, I. The Millennium Cohort Study: Technical Report on Sampling (4th Edition). \nhttp://doc.ukdataservice.ac.uk/doc/4683/mrdoc/pdf/mcs_technical_report_on_sampling_4th_edition.pdf \n(2007). \n122. Plewis, I. Non‐Response in a Birth Cohort Study: The Case of the Millennium Cohort \nStudy. Int. J. Soc. Res. Methodol. 10, 325–334 (2007). \n
","source_license":"CC-BY-4.0","license_restricted":false}