Evaluation of a biomarker for amyotrophic lateral sclerosis derived from a hypomethylated DNA signature of human motor neurons | Research Square

Research Article

Calum Harvey, Alicja Nowak, Sai Zhang, Tobias Moll, Annika K Weimer, and 11 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5397445/v1. This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 14 Jan, 2025 in BMC Medical Genomics. Version 1.

Abstract

Amyotrophic lateral sclerosis (ALS) lacks a specific biomarker, but is defined by relatively selective toxicity to motor neurons (MN). As others have highlighted, this offers an opportunity to develop a sensitive and specific biomarker based on detection of DNA released from dying MN within accessible biofluids.
Here we have performed whole genome bisulfite sequencing (WGBS) of iPSC-derived MN from neurologically normal individuals. By comparing MN methylation with an atlas of tissue methylation we have derived a MN-specific signature of hypomethylated genomic regions, which accords with genes important for MN function. Through simulation we have optimised the selection of regions for biomarker detection in plasma and CSF cell-free DNA (cfDNA). However, we show that MN-derived DNA is not detectable via WGBS in plasma cfDNA. In support of our experimental finding, we show theoretically that the relative sparsity of lower MN sets a limit on the proportion of plasma cfDNA derived from MN which is below the threshold for detection by WGBS. Our findings are important for the ongoing development of ALS biomarkers. The MN-specific hypomethylated genomic regions we have derived could be usefully combined with more sensitive detection methods and perhaps with study of CSF instead of plasma. Indeed we demonstrate that neuronal-derived DNA is detectable in CSF. Our work is relevant for all diseases featuring death of rare cell-types.

Introduction

Amyotrophic lateral sclerosis (ALS) is an incurable neurodegenerative disease where death results from motor neuron (MN) loss leading to respiratory failure. The design and development of novel therapeutics has been held back because of the lack of a specific biomarker. Currently, neurofilament proteins measured in plasma provide a non-specific readout of neuronal death [1]. Neurofilament proteins form important structural components of the large myelinated axons which are found in MN. MN death triggers the release of neurofilaments from the cytoplasm into the extracellular space [2]; as a result the level of detectable neurofilament is a function of the rate of MN death, and thus neurofilament measurement can be used as a biomarker of disease progression [1].
However, neurofilaments are not specific to MN and it is notable that serum neurofilament light chain (NfL) [3] is elevated in other neurological diseases. Indeed, for diagnosis of ALS, serum NfL is of limited value [4] even if it is useful for measuring the rate of progression. It follows that detection of a different marker which is released only from dying MN may outperform neurofilaments as a biomarker for ALS. DNA methylation is fundamental to the control of gene expression and by inference, genomic methylation should be relatively cell specific. Cell-specific DNA methylation signals are stable between individuals, as was confirmed by a recent atlas of DNA methylation [5]. Moreover, DNA methylation is relatively stable over time [6]. Cell-free DNA (cfDNA) found in peripheral blood is the product of release from dying cells [7] and has been extensively proposed as a source of biomarkers in the cancer field [8]; methylated cfDNA is now the basis of FDA-approved applications e.g. [9]. We hypothesised that a DNA methylation signature which is specific to MN, and is detectable within cfDNA, might be both sensitive and specific as a biomarker of the rate of MN death due to ALS. We present whole genome bisulfite sequencing (WGBS) data from iPSC-derived MN from controls. These data complement our previously published epigenetic profiling from the same neurons [10]. It is practically difficult to obtain MN in sufficient quantity from post-mortem material to perform WGBS and therefore we chose to focus on iPSC-derived MN which are a gold-standard model of ALS [11]. We have published WGBS of cfDNA from ALS patients and controls [12] but previously we lacked a MN signature for comparison. Here we show, using simulation and measurement, that MN-specific DNA methylation is not detectable within cfDNA in plasma by WGBS. Future work will evaluate our MN DNA methylation signature by other means and in other biofluids. Our approach is summarised in Fig. 1.
Results

Cell-specific DNA methylation within control iPSC-derived MN is similar to human adult CNS neurons

WGBS was performed at high depth to profile DNA methylation within iPSC-derived MN from three neurologically normal individuals (Supplementary Table 1, Methods). A first question was whether the methylation signature of these neurons, which are derived in vitro, is consistent with CNS neurons obtained from human tissue. WGBS data were processed and quality control (QC) was performed according to the ENCODE 4 standards [14]. Methylation profiles of 205 samples covering 39 cell-types from an available methylation atlas [5] were combined with our samples, then used to segment the genome into blocks of co-methylated CpGs (Methods). Hierarchical unsupervised clustering was used to examine the relationships between samples (Methods, Fig. 2A). As expected, genome methylation within iPSC-derived MN clustered closely with CNS neuronal subtypes (Fig. 2B). On this basis we proceeded to use our data to identify MN-specific methylation (Methods).

Identification of cell-specific hypomethylated genomic regions

Next we derived DNA methylation changes specific to MN via comparison with the methylation profiles of 205 samples covering 39 cell-types from an available methylation atlas [5]. Blocks of co-methylated CpGs that exhibited hyper- or hypomethylation specifically in MN were identified (Methods) and taken forward for further analysis. In total 8,729 regions were specifically hypomethylated in MN (Supplementary Table 2); hypomethylation indicates increased genomic accessibility suggestive of MN-specific function. A similar analysis identified 5,690 blocks which were specifically hypomethylated in the total set of human CNS neurons compared to other cell-types. The number of regions identified per cell-type varied dramatically from 61,693 for gallbladder to 436 for colon fibroblasts.
MN-specific DNA methylation is linked to MN function but not to genetic risk for ALS

Cell-specific DNA methylation marks are typically hypomethylated [5], which should be coincident with increased accessibility of underlying DNA over regulatory regions including enhancers [15]. As a validation of the regions we have identified, we examined the overlap of MN-specific hypomethylated enhancers and their target genes with independent measurements of MN gene expression and ALS heritability (Fig. 3A). To derive associated genes from MN-specific hypomethylated DNA blocks, we applied the activity-by-contact (ABC) model [13] to link regulatory regions to expressed genes within iPSC-derived MN (Methods). We found that the total list of hypomethylated regions is associated with 2,046 expressed genes. We then tested this gene list for enrichment in human cell types and tissues included in ARCHS4 [16] using Enrichr [17], and found that the genes were most significantly enriched for genes expressed specifically in spinal motor neurons isolated from post-mortem tissue [18] (Fisher’s exact test, p = 4.22e-19, OR = 1.79, Fig. 3B). This demonstrates that the methylation profiles of the iPSC-derived motor neurons are congruent with transcriptional profiles of human motor neurons. To further characterise the function of MN-specific hypomethylated genes we examined RNA-sequencing from iPSC-derived motor neurons obtained from 245 ALS patients and 45 controls (www.answerals.org) (Methods). Genes linked to hypomethylated regions in MN were highly expressed within iPSC-derived MN compared to the background transcriptome (Wilcoxon rank sum test, p < 2.2e-16, Fig. 3C) which is consistent with an important role in MN function.
Four genes were reported as differentially expressed (FDR < 0.05, negative binomial test) between ALS patients and controls in this data, but genes linked to hypomethylated regions in MN were not enriched within ALS-associated differentially expressed genes (Wilcoxon rank sum test, p = 0.25, Fig. 3D). Finally, we performed LDSC [19] using a recent GWAS of ALS [20] to examine disease-specific heritability enrichment within MN-specific hypomethylated regions. Heritability for ALS was enriched within hypomethylated regions but this was not statistically significant (OR = 25.2, se = 26.05, p = 0.38, LDSC, Methods). We conclude that MN-specific DNA hypomethylation is associated with gene expression linked to MN function, but we find no conclusive evidence that there is a specific association with genes dysregulated in MN in a disease context.

An optimum set of hypomethylated DNA regions for ALS biomarker design

An important use of cell-type-specific methylation profiles is for the deconvolution of complex mixes of DNA to identify the proportions of contributing cell types. This has the potential to lead to a novel biomarker of ALS: cell-free DNA (cfDNA) found within plasma is released from dying cells and thus, the quantity of DNA sourced from CNS neurons, and MN in particular, should be proportional to the rate of MN death. Neuronal DNA is not normally seen in the plasma [5], which may be due to a low rate of neuron death or to the blood brain barrier, but brain-derived DNA has been detected in plasma under pathological conditions [21, 22] demonstrating its potential to serve as a biomarker. To deconvolute plasma cfDNA we optimised the UXM algorithm [5] for the low coverage (~10x) typical of methylation studies of cfDNA; in particular we optimised the choice and configuration of MN-specific methylation blocks.
The UXM algorithm was chosen as it makes use of read-level methylation data, and has achieved accurate deconvolution of cell types present at proportions as low as 0.1% [5]. Optimisation was performed using synthetic data generated by spiking WGBS data derived from plasma cfDNA of healthy individuals with sequencing reads derived from human MN at a known proportion between 0.01% and 10% (Methods, Fig. 4A). We simulated relatively low coverage (10x) to match coverage in the actual ALS cfDNA samples. We observed a linear correlation between the actual and predicted percentage of spike-in MN DNA with an adjusted r² < 0.9 in all marker sets (Fig. 4B). A configuration of UXM using 500 MN-specific blocks with a minimum of 3 CpGs produced the highest detection probability at 1% spike-in, but 500 blocks with a minimum of 4 CpGs performed better at both 0.5% and 0.1% spike-in (difference in detection probability of 0.1–0.2 at each % spike-in, Fig. 4C). However, we note that at spike-ins of ≤ 0.5%, AUC was poor for all sets of MN marker blocks. The greatest AUC (0.69) at 1% spike-in was achieved with 500 blocks with a minimum of 3 CpGs, in keeping with its higher probability of detection (Supplementary Fig. 1A); this was the configuration taken forward to analyse ALS patient samples. As seen in [5, 23], deconvolution frequently identified false-positive cell-types within the synthetic mixture (Supplementary Fig. 1B). We used a linear model to examine the effect of coverage and number of marker regions on the total number of cell types identified in a sample. Both coverage (p = 0.04) and number of markers (p = 3.7e-4) were significantly negatively correlated with the number of cell types identified, suggesting that increased coverage and using more marker regions per cell-type will reduce the number of cell types falsely identified within a mixture.
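As a rough illustration of the two summary statistics used above, the following Python sketch computes a detection probability and an AUC from per-sample estimates of the MN fraction. The definitions are assumptions made for illustration only: detection is taken to mean an estimate above a threshold, and AUC is computed as the probability that a spiked sample scores higher than an unspiked one. The function names and example values are hypothetical and are not taken from the study.

```python
def detection_probability(estimates, threshold=0.0):
    """Fraction of spiked samples whose estimated MN fraction exceeds `threshold`.

    Assumption: "detected" means the deconvolution returned an estimate above
    the threshold; the study may define detection differently.
    """
    if not estimates:
        raise ValueError("estimates must not be empty")
    return sum(1 for e in estimates if e > threshold) / len(estimates)


def auc(spiked, unspiked):
    """Probability that a spiked sample scores above an unspiked one.

    Ties count as 0.5, which makes this equivalent to the area under the ROC curve.
    """
    if not spiked or not unspiked:
        raise ValueError("both groups must contain at least one estimate")
    wins = 0.0
    for s in spiked:
        for u in unspiked:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(spiked) * len(unspiked))


# Hypothetical deconvolution outputs (estimated MN fraction per sample).
spiked_estimates = [0.0, 0.012, 0.0, 0.008]
unspiked_estimates = [0.0, 0.0, 0.003, 0.0]

print(detection_probability(spiked_estimates))
print(auc(spiked_estimates, unspiked_estimates))
```

An AUC near 0.5 means the estimates for spiked and unspiked samples are indistinguishable, which is the situation described above for spike-ins of 0.5% or less.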
MN-derived DNA is not detectable within plasma cfDNA

When we applied our optimised deconvolution utilising 500 MN-specific methylation blocks with a minimum of 3 CpGs to plasma cfDNA WGBS from n = 12 ALS patients we did not identify MN-derived DNA in any sample (Fig. 4D), suggesting that if MN DNA is present it is below the detectable limit of ~1% of plasma cfDNA (Fig. 4B-C).

Neuronal-derived DNA is detectable in CSF cfDNA

The cerebrospinal fluid (CSF) surrounds the spinal cord and brain, and is encapsulated by the blood brain barrier. It might be expected that CSF cfDNA is enriched in neuronal DNA compared to plasma and so we attempted to fully characterise the contributing cell types within CSF cfDNA (Methods). No WGBS data were available from ALS patient CSF cfDNA. We analysed four samples of WGBS CSF cfDNA from hydrocephalus patients [24]. Coverage was very low (0.12-0.45x, Supplementary Table 3) due to the low concentration of cfDNA within the spinal cord so samples were merged to improve deconvolution accuracy. We discovered that neuronal and oligodendrocyte DNA comprised 13% and 14% of the total cfDNA with the remainder largely composed of a mix of blood, epithelial, and adipocyte cell types (Supplementary Fig. 1); MN-derived DNA was not detectable in any sample. The contribution of adipocytes may in part reflect the lumbar puncture procedure used to collect CSF as DNA. The lack of a number of CNS-specific cell-types such as microglia within the reference leads to a possible assignment error which is impossible to quantify, and is likely responsible for the small proportion of epithelial and pancreatic cell types identified.

The theoretical maximum proportion of MN-derived DNA within plasma cfDNA is very low

We did not detect MN DNA in any ALS patient sample, suggesting that if MN DNA is present it is below ~1% of plasma cfDNA. We questioned if this was a detection deficiency or whether there might be insufficient MN DNA for detection.
To address this we modelled the theoretical maximum proportion of MN DNA that might be expected within plasma cfDNA (Fig. 5A). Recent work [25] has estimated the effect of cellular turnover on the proportion of DNA derived from different cell-types detectable within plasma cfDNA. The proportion of DNA released from dying cells that reaches cfDNA varies dramatically, from 3% of released DNA for megakaryocytes and endothelial cells, to 0.003% for erythrocyte progenitors. Although there are > 86 billion neurons in the human CNS [26], lower MN are a rare subtype of neurons, and previous work has estimated that there may be < 500,000 in total [27]. Assuming optimum availability, 3% of released MN DNA will be detectable within plasma cfDNA. If we assume all lower MN die over the course of disease, we can estimate the theoretical maximum proportion of MN DNA as a part of total plasma cfDNA as a function of the rate of disease progression (Methods, Fig. 5B). From this we can calculate that even for the fastest theoretical disease progression rate, the plasma concentration of MN DNA would be several orders of magnitude smaller than our threshold for detection, primarily because of the small number of MN relative to other cell types. We have assumed a half life for cfDNA of 114 minutes [28]. In our simulation experiments we achieved a detection probability greater than chance only when the proportion of cfDNA attributed to MN was > 1% (Fig. 4B-C) which determined the threshold for theoretical detection. We sought to estimate what rate of MN death would be required to produce a detectable concentration within cfDNA. Using the proportion of DNA from cellular turnover detectable as cfDNA in the plasma from endothelial cells and erythroblasts as maximum and minimum estimates, we show that even if all lower MN died within 24 hours, their contribution to cfDNA would still be below the limit of detection for WGBS (Fig. 5C).
We consider this estimate of wider use to the field as it predicts whether a detectable quantity of cfDNA will be present from a known rate of cell death.

Methods

Tissue culture and development of iPSC-derived MN

Tissue culture of iPSCs and the derivation of pure MN cultures via small molecules is described elsewhere [10].

Whole genome bisulfite sequencing (WGBS) of DNA derived from iPSC-derived MN

We generated WGBS libraries following the Whole-Genome Bisulfite Sequencing Data Standards and Processing Pipeline (https://www.encodeproject.org/data-standards/wgbs/). In brief, genomic DNA was extracted from ~50,000 cells per technical replicate before shearing and bisulfite treatment. Libraries were amplified by PCR and purified. Library concentrations were measured (Qubit). WGBS libraries were paired-end sequenced on a NovaSeq 6000 system (Illumina) with target 30X coverage. Raw data were processed with the ENCODE 4 pipeline for WGBS according to ENCODE 4 standards. Files are available at encodeproject.org with the following accession numbers: ENCSR734EFX, ENCSR509LMK, ENCSR978LOX. Paired-end FASTQ files were mapped to the human (hg38), lambda, pUC19 and viral genomes using bwa-meth (v.0.2.0) then converted to BAM files using SAMtools (v.1.9). Duplicated reads were marked by Sambamba (v.0.6.5) with parameters ‘-l 1 -t 16 --sort-buffer-size 16000 --overflow-list-size 10000000’ [29]. Reads that had low mapping quality, were duplicated, or were not mapped in a proper pair were excluded using SAMtools view with parameters ‘-F 1796 -q 10’. Reads were stripped of non-CpG nucleotides and converted to PAT files using wgbstools (v.0.2.0, downloaded from GitHub, github.com/nloyfer/wgbs_tools, in September 2022) with the command ‘wgbstools bam2pat --genome hg38’. Methylation across the MN samples was examined using a PCA plot, and technical replicates were found to have low heterogeneity. Technical replicates were then merged to allow inclusion in the wgbstools pipeline.
Genome segmentation into methylation blocks

Using all three of our samples and all 205 samples from a methylation atlas we segmented the genome into 1,630,133 blocks of 4 or more CpGs using the wgbstools command ‘wgbstools segment --min_cpg 4 --max_bp 5000’. PAT and BETA files for all 207 available samples mapped to GRCh38 were downloaded from GEO (accession number GSE186458) [5] on the 20th of September 2022. As per the original publication we excluded two cardiomyocyte samples due to low coverage. We also segmented the genome into 1,938,130 blocks of 3 or more CpGs using the wgbstools command ‘wgbstools segment --min_cpg 3 --max_bp 5000’; these blocks were used only for marker selection.

Unsupervised clustering of DNA methylation profiles

Average methylation per block (of at least 4 CpGs in size) for each sample was extracted using the wgbstools command ‘beta_to_table’, replacing blocks with less than 10x coverage in a sample with ‘NA’. We then selected the top 1% of blocks by variance, excluding blocks with any ‘NA’ values across all samples, and used these for clustering. Unsupervised clustering was performed using Python version 3.10.8, Dask version 2023.9.2, and SciPy version 1.9.1, with options method='average', metric='cityblock', optimal_ordering=True.

Derivation of MN-specific hypomethylated genomic regions

We applied the wgbstools command ‘find_markers’ together with all 205 samples used for segmentation. Default parameters were used to remove low coverage regions: samples with a read depth of less than 5 in a segment had the value set to NA, and segments with greater than 1 in 3 NA values in either the target or background cell type were removed. Regions were considered MN-specific if there was a difference of at least 0.3 between the mean motor neuron methylation and the mean of all other samples’ methylation within that block, and the p value of a t-test was equal to or below 0.05.
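The coverage and effect-size filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the wgbstools ‘find_markers’ implementation: the function names and the toy values are hypothetical, the t-test step is omitted, and only the hypomethylated direction (target lower than background) is checked.

```python
from statistics import mean

MIN_DEPTH = 5              # read depth below this is treated as missing (NA)
MAX_NA_FRACTION = 1 / 3    # blocks with more than 1 in 3 NA values are dropped
MIN_DELTA = 0.3            # minimum difference between group means


def mask_low_coverage(betas, depths, min_depth=MIN_DEPTH):
    """Replace methylation values from low-depth samples with None (NA)."""
    return [b if d >= min_depth else None for b, d in zip(betas, depths)]


def na_fraction(values):
    """Fraction of samples in a group with a missing value."""
    return sum(v is None for v in values) / len(values)


def is_candidate_hypo_marker(target, background):
    """True if the block passes the NA filter and is hypomethylated in the target.

    Assumption: the t-test (p <= 0.05) described in the text is applied
    separately and is not reproduced here.
    """
    if na_fraction(target) > MAX_NA_FRACTION:
        return False
    if na_fraction(background) > MAX_NA_FRACTION:
        return False
    target_mean = mean(v for v in target if v is not None)
    background_mean = mean(v for v in background if v is not None)
    return background_mean - target_mean >= MIN_DELTA


# Toy block: three MN samples (one with low depth) against four atlas samples.
mn = mask_low_coverage([0.10, 0.20, 0.15], depths=[10, 12, 3])
atlas = mask_low_coverage([0.90, 0.80, 0.85, 0.95], depths=[20, 15, 9, 30])
print(is_candidate_hypo_marker(mn, atlas))
```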
Identification of genes linked to MN-specific hypomethylated genomic regions

We implemented the ABC model [13] following the guidelines provided at https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction. First, we called peaks for the ATAC-seq profiling using MACS2, and then identified the candidate enhancer elements using “makeCandidateRegions.py” with parameters peakExtendFromSummit = 250 and nStrongestPeaks = 150000. The black-listed regions generated by ENCODE 4 (https://www.encodeproject.org/) were used for removing enhancers overlapping regions with anomalous sequencing reads. Second, we applied “run.neighborhoods.py” to quantify the enhancer activities by counting ATAC-seq and H3K27ac ChIP-seq reads in candidate enhancer regions. RNA-seq profiling of iPSC-derived MNs was also provided to inform expressed genes. Quantile normalisation was applied using K562 epigenetic data as the reference. Finally, using “predict.py” we computed the ABC scores by combining the enhancer activities (calculated in the second step) with the Hi-C profiling. Hi-C data was fit to the power-law model. The default threshold of 0.02 was used to define valid E-P links.

Transcriptome analysis

For AnswerALS data, gene expression profiling of iPSC-derived MNs and phenotype data were obtained for 245 ALS patients and 45 neurologically normal controls (https://www.answerals.org/). Gene expression was normalised by the trimmed mean of M-values (TMM) normalisation method. We used a negative binomial test to determine genes differentially expressed between ALS patients and controls. Significance testing was performed for all genes expressed in MN (n = 22,976), defined as a count above zero in more than half of samples; in addition we excluded the bottom 25% of genes based on mean count across all samples.

Generation of synthetic mixes of MN-derived DNA together with plasma cfDNA

WGBS of plasma cfDNA samples produced by Caggiano C. et al.
[12] were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164600 in February 2023, including 12 ALS patients and 12 healthy volunteers. Raw FASTQ files were trimmed with Trim Galore version 0.6.7 using the options ‘trim_galore --paired --clip_R1 4 --clip_R2 4 --three_prime_clip_R1 12 --three_prime_clip_R2 12’ and then aligned to GRCh38 using the bowtie 2 aligner in Bismark version 0.22.3. Duplicate reads were removed with Bismark and Samtools version 1.16.1 was used to remove reads with a MAPQ score below 10. BAM files were then converted to PAT and BETA files using wgbstools. Using the wgbstools command ‘mix_pat’, we created synthetic mixes of MN sample PGP_M_55_iPSC (Supplementary Table 1) or cerebral neuron sample Cortex-Neuron-Z0000042F [5] with either the 12 plasma cfDNA samples from healthy volunteers or the 4 CSF cfDNA samples from hydrocephalus patients. By down- or up-sampling the cfDNA and neuronal reads, spike-ins were made at 0–10%, and coverage was varied from 2.5-30x.

Deconvolution of plasma cfDNA and optimisation of a deconvolution algorithm

We derived uniquely hypomethylated regions for each cell-type to use for deconvolution. In this process we excluded the two samples used for spike-in to prevent overfitting. Segmentation was repeated as before to derive two sets of regions, one with a minimum length of 3 CpGs and one with a minimum length of 4 CpGs. For both sets of regions, cell-type-specific marker regions were found using wgbstools ‘find_markers’ with a minimum difference between target and background means of 0.3 and a t-test p-value equal to or below 0.05. To derive different numbers of marker regions, for each cell-type the marker regions were ordered by the difference between the 75th centile in the target group and the 2.5th centile in the background and then 25, 50, 100, 250, 300, 400, or 500 marker regions were selected.
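The ranking step just described can be illustrated with a short Python sketch. It is written under assumptions and is not the code used in the study: for a hypomethylated marker the score is taken as the background 2.5th centile minus the target 75th centile (larger is better), centiles use simple linear interpolation, and the block names and methylation values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) of `values` using linear interpolation."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def margin(target, background):
    """Separation between groups for a hypomethylated marker.

    Assumption: score = background 2.5th centile - target 75th centile, so a
    large positive value means nearly all background samples are more
    methylated than most target samples.
    """
    return percentile(background, 2.5) - percentile(target, 75)


def top_markers(blocks, n):
    """Names of the `n` blocks with the largest margin, best first."""
    ranked = sorted(
        blocks,
        key=lambda b: margin(b["target"], b["background"]),
        reverse=True,
    )
    return [b["name"] for b in ranked[:n]]


# Hypothetical blocks with per-sample average methylation values.
blocks = [
    {"name": "A", "target": [0.05, 0.10, 0.10], "background": [0.90, 0.95, 1.00]},
    {"name": "B", "target": [0.20, 0.30, 0.40], "background": [0.60, 0.70, 0.80]},
    {"name": "C", "target": [0.10, 0.10, 0.20], "background": [0.80, 0.90, 0.90]},
]
print(top_markers(blocks, 2))
```

Using centiles rather than means makes the ranking sensitive to outlying samples in either group, which matters when a single poorly separated background sample could produce a false-positive assignment.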
Marker regions for all cell types were then used to create an atlas of the fragment-based methylation for each region across all cell types using the UXM tool downloaded from https://github.com/nloyfer/UXM_deconv on the 31st of January 2023. We then used UXM to deconvolve the synthetic mixes, producing estimated cell type contributions for each mix. These were then analysed using R version 4.3.1 (2023-06-16). To optimise region selection we tested using smaller or larger regions, and more or fewer regions per cell-type, in order to maximise the probability of detection of spiked-in DNA and minimise the normalised root mean squared error (RMSE).

Deconvolution of CSF cfDNA

WGBS of CSF cfDNA samples [30] were downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142241 in April 2023, including four hydrocephalus patients. Reads were trimmed with Trim Galore version 0.6.7 using the paired option and default settings. Due to low mapping efficiency of the reads we followed the ‘Dirty Harry’ protocol described by the creators of the Bismark software. Reads were first aligned as paired-end reads using the bowtie aligner within Bismark. Unmapped R1 reads were then aligned in directional mode, and R2 reads were then aligned in pbat mode before combining them into a single file. Duplicate reads were then removed with Bismark, then Samtools version 1.16.1 was used to remove reads with a MAPQ score below 10 before converting them into PAT and BETA files using wgbstools.

Theoretical estimate of the maximum of MN-derived DNA within plasma cfDNA

The concentration of cfDNA produced from cell death is given by the standard pharmacokinetic equation for the concentration produced by a drug infusion at a constant rate:

$$C = \frac{d \, k_0 \, t_{1/2}}{\ln(2) \, V_d}$$

where $C$ is the concentration in the plasma, $k_0$ is the infusion rate, $t_{1/2}$ is the half life, $V_d$ is the volume of distribution, and $d$ is the proportion of DNA from cell death present in the plasma.
We were able to calculate the theoretical maximum concentration of MN DNA within plasma cfDNA as a function of the time period over which the DNA was released, i.e. disease duration, by making reasonable assumptions for each of these values. Using the values for a 70 kg, 20–25-year-old man, as has historically been used as standard, the volume of plasma is 3.0 L [31]. In the absence of a ground-truth for the proportion of DNA released from dying MN that reaches plasma cfDNA, we used observed maximum and minimum proportions for other cell-types: from 3% for megakaryocytes and endothelial cells to 0.003% for erythrocyte progenitors [25]. Infusion rate is given by the rate of cell death, and converted to weight of DNA using the conversion 1 diploid genome = 6.46 pg [32]. The total number of lower MN has been estimated at ~500,000 [27] and we estimate a constant rate of loss over the disease course based on the observation that neurofilament levels, a biomarker of neuronal death, rise prior to disease onset then reach a stable concentration that is proportional to speed of progression [33]. The half life of plasma cfDNA has been measured using a variety of means, including the decrease in foetal cfDNA following pregnancy, the decrease in tumour cfDNA following surgery, and the increase and decrease in cfDNA following exercise [34]. A key point is to distinguish between the distribution half life and steady state half life. As shown by experiments with radiolabeled double stranded DNA [35], following an infusion DNA is taken up by soft tissues causing its concentration in the plasma to decrease rapidly until an equilibrium is reached with equal movement of DNA between the soft tissues and plasma. Following this the concentration of DNA will reach a steady state where its concentration is determined by the infusion rate and the steady state half life.
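To make the calculation concrete, the following Python sketch evaluates the steady-state expression C = d · k0 · t½ / (ln(2) · Vd) with the parameter values quoted in this section (114-minute half life, 3.0 L plasma volume, ~500,000 lower MN, 6.46 pg per diploid genome, and 297 pg/µl total plasma cfDNA). It is a minimal sketch under assumptions rather than the authors' code: the function names are hypothetical, the volume of distribution is assumed to equal the plasma volume, and MN loss is assumed to be constant over the chosen duration.

```python
import math

GENOME_PG = 6.46                 # mass of one diploid genome, in pg
N_LOWER_MN = 500_000             # estimated total number of lower MN
HALF_LIFE_MIN = 114.0            # steady-state half life of cfDNA, in minutes
PLASMA_VOLUME_L = 3.0            # assumed volume of distribution (plasma volume)
TOTAL_CFDNA_PG_PER_UL = 297.0    # total plasma cfDNA concentration used in the text


def mn_cfdna_conc_pg_per_ul(duration_days, d):
    """Steady-state plasma concentration of MN-derived cfDNA (pg/µl).

    duration_days: period over which all lower MN are assumed to die.
    d: proportion of released DNA that reaches plasma cfDNA (e.g. 0.03).
    """
    total_dna_pg = N_LOWER_MN * GENOME_PG
    k0_pg_per_min = total_dna_pg / (duration_days * 24 * 60)   # infusion rate
    conc_pg_per_l = d * k0_pg_per_min * HALF_LIFE_MIN / (math.log(2) * PLASMA_VOLUME_L)
    return conc_pg_per_l / 1e6   # 1 L = 1e6 µl


def mn_fraction_of_cfdna(duration_days, d):
    """MN-derived cfDNA as a fraction of total plasma cfDNA."""
    return mn_cfdna_conc_pg_per_ul(duration_days, d) / TOTAL_CFDNA_PG_PER_UL


# Upper (3%) and lower (0.003%) bounds on d, for loss over 1 day and over 1 year.
for days in (1, 365):
    for d in (0.03, 0.00003):
        print(days, d, mn_fraction_of_cfdna(days, d))
```

Under these assumptions even the most extreme scenario (all lower MN lost within a day, with 3% of released DNA reaching plasma) yields a fraction far below the ~1% detection threshold discussed above.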
We use 114 minutes as our estimate for the steady state half life as this is based on the fall in circulating tumour DNA following complete resection of the tumour [36]. cfDNA from the tumour would have reached a steady state prior to the surgery and its decrease following the surgery would be in line with the steady state half life. When estimating the proportion of cfDNA we use 297 pg/µl as the expected concentration of plasma cfDNA as this was the average concentration in controls age and sex matched to ALS patients [12].

Discussion

ALS is currently an incurable and invariably fatal neurodegenerative disease [37]. Biomarkers are crucial for translational medicine and the recent development of serum NfL as a biomarker for ALS [1] has been key to the development of new treatments [38]. However, a key deficiency of NfL measurement is that it is not specific to MN [3], the primary degenerating cell in ALS. We and others have hypothesised that detection of cell-specific methylation of DNA within plasma cfDNA might provide an alternative and more specific biomarker for ALS. Here we show theoretically and experimentally that this goal is potentially not achievable using WGBS of plasma cfDNA, at least under the experimental conditions we encountered. Alternative approaches are needed which may include alternative biofluids or detection methods. We have developed a MN-specific set of hypomethylated genomic regions using WGBS in iPSC-derived MN from neurologically normal individuals, together with an atlas of tissue-specific methylation [5]. We demonstrate that these regions are associated with genes which are key to MN function but not significantly enriched with ALS genetic risk. Our regions are likely to be useful for future work aiming to detect DNA derived from MN using different detection methods.
Our simulations and our measurements suggest that the sensitivity of WGBS is limited to 1% of plasma cfDNA, which is substantially greater than the theoretical maximum proportion of plasma cfDNA derived from rapidly degenerating MN, which we determine to be 1.6 × 10⁻⁵%. This is due to the relatively small number of MN compared to the ongoing turnover of other cell-types. It is not inconceivable that MN-derived DNA could be detected at this level but targeted amplification together with more sensitive detection will be necessary. An important limitation of our work, and of the majority of deconvolution algorithms, is the assumption that the sequenced DNA fragments are randomly distributed across the genome, which is not correct. It is known that the formation of cell-free DNA from genomic DNA leads to preferential preservation of nucleosome-bound DNA, so cell-free DNA from different cell types or tissues produces fragmentation patterns with greater depth at sites bound to nucleosomes [39]. Enrichment of MN-specific methylation blocks used for detection with nucleosome-bound genomic regions could potentially improve the performance of detection. It is possible that use of an alternative biofluid might enable detection of MN-specific DNA. CSF is the obvious choice given that, unlike blood, it is not separated from MN by the blood brain barrier (BBB). However, the extremely low concentration of cfDNA in CSF (0.4 ng/mL versus 7.7 ng/mL in plasma [40]) may again be prohibitive. Our preliminary analysis suggests that neuronal but not MN-derived DNA is detectable within CSF cfDNA via WGBS, but this did not include sequencing data from ALS patients. Our study has contributed WGBS data from iPSC-derived MN (encodeproject.org, Methods) and the identification of MN-specific hypomethylated genomic regions. We have not achieved a new biomarker for ALS but we have delineated the challenge for this approach through both theoretical calculations and experimental measurements.
We have shown that WGBS of cfDNA derived from plasma is not likely to lead to a new biomarker for ALS and that future research should focus on developing our MN-specific regions with a more sensitive detection method. Declarations Ethics approval and consent to participate The study was approved by the South Sheffield Research Ethics Committee. Also, this study followed study protocols approved by Medical Ethical Committees for each of the participating institutions. Written informed consent was obtained from all participating individuals. All methods were performed in accordance with relevant national and international guidelines and regulations. Consent for publication Written informed consent was obtained from all participating individuals. Availability of data and material WGBS data are available at encodeproject.org with the following accession numbers: ENCSR734EFX, ENCSR509LMK, ENCSR978LOX. Competing interests The authors declare that they have no competing interests. Funding This work was supported by the National Institutes of Health (CEGS 5P50HG00773504, 1P50HL083800, 1R01HL101388, 1R01-HL122939, S10OD025212, P30DK116074, and UM1HG009442 to MPS), the Wellcome Trust (216596/Z/19/Z to JCK), and NIHR (NF-SI-0617-10077 to PJS). CH/JCK are supported by the MNDA (899-792). We also acknowledge support from a Kingsland fellowship (T.M.), and the NIHR Sheffield Biomedical Research Centre for Translational Neuroscience (IS-BRC-1215-20017) and the NIHR Sheffield Clinical Research Facility. Author contributions CH, PJS, MPS, SZ, JM, EH and JCK conceived and designed the study. CH, AN, AMB and JCK performed statistical analyses. AKW, CDSS, LF and TM carried out experiments. CH, EH, JM, SZ, JCK, KK, CC, and NZ interpreted the data with assistance from all other authors. JCK, JM, PJS, and MPS supervised the work. CH, EH and JCK wrote the manuscript with feedback from all other authors. 
Acknowledgments We are very grateful to the ALS patients and control subjects who generously donated biosamples. We acknowledge transcriptomic data provided by the AnswerALS Consortium. Figures were created using BioRender.com.

References
1. Lu C-H, Macdonald-Wallis C, Gray E, Pearce N, Petzold A, Norgren N, et al. Neurofilament light chain: A prognostic biomarker in amyotrophic lateral sclerosis. Neurology. 2015;84:2247–57.
2. Yuan A, Rao MV, Veeranna, Nixon RA. Neurofilaments and Neurofilament Proteins in Health and Disease. Cold Spring Harb Perspect Biol. 2017;9.
3. Verde F, Steinacker P, Weishaupt JH, Kassubek J, Oeckl P, Halbgebauer S, et al. Neurofilament light chain in serum for the diagnosis of amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry. 2019;90:157–64.
4. Davies JC, Dharmadasa T, Thompson AG, Edmond EC, Yoganathan K, Gao J, et al. Limited value of serum neurofilament light chain in diagnosing amyotrophic lateral sclerosis. Brain Commun. 2023;5:fcad163.
5. Loyfer N, Magenheim J, Peretz A, Cann G, Bredno J, Klochendler A, et al. A DNA methylation atlas of normal human cell types. Nature. 2023;613:355–64.
6. Li Y, Pan X, Roberts ML, Liu P, Kotchen TA, Cowley AW Jr, et al. Stability of global methylation profiles of whole blood and extracted DNA under different storage durations and conditions. Epigenomics. 2018;10:797–811.
7. Kustanovich A, Schwartz R, Peretz T, Grinshpun A. Life and death of circulating cell-free DNA. Cancer Biol Ther. 2019;20:1057–67.
8. Bronkhorst AJ, Ungerer V, Holdenrieder S. The emerging role of cell-free DNA as a molecular marker for cancer management. Biomol Detect Quantif. 2019;17:100087.
9. Warren JD, Xiong W, Bunker AM, Vaughn CP, Furtado LV, Roberts WL, et al. Septin 9 methylated DNA is a sensitive and specific blood test for colorectal cancer. BMC Med. 2011;9:133.
10. Zhang S, Cooper-Knock J, Weimer AK, Shi M, Moll T, Marshall JNG, et al. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron. 2022;110:992–1008.e11.
11. Sances S, Bruijn LI, Chandran S, Eggan K, Ho R, Klim JR, et al. Modeling ALS with motor neurons derived from human induced pluripotent stem cells. Nat Neurosci. 2016;19:542–53.
12. Caggiano C, Celona B, Garton F, Mefford J, Black BL, Henderson R, et al. Comprehensive cell type decomposition of circulating cell-free DNA with CelFiE. Nat Commun. 2021;12:2717.
13. Stamenova EK, Aiden EL, Lander ES, Engreitz JM. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nature. 2019.
14. ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699–710.
15. Wiench M, John S, Baek S, Johnson TA, Sung M-H, Escobar T, et al. DNA methylation status predicts cell type-specific enhancer activity. EMBO J. 2011;30:3028–39.
16. Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, et al. Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun. 2018;9:1366.
17. Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, et al. Gene Set Knowledge Discovery with Enrichr. Curr Protoc. 2021;1:e90.
18. Nizzardo M, Taiana M, Rizzo F, Aguila Benitez J, Nijssen J, Allodi I, et al. Synaptotagmin 13 is neuroprotective across motor neuron diseases. Acta Neuropathol. 2020;139:837–53.
19. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47:1228–35.
20. van Rheenen W, van der Spek RAA, Bakker MK, van Vugt JJFA, Hop PJ, Zwamborn RAJ, et al. Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology. Nat Genet. 2021;53:1636–48.
21. Chatterton Z, Mendelev N, Chen S, Carr W, Kamimori GH, Ge Y, et al. Bisulfite Amplicon Sequencing Can Detect Glia and Neuron Cell-Free DNA in Blood Plasma. Front Mol Neurosci. 2021;14:672614.
22. Lehmann-Werman R, Neiman D, Zemmour H, Moss J, Magenheim J, Vaknin-Dembinsky A, et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc Natl Acad Sci U S A. 2016;113:E1826–34.
23. Li S, Zeng W, Ni X, Liu Q, Li W, Stackpole ML, et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc Natl Acad Sci U S A. 2023;120:e2305236120.
24. Ye Z, Chatterton Z, Pflueger J, Damiano JA, McQuillan L, Harvey AS, et al. Cerebrospinal fluid liquid biopsy for detecting somatic mosaicism in brain. Brain Commun. 2021;3:fcaa235.
25. Sender R, Noor E, Milo R, Dor Y. What fraction of cellular DNA turnover becomes cfDNA? bioRxiv. 2023.
26. Voytek B. Are there really as many neurons in the human brain as stars in the Milky Way. Scitable, Nature Education.
27. Gautier O, Blum JA, Maksymetz J, Chen D, Schweingruber C, Mei I, et al. Human motor neurons are rare and can be transcriptomically divided into known subtypes. bioRxiv. 2023:2023.04.05.535689.
28. Chen K, Zhao H, Yang F, Hui B, Wang T, Wang LT, et al. Dynamic changes of circulating tumour DNA in surgical lung cancer patients: protocol for a prospective observational study. BMJ Open. 2018;8:e019012.
29. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–4.
30. Li J, Zhao S, Lee M, Yin Y, Li J, Zhou Y, et al. Reliable tumor detection by whole-genome methylation sequencing of cell-free DNA in cerebrospinal fluid of pediatric medulloblastoma. Sci Adv. 2020;6.
31. ICRP. ICRP Publication 89: Basic Anatomical and Physiological Data for Use in Radiological Protection: Reference Values. SAGE Publications Limited; 2003.
32. Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. On the length, weight and GC content of the human genome. BMC Res Notes. 2019;12:106.
33. Benatar M, Wuu J, Andersen PM, Lombardi V, Malaspina A. Neurofilament light: A candidate biomarker of presymptomatic amyotrophic lateral sclerosis and phenoconversion. Ann Neurol. 2018;84:130–9.
34. Khier S, Lohan L. Kinetics of circulating cell-free DNA for biomedical applications: critical appraisal of the literature. Future Sci OA. 2018;4:FSO295.
35. Emlen W, Mannik M. Effect of DNA size and strandedness on the in vivo clearance and organ localization of DNA. Clin Exp Immunol. 1984;56:185–92.
36. Diehl F, Schmidt K, Choti MA, Romans K, Goodman S, Li M, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med. 2008;14:985–90.
37. Cooper-Knock J, Jenkins T, Shaw PJ. Clinical and Molecular Aspects of Motor Neuron Disease. Colloquium Ser Genomic Mol Med. 2013;2:1–60.
38. Miller TM, Cudkowicz ME, Genge A, Shaw PJ, Sobue G, Bucelli RC, et al. Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS. N Engl J Med. 2022;387:1099–110.
39. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016;164:57–68.
40. Wu J, Liu Z, Huang T, Wang Y, Song MM, Song T, et al. Cerebrospinal fluid circulating tumor DNA depicts profiling of brain metastasis in NSCLC. Mol Oncol. 2023;17:810–24.

Additional Declarations No competing interests reported.

Supplementary Files SupplementaryTablesfinal.xlsx, SupplementaryMaterial.docx

Editorial decision: Revision requested 08 Nov, 2024. Editor assigned by journal 08 Nov, 2024. Submission checks completed at journal 08 Nov, 2024. First submitted to journal 05 Nov, 2024.
As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5397445","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":375740210,"identity":"eea2755a-c444-4fa5-b0e9-f22cf1799a19","order_by":0,"name":"Calum Harvey","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Calum","middleName":"","lastName":"Harvey","suffix":""},{"id":375740211,"identity":"14102b4e-868c-4575-80be-79aecbd7fcfa","order_by":1,"name":"Alicja Nowak","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Alicja","middleName":"","lastName":"Nowak","suffix":""},{"id":375740212,"identity":"3217c7b4-d41e-45c8-9b0d-9c7892642c85","order_by":2,"name":"Sai Zhang","email":"","orcid":"","institution":"University of Florida","correspondingAuthor":false,"prefix":"","firstName":"Sai","middleName":"","lastName":"Zhang","suffix":""},{"id":375740213,"identity":"8fb4673e-cc2b-474c-988d-911b6a27f500","order_by":3,"name":"Tobias Moll","email":"","orcid":"","institution":"University of 
Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Tobias","middleName":"","lastName":"Moll","suffix":""},{"id":375740214,"identity":"43c40075-73af-485b-8567-7980fa69bc26","order_by":4,"name":"Annika K Weimer","email":"","orcid":"","institution":"Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute","correspondingAuthor":false,"prefix":"","firstName":"Annika","middleName":"K","lastName":"Weimer","suffix":""},{"id":375740215,"identity":"5c096f0c-f706-4e2c-a81d-9651cde76b80","order_by":5,"name":"Aina Mogas Barcons","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Aina","middleName":"Mogas","lastName":"Barcons","suffix":""},{"id":375740216,"identity":"1a05a88e-3527-4628-9b8b-23d683753c14","order_by":6,"name":"Cleide Dos Santos Souza","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Cleide","middleName":"Dos Santos","lastName":"Souza","suffix":""},{"id":375740217,"identity":"c5371eb6-c6b3-4862-86a8-c060c34d9485","order_by":7,"name":"Laura Ferraiuolo","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Laura","middleName":"","lastName":"Ferraiuolo","suffix":""},{"id":375740218,"identity":"b9d4c83d-c92b-4159-923d-e49d6e5d1cd8","order_by":8,"name":"Kevin Kenna","email":"","orcid":"","institution":"University Medical Center Utrecht","correspondingAuthor":false,"prefix":"","firstName":"Kevin","middleName":"","lastName":"Kenna","suffix":""},{"id":375740219,"identity":"736c681c-f9fc-407d-acb5-b17bd170ae01","order_by":9,"name":"Noah Zaitlen","email":"","orcid":"","institution":"UCLA","correspondingAuthor":false,"prefix":"","firstName":"Noah","middleName":"","lastName":"Zaitlen","suffix":""},{"id":375740220,"identity":"57d1781f-76e6-41d2-a82c-db3f38ca3b1a","order_by":10,"name":"Christa 
Caggiano","email":"","orcid":"","institution":"UCLA","correspondingAuthor":false,"prefix":"","firstName":"Christa","middleName":"","lastName":"Caggiano","suffix":""},{"id":375740221,"identity":"9e254eb8-965e-4349-b413-4321b0505b4f","order_by":11,"name":"Pamela J Shaw","email":"","orcid":"","institution":"University of Sheffield","correspondingAuthor":false,"prefix":"","firstName":"Pamela","middleName":"J","lastName":"Shaw","suffix":""},{"id":375740222,"identity":"1651572a-3fa3-4bf9-bd20-ca2404a97669","order_by":12,"name":"Michael P Snyder","email":"","orcid":"","institution":"Stanford University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"P","lastName":"Snyder","suffix":""},{"id":375740223,"identity":"85ce3c19-2595-4d1a-b25c-7271d8ec31b6","order_by":13,"name":"Jonathan Mill","email":"","orcid":"","institution":"University of Exeter Medical School, University of Exeter","correspondingAuthor":false,"prefix":"","firstName":"Jonathan","middleName":"","lastName":"Mill","suffix":""},{"id":375740224,"identity":"7dd967bb-f43b-437b-bc09-e2e4167ee1a3","order_by":14,"name":"Eilis Hannon","email":"","orcid":"","institution":"University of Exeter Medical School, University of Exeter","correspondingAuthor":false,"prefix":"","firstName":"Eilis","middleName":"","lastName":"Hannon","suffix":""},{"id":375740225,"identity":"9a9812ca-f160-470e-9039-68f95d16b9aa","order_by":15,"name":"Johnathan 
Cooper-Knock","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABKklEQVRIie3RPUvDQBjA8ec4SJcLWQ9K8StcCPgCofkqCYFMha4ODieBuARdFfwQHXVLCaTLaddAB68ITh3ilql6aSsWmtTV4f5DOAg/7p47AJ3uH4b4z4o0n8wFBkhWPpzv/mR/kkgRbN/7QDvJb1uSbwkcIZjjpfx8csdAXqayFnPvrBcjLi/pgPdyiYloOZjh2A8iuuDmbWin5SJ4TnPEfUEdTiKGSdlCiNE3k5x5c3Lah2rhs3I8k0FCAw4jwKTqIl8MrA159djbO+LBWhFrdYxkDMxUkTJDkxIrwhWhzS4tB4s3s4QMiHDsVITBRIRqloI6Cf1g08fD8e2buLmxoSIjW9bF0GOzHF3XV+7gzgqXclUckhj233M/A9of8gQ6iU6n0+l2fQNiAW5jT5JmugAAAABJRU5ErkJggg==","orcid":"","institution":"University of Sheffield","correspondingAuthor":true,"prefix":"","firstName":"Johnathan","middleName":"","lastName":"Cooper-Knock","suffix":""}],"badges":[],"createdAt":"2024-11-05 17:53:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5397445/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5397445/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12920-025-02084-w","type":"published","date":"2025-01-14T15:57:49+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":69891205,"identity":"51f70d53-6319-4e88-9203-c8638e5283e2","added_by":"auto","created_at":"2024-11-26 10:28:28","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":110281,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDerivation and biomarker evaluation of a hypomethylated DNA signature from whole genome bisulfite sequencing (WGBS) of human motor neurons. \u003c/strong\u003eMN-specific DNA hypomethylation was used to assess the proportion of MN DNA within cfDNA in plasma from ALS patients (n=12) and CSF from controls (n=4). 
We sort to verify the validity of MN-specific DNA hypomethylated regions by linking regions to target genes and cross-checking those genes with independent observations of MN gene expression; we hypothesised that correctly identified hypomethylated regions should indicate regions of open, active and transcribed chromatin which should be statistically enriched in measures of MN-specific gene expression. We linked regions to target genes using the activity-by-contact (ABC) model \u003ca href=\"https://paperpile.com/c/RHr1tE/S6l64\"\u003e[13]\u003c/a\u003e.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/3bf8333f4b0947166863ce60.png"},{"id":69889907,"identity":"d751c619-ab95-41eb-bad9-f2dc1c10cba8","added_by":"auto","created_at":"2024-11-26 10:20:28","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":210516,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eiPSC-derived MN maintain a DNA methylation signature consistent with human adult neurons. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) Whole genome bisulfite sequencing (WGBS) of genomic DNA derived from human iPSC-derived MN was used to derive a profile of genomic methylation within MN for comparison with methylation profiles of 205 samples covering 39 cell-types \u003ca href=\"https://paperpile.com/c/RHr1tE/FUBzx\"\u003e[5]\u003c/a\u003e. 
(\u003cstrong\u003eB\u003c/strong\u003e) Unsupervised clustering was used to assess cell-similarity and revealed that iPSC-derived MN (blue text) cluster together with human CNS neurons (green text).\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/439c5af83d5b24618abe5a00.png"},{"id":69889909,"identity":"f0e72efd-5bbf-4563-bb0f-6ae6e4e9ef3d","added_by":"auto","created_at":"2024-11-26 10:20:28","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":79412,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMN specific DNA methylation is linked to MN function but not to genetic risk for ALS. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) We used independent measurements of MN gene expression and ALS heritability to verify the biological validity of identified MN-specific hypomethylated genomic regions. MN-specific hypomethylated genes are enriched with genes expressed in human MN (\u003cstrong\u003eB\u003c/strong\u003e) and in human iPSC-derived MN (\u003cstrong\u003eC\u003c/strong\u003e). MN-specific hypomethylated genes are not differentially expressed in ALS iPSC-derived MN compared to control MN (\u003cstrong\u003eD\u003c/strong\u003e).\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/cee0b673f9d07fd34528be25.png"},{"id":69891207,"identity":"bbbc9ca1-22c1-4fc3-93da-cace6f04fe28","added_by":"auto","created_at":"2024-11-26 10:28:28","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":75487,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eOptimised set of MN-specific hypomethylated genomic regions is not detectable in ALS patient plasma cfDNA. 
\u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) We used a synthetic mix of WGBS reads from non-diseased plasma cfDNA together with spike-in reads from iPSC-derived MN to determine the optimum set of MN-specific regions for detection in ALS patient biosamples. \u003cstrong\u003e(B)\u003c/strong\u003e At spike-ins of 1-10% there is a linear relationship between spike-in and predicted MN DNA concentrations for all sets of MN-specific methylation blocks; p\u0026lt;0.02, adjusted r\u003csup\u003e2\u003c/sup\u003e\u0026gt;0.998, Pearson's product moment correlation coefficient.\u0026nbsp; (\u003cstrong\u003eC\u003c/strong\u003e) At spike-ins ≲1% it is possible to detect reads derived from MN-specific regions but the detection probability is \u0026lt;0.5. (\u003cstrong\u003eD\u003c/strong\u003e) MN-specific DNA is not detectable within ALS patient plasma.\u0026nbsp;\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/8cb3dab1096e95d19541e91d.png"},{"id":69889903,"identity":"520ecab7-0b3e-4b2c-8b78-f97fc4e1c8b8","added_by":"auto","created_at":"2024-11-26 10:20:28","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":120871,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe theoretical maximum proportion of MN-derived DNA within plasma cfDNA is very low. \u003c/strong\u003e(\u003cstrong\u003eA\u003c/strong\u003e) We can estimate the proportion of plasma cfDNA derived from MN based on the number of MN dying, the proportion of released DNA which reaches plasma cfDNA and the half-life of cfDNA. 
(\u003cstrong\u003eB\u003c/strong\u003e) For different disease durations between one and five years we estimate the proportion of plasma cfDNA derived from MN; and (\u003cstrong\u003eC\u003c/strong\u003e) we estimate the rate of MN-death necessary to achieve a given concentration within plasma cfDNA.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/75712e3e55300c7c46f424d6.png"},{"id":74284862,"identity":"db0ccdc7-83e2-4f5e-8cbc-6cce0fd783cf","added_by":"auto","created_at":"2025-01-20 16:13:17","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1896708,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/f4496456-13e9-4404-ae79-74b095f91dfc.pdf"},{"id":69889906,"identity":"9da42cfc-5f43-4b0f-ac4b-a9124bffe5fb","added_by":"auto","created_at":"2024-11-26 10:20:28","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":596027,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryTablesfinal.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/e851e151340b608404ad1f8c.xlsx"},{"id":69891206,"identity":"30640f1b-5630-4859-95b1-f4a8d484c54a","added_by":"auto","created_at":"2024-11-26 10:28:28","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":266230,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryMaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-5397445/v1/73e0b7a339a80e061ddfb87d.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Evaluation of a biomarker for amyotrophic lateral sclerosis derived from a hypomethylated DNA signature of human motor neurons","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAmyotrophic 
lateral sclerosis (ALS) is an incurable neurodegenerative disease where death results from motor neuron (MN) loss leading to respiratory failure. The design and development of novel therapeutics has been held back because of the lack of a specific biomarker. Currently, neurofilament proteins measured in plasma provide a non-specific readout of neuronal death [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Neurofilament proteins form important structural components of the large myelinated axons which are found in MN. MN death triggers the release of neurofilaments from the cytoplasm into the extracellular space [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]; as a result the level of detectable neurofilament is a function of the rate of MN death, and thus neurofilament measurement can be used as a biomarker of disease progression [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. However, neurofilaments are not specific to MN and it is notable that serum neurofilament light chain (NfL) [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] is elevated in other neurological diseases. Indeed, for diagnosis of ALS, serum NfL is of limited value [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] even if it is useful for measuring the rate of progression. It follows that detection of a different marker which is released \u003cem\u003eonly\u003c/em\u003e from dying MN may outperform neurofilaments as a biomarker for ALS.\u003c/p\u003e \u003cp\u003eDNA methylation is fundamental to the control of gene expression and by inference, genomic methylation should be relatively cell specific. Cell-specific DNA methylation signals are stable between individuals, as was confirmed by a recent atlas of DNA methylation [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
Moreover, DNA methylation is relatively stable over time [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Cell-free DNA (cfDNA) found in peripheral blood is the product of release from dying cells [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] and has been extensively proposed as a source of biomarkers in the cancer field [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]; methylated cfDNA is now the basis of FDA-approved applications e.g. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. We hypothesised that a DNA methylation signature which is specific to MN, and is detectable within cfDNA, might be both sensitive and specific as a biomarker of the rate of MN death due to ALS.\u003c/p\u003e \u003cp\u003eWe present whole genome bisulfite sequencing (WGBS) data from iPSC-derived MN from controls. These data complement our previously published epigenetic profiling from the same neurons [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. It is practically difficult to obtain MN in sufficient quantity from post-mortem material to perform WGBS and therefore we chose to focus on iPSC-derived MN which are a gold-standard model of ALS [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. We have published WGBS of cfDNA from ALS patients and controls [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] but previously we lacked a MN signature for comparison. Here we show, using simulation and measurement, that MN-specific DNA methylation is not detectable within cfDNA in plasma by WGBS. Future work will evaluate our MN DNA methylation signature by other means and in other biofluids. 
Our approach is summarised in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eCell-specific DNA methylation within control iPSC-derived MN is similar to human adult CNS neurons\u003c/h2\u003e \u003cp\u003eWGBS was performed at high depth to profile DNA methylation within iPSC-derived MN from three neurologically normal individuals (\u003cb\u003eSupplementary Table\u0026nbsp;1, Methods\u003c/b\u003e). A first question was whether the methylation signature of these neurons, which are derived \u003cem\u003ein vitro\u003c/em\u003e, is consistent with CNS neurons abstained from human tissue.\u003c/p\u003e \u003cp\u003eWGBS sequencing data were processed and quality control (QC) was performed according to the ENCODE 4 standards [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Methylation profiles of 205 samples covering 39 cell-types from an available methylation atlas [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] were combined with our samples, then used to segment the genome into blocks of co-methylated CpGs (\u003cb\u003eMethods\u003c/b\u003e). Hierarchical unsupervised clustering was used to examine the relationships between samples (\u003cb\u003eMethods\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). As expected, genome methylation within iPSC-derived MN clustered closely with CNS neuronal subtypes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). 
On this basis we proceeded to use our data to identify MN-specific methylation (\u003cb\u003eMethods\u003c/b\u003e).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eIdentification of cell-specific hypomethylated genomic regions\u003c/h3\u003e\n\u003cp\u003eNext we derived DNA methylation changes specific to MN via comparison with the methylation profiles of 205 samples covering 39 cell-types from an available methylation atlas [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Blocks of co-methylated CpGs that exhibited hyper- or hypomethylation specifically in MN were identified (\u003cb\u003eMethods\u003c/b\u003e) and taken forward for further analysis. In total 8,729 regions were specifically hypomethylated in MN (\u003cb\u003eSupplementary Table\u0026nbsp;2\u003c/b\u003e); hypomethylation indicates increased genomic accessibility suggestive of MN-specific function. A similar analysis identified 5,690 blocks which were specifically hypomethylated in the total set of human CNS neurons compared to other cell-types. The number of regions identified per cell-type varied dramatically from 61,693 for gallbladder to 436 for colon fibroblasts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eMN-specific DNA methylation is linked to MN function but not to genetic risk for ALS\u003c/h3\u003e\n\u003cp\u003eCell-specific DNA methylation is typically hypomethylated [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], which should be coincident with increased accessibility of underlying DNA over regulatory regions including enhancers [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
As a validation of the regions we have identified, we examined the overlap of MN-specific hypomethylated enhancers and their target genes, with independent measurements of MN gene expression and ALS heritability (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eTo derive associated genes from MN-specific hypomethylated DNA blocks, we applied the activity-by-contact (ABC) model [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] to link regulatory regions to expressed genes within iPSC-derived MN (\u003cb\u003eMethods\u003c/b\u003e). We found the total list of hypomethylated regions is associated with 2,046 expressed genes. We then tested this gene list for enrichment with human cell types and tissues included in ARCHS4 [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] using Enrichr [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], and found they were most significantly enriched for genes expressed specifically in spinal motor neurons isolated from post-mortem tissue [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] (Fisher\u0026rsquo;s exact test, p\u0026thinsp;=\u0026thinsp;4.22e-19, OR\u0026thinsp;=\u0026thinsp;1.79, using the ARCHS4 database [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). 
This demonstrates that the methylation profiles of the iPSC-derived motor neurons are congruent with transcriptional profiles of human motor neurons.\u003c/p\u003e \u003cp\u003eTo further characterise the function of MN-specific hypomethylated genes we examined RNA-sequencing data from iPSC-derived motor neurons obtained from 245 ALS patients and 45 controls (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e\u003ca href=\"http://www.answerals.org\" target=\"_blank\"\u003ewww.answerals.org\u003c/a\u003e\u003c/span\u003e\u003cspan address=\"http://www.answerals.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (\u003cb\u003eMethods\u003c/b\u003e). Genes linked to hypomethylated regions in MN were highly expressed within iPSC-derived MN compared to the background transcriptome (Wilcoxon rank sum test, p\u0026thinsp;\u0026lt;\u0026thinsp;2.2e-16, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC), which is consistent with an important role in MN function. Four genes were reported as differentially expressed (FDR\u0026thinsp;\u0026lt;\u0026thinsp;0.05, negative binomial test) between ALS patients and controls in these data, but genes linked to hypomethylated regions in MN were not enriched within ALS-associated differentially expressed genes (Wilcoxon rank sum test, p\u0026thinsp;=\u0026thinsp;0.25, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003eFinally, we performed LDSC [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] using a recent GWAS of ALS [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] to examine disease-specific heritability enrichment within MN-specific hypomethylated regions. 
Heritability for ALS was enriched within hypomethylated regions but this was not statistically significant (OR\u0026thinsp;=\u0026thinsp;25.2, se\u0026thinsp;=\u0026thinsp;26.05, p\u0026thinsp;=\u0026thinsp;0.38, LDSC, \u003cb\u003eMethods\u003c/b\u003e). We conclude that MN-specific DNA hypomethylation is associated with gene expression linked to MN function, but we find no conclusive evidence that there is a specific association with genes dysregulated in MN in a disease context.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eAn optimum set of hypomethylated DNA regions for ALS biomarker design\u003c/h3\u003e\n\u003cp\u003eAn important use of cell-type-specific methylation profiles is for the deconvolution of complex mixes of DNA to identify the proportions of contributing cell types. This has the potential to lead to a novel biomarker of ALS: Cell-free DNA (cfDNA) found within plasma is released from dying cells and thus, the quantity of DNA sourced from CNS neurons, and MN in particular, should be proportional to the rate of MN death. Neuronal DNA is not normally seen in the plasma [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], which may be due to a low rate of neuron death or to the blood brain barrier, but brain-derived DNA has been detected in plasma under pathological conditions [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] demonstrating its potential to serve as a biomarker.\u003c/p\u003e \u003cp\u003eTo deconvolute plasma cfDNA we optimised the UXM algorithm [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] for the low coverage (~\u0026thinsp;10x) typical of methylation studies of cfDNA; in particular we optimised the choice and configuration of MN-specific methylation blocks. 
The UXM algorithm was chosen as it makes use of read level methylation data, and has achieved accurate deconvolution of cell types present at proportions as low as 0.1% [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Optimisation was performed using synthetic data generated by spiking WGBS data derived from plasma cfDNA of healthy individuals, with sequencing reads derived from human MN at a known proportion between 0.01%-10% (\u003cb\u003eMethods\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA). We simulated relatively low coverage (10x) to match coverage in the actual ALS cfDNA samples. We observed a linear correlation between the actual and predicted percentage of spike-in MN DNA with an adjusted r\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.9 in all marker sets (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB). A configuration of UXM using 500 MN-specific blocks with a minimum of 3 CpGs produced the highest detection probability at 1% spike-in, but 500 blocks with a minimum of 4 CpGs performed better at both 0.5% and 0.1% spike-in (difference in detection probability between 0.1\u0026ndash;0.2 at each % spike-in, Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). However, we note that at spike-ins of \u0026le;\u0026thinsp;0.5%, AUC was poor for all sets of MN marker blocks. 
The greatest AUC (0.69) at 1% spike-in was achieved with 500 blocks with a minimum of 3 CpGs, in keeping with its higher probability of detection (\u003cb\u003eSupplementary Fig.\u0026nbsp;1A\u003c/b\u003e); this was the configuration taken forward to analyse ALS patient samples.\u003c/p\u003e \u003cp\u003eAs previously reported [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], deconvolution frequently identified false-positive cell-types within the synthetic mixture (\u003cb\u003eSupplementary Fig.\u0026nbsp;1B\u003c/b\u003e). We used a linear model to examine the effect of coverage and number of marker regions on the total number of cell types identified in a sample. Both coverage (p\u0026thinsp;=\u0026thinsp;0.04) and number of markers (p\u0026thinsp;=\u0026thinsp;3.7e-4) were significantly negatively correlated with the number of cell types identified, suggesting that increasing coverage and using more marker regions per cell-type will reduce the number of cell types falsely identified within a mixture.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eMN-derived DNA is not detectable within plasma cfDNA\u003c/h3\u003e\n\u003cp\u003eWhen we applied our optimised deconvolution, utilising 500 MN-specific methylation blocks with a minimum of 3 CpGs, to plasma cfDNA WGBS from n\u0026thinsp;=\u0026thinsp;12 ALS patients, we did not identify MN-derived DNA in any sample (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD), suggesting that if MN DNA is present it is below the detectable limit of ~\u0026thinsp;1% of plasma cfDNA (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB-C).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eNeuronal-derived DNA is detectable in CSF cfDNA\u003c/h2\u003e \u003cp\u003eThe cerebrospinal fluid (CSF) surrounds the spinal cord and brain, and 
is encapsulated by the blood brain barrier. It might be expected that CSF cfDNA is enriched in neuronal DNA compared to plasma and so we attempted to fully characterise the contributing cell types within CSF cfDNA (\u003cb\u003eMethods\u003c/b\u003e).\u003c/p\u003e \u003cp\u003eNo WGBS data was available from ALS patient CSF cfDNA. We analysed four samples of WGBS CSF cfDNA from hydrocephalus patients [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Coverage was very low (0.12-0.45x, \u003cb\u003eSupplementary Table\u0026nbsp;3\u003c/b\u003e) due to the low concentration of cfDNA within the spinal cord so samples were merged to improve deconvolution accuracy. We discovered that neuronal and oligodendrocyte DNA comprised 13% and 14% of the total cfDNA with the remainder largely composed of a mix of blood, epithelial, and adipocyte cell types (\u003cb\u003eSupplementary Fig.\u0026nbsp;1\u003c/b\u003e); MN-derived DNA was not detectable in any sample. The contribution of adipocytes may in part reflect the lumbar puncture procedure used to collect CSF as DNA. The lack of a number of CNS-specific cell-types such as microglia within the reference leads to a possible assignment error which is impossible to quantify, and is likely responsible for the small proportion of epithelial and pancreatic cell types identified.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eThe theoretical maximum proportion of MN-derived DNA within plasma cfDNA is very low\u003c/h3\u003e\n\u003cp\u003eWe did not detect MN DNA in any ALS patient sample suggesting that if MN DNA is present it is below ~\u0026thinsp;1% of plasma cfDNA. We questioned if this was a detection deficiency or whether there might be insufficient MN DNA for detection. 
To address this we modelled the theoretical maximum proportion of MN DNA that might be expected within plasma cfDNA (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eRecent work [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] has estimated the effect of cellular turnover on the proportion of DNA derived from different cell-types detectable within plasma cfDNA. The proportion of DNA released from dying cells that reaches cfDNA varies dramatically, from 3% of released DNA for megakaryocytes and endothelial cells, to 0.003% for erythrocyte progenitors. Although there are \u0026gt;\u0026thinsp;86\u0026nbsp;billion neurons in the human CNS [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], lower MN are a rare subtype of neurons, and previous work has estimated that there may be \u0026lt;\u0026thinsp;500,000 in total [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Assuming optimum availability then 3% of released MN DNA will be detectable within plasma cfDNA. If we assume all lower MN die over the course of disease, we can estimate the theoretical maximum proportion of MN DNA as a part of total plasma cfDNA as a function of the rate of disease progression (\u003cb\u003eMethods\u003c/b\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). From this we can calculate that even for the fastest theoretical disease progression rate, the plasma concentration of MN DNA would be several orders of magnitude smaller than our threshold for detection, primarily because of the small number of MN relative to other cell types. We have assumed a half life for cfDNA of 114 minutes [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. 
In our simulation experiments we achieved a detection probability greater than chance only when the proportion of cfDNA attributed to MN was \u0026gt;\u0026thinsp;1% (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB-C), which determined the threshold for theoretical detection.\u003c/p\u003e \u003cp\u003eWe sought to estimate what rate of MN death would be required to produce a detectable concentration within cfDNA. Using the proportion of DNA from cellular turnover detectable as cfDNA in the plasma from endothelial cells and erythroblasts as maximum and minimum estimates, we show that even if all lower MN died within 24 hours, their contribution to cfDNA would still be below the limit of detection for WGBS (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC). We consider this estimate to be of wider use to the field, as it predicts whether a detectable quantity of cfDNA will be present from a known rate of cell death.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eTissue culture and development of iPSC-derived MN\u003c/h2\u003e \u003cp\u003eTissue culture of iPSCs and the derivation of pure MN cultures via small molecules are described elsewhere [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eWhole genome bisulfite sequencing (WGBS) of DNA derived from iPSC-derived MN\u003c/h2\u003e \u003cp\u003eWe generated WGBS libraries following the Whole-Genome Bisulfite Sequencing Data Standards and Processing Pipeline (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.encodeproject.org/data-standards/wgbs/\u003c/span\u003e\u003cspan address=\"https://www.encodeproject.org/data-standards/wgbs/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan 
type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e).\u003c/span\u003e In brief, genomic DNA was extracted from ~\u0026thinsp;50,000 cells per technical replicate before shearing and bisulfite treatment. Libraries were amplified by PCR and purified. Library concentrations were measured (Qubit). WGBS libraries were paired-end sequenced on a NovaSeq 6000 system (Illumina) with a target coverage of 30x. Raw data were processed with the ENCODE 4 pipeline for WGBS according to ENCODE 4 standards. Files are available at encodeproject.org with the following accession numbers: ENCSR734EFX, ENCSR509LMK, ENCSR978LOX.\u003c/p\u003e \u003cp\u003ePaired-end FASTQ files were mapped to the human (hg38), lambda, pUC19 and viral genomes using bwa-meth (v.0.2.0), then converted to BAM files using SAMtools (v.1.9). Duplicated reads were marked by Sambamba (v.0.6.5) with parameters \u0026lsquo;-l 1 -t 16 --sort-buffer-size 16000 --overflow-list-size 10000000\u0026rsquo; [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. Reads that had low mapping quality, were duplicated, or were not mapped in a proper pair were excluded using SAMtools view with parameters \u0026lsquo;-F 1796 -q 10\u0026rsquo;. Reads were stripped of non-CpG nucleotides and converted to PAT files using wgbstools (v.0.2.0, downloaded from GitHub \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003egithub.com/nloyfer/wgbs_tools\u003c/span\u003e in September 2022), command \u003cem\u003ewgbstools bam2pat --genome hg38\u003c/em\u003e. Methylation across the MN samples was examined using a PCA plot, and technical replicates were found to have low heterogeneity. 
Technical replicates were then merged to allow inclusion in the wgbstools pipeline.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eGenome segmentation into methylation blocks\u003c/h2\u003e \u003cp\u003eUsing all three of our samples and all 205 samples from a methylation atlas we segmented the genome into 1,630,133 blocks of 4 or more CpGs using the wgbstools command \u0026lsquo;wgbstools segment --min_cpg 4 --max_bp 5000\u0026rsquo;. PAT and BETA files for all 207 available samples mapped to GRCh38 were downloaded from GEO (accession number GSE186458) [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] on the 20th of September 2022. As per the original publication we excluded two cardiomyocyte samples due to low coverage. We also segmented the genome into 1,938,130 blocks of 3 or more CpGs using the wgbstools command \u0026lsquo;wgbstools segment --min_cpg 3 --max_bp 5000\u0026rsquo;; this second segmentation was used only for marker selection.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eUnsupervised clustering of DNA methylation profiles\u003c/h2\u003e \u003cp\u003eAverage methylation per block (of at least 4 CpGs in size) for each sample was extracted using the wgbstools command \u0026lsquo;beta_to_table\u0026rsquo;, replacing blocks with less than 10x coverage in a sample with \u0026lsquo;NA\u0026rsquo;. We then selected the top 1% of blocks by variance, excluding blocks with any \u0026lsquo;NA\u0026rsquo; values across all samples, and used these for clustering. 
Unsupervised clustering was performed using Python version 3.10.8, Dask version 2023.9.2 and SciPy version 1.9.1, with the options method='average', metric='cityblock', optimal_ordering\u0026thinsp;=\u0026thinsp;True.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eDerivation of MN-specific hypomethylated genomic regions\u003c/h2\u003e \u003cp\u003eWe applied the wgbstools command \u0026lsquo;find_markers\u0026rsquo; together with all 205 samples used for segmentation. Default parameters were used to remove low-coverage regions: samples with a read depth of less than 5 in a segment had the value set to NA, and segments with more than 1 in 3 NA values in either the target or background cell type were removed. Regions were considered MN-specific if there was a difference of at least 0.3 between the mean motor neuron methylation and the mean of all other samples\u0026rsquo; methylation within that block, and the p value of a t-test was equal to or below 0.05.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of genes linked to MN-specific hypomethylated genomic regions\u003c/h2\u003e \u003cp\u003eWe implemented the ABC model [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] following the guidelines provided at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction\u003c/span\u003e\u003cspan address=\"https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. First, we called peaks from the ATAC-seq profiles using MACS2, and then identified the candidate enhancer elements using \u0026ldquo;makeCandidateRegions.py\u0026rdquo; with parameters peakExtendFromSummit\u0026thinsp;=\u0026thinsp;250 and nStrongestPeaks\u0026thinsp;=\u0026thinsp;150000. 
The black-listed regions generated by ENCODE 4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.encodeproject.org/\u003c/span\u003e\u003cspan address=\"https://www.encodeproject.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e were used to remove enhancers overlapping regions with anomalous sequencing reads. Second, we applied \u0026ldquo;run.neighborhoods.py\u0026rdquo; to quantify the enhancer activities by counting ATAC-seq and H3K27ac ChIP-seq reads in candidate enhancer regions. RNA-seq profiling of iPSC-derived MNs was also provided to define expressed genes. Quantile normalisation was applied using K562 epigenetic data as the reference. Finally, using \u0026ldquo;predict.py\u0026rdquo; we computed the ABC scores by combining the enhancer activities (calculated in the second step) with the Hi-C profiling. Hi-C data were fit to the power-law model. The default threshold of 0.02 was used to define valid E-P links.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eTranscriptome analysis\u003c/h2\u003e \u003cp\u003eFor AnswerALS data, gene expression profiling of iPSC-derived MNs and phenotype data were obtained for 245 ALS patients and 45 neurologically normal controls (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.answerals.org/\u003c/span\u003e\u003cspan address=\"https://www.answerals.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Gene expression was normalised by the trimmed mean of M-values (TMM) normalisation method. We used a negative binomial test to determine genes differentially expressed between ALS patients and controls. 
Significance testing was performed for all genes expressed in MN (n\u0026thinsp;=\u0026thinsp;22,976), defined as a count above zero in more than half of samples; in addition we excluded the bottom 25% of genes based on mean count across all samples.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eGeneration of synthetic mixes of MN-derived DNA together with plasma cfDNA\u003c/h2\u003e \u003cp\u003eWGBS data from plasma cfDNA samples produced by Caggiano et al. [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], including 12 ALS patients and 12 healthy volunteers, were downloaded from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164600\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164600\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e in February 2023. Raw FASTQ files were trimmed with Trim Galore version 6.7 using the options \u0026lsquo;trim_galore --paired --clip_R1 4 --clip_R2 4 --three_prime_clip_R1 12 --three_prime_clip_R2 12\u0026rsquo; and then aligned to GRCh38 using the bowtie 2 aligner in Bismark version 22.3. Duplicate reads were removed with Bismark and Samtools version 1.16.1 was used to remove reads with a MAPQ score below 10. BAM files were then converted to PAT and BETA files using wgbstools.\u003c/p\u003e \u003cp\u003eUsing the wgbstools command \u0026lsquo;mix_pat\u0026rsquo;, we created synthetic mixes of MN sample PGP_M_55_iPSC (\u003cb\u003eSupplementary Table\u0026nbsp;1\u003c/b\u003e) or cerebral neuron sample Cortex-Neuron-Z0000042F [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] with either the 12 plasma cfDNA samples from healthy volunteers or the 4 CSF cfDNA samples from hydrocephalus patients. 
By down- or up-sampling the cfDNA and neuronal reads, spike-ins were made at 0\u0026ndash;10%, and coverage was varied from 2.5-30x.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eDeconvolution of plasma cfDNA and optimisation of a deconvolution algorithm\u003c/h2\u003e \u003cp\u003eWe derived uniquely hypomethylated regions for each cell-type to use for deconvolution. In this process we excluded the two samples used for spike-in to prevent overfitting. Segmentation was repeated as before to derive two sets of regions, one with a minimum length of 3 CpGs and one with a minimum length of 4 CpGs. For both sets of regions cell type specific marker regions were found using wgbstools \u0026lsquo;find_markers\u0026rsquo; with a minimum difference between target and background means of 0.3 and a t-test p-value equal to or below 0.05. To derive different numbers of marker regions, for each cell-type the marker regions were ordered by the difference between the 75th-centile in the target group and the 2.5th centile in the background and then 25, 50, 100, 250, 300, 400, or 500 marker regions were selected. Marker regions for all cell types were then used to create an atlas of the fragment based methylation for each region across all cell types using the UXM tool downloaded from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/nloyfer/UXM_deconv\u003c/span\u003e\u003cspan address=\"https://github.com/nloyfer/UXM_deconv\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e on the 31st of January 2023. We then used UXM to deconvolve the synthetic mixes, producing estimated cell type contributions for each mix. These were then analysed using R version 4.3.1 (2023-06-16). 
To optimise region selection we tested using smaller or larger regions, and more or fewer regions per cell-type, in order to maximise the probability of detection of spiked-in DNA and minimise the normalised root mean squared error (RMSE).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eDeconvolution of CSF cfDNA\u003c/h2\u003e \u003cp\u003eWGBS data from CSF cfDNA samples [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], comprising four hydrocephalus patients, were downloaded from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142241\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142241\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e in April 2023. Reads were trimmed with trim-galore version 6.7 using the paired option and default settings. Due to low mapping efficiency of the reads we followed the \u0026lsquo;Dirty Harry\u0026rsquo; protocol described by the creators of the Bismark software. Reads were first aligned as paired-end reads using the bowtie aligner within Bismark. Unmapped R1 reads were then aligned in directional mode, and unmapped R2 reads were aligned in pbat mode, before all alignments were combined into a single file. 
Duplicate reads were then removed with Bismark, then Samtools version 1.16.1 was used to remove reads with a MAPQ score below 10 before converting them into PAT and BETA files using wgbstools.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eTheoretical estimate of the maximum of MN-derived DNA within plasma cfDNA\u003c/h2\u003e \u003cp\u003eThe concentration of cfDNA produced from cell death is given by the standard pharmacokinetic equation for the concentration produced by a drug infusion at a constant rate, scaled by the proportion of released DNA that reaches the plasma.\u003c/p\u003e \u003cp\u003eC\u0026thinsp;=\u0026thinsp;d\u0026thinsp;\u0026times;\u0026thinsp;(k\u003csub\u003e0\u003c/sub\u003e\u0026thinsp;\u0026times;\u0026thinsp;t\u003csub\u003e1/2\u003c/sub\u003e) / (ln(2)\u0026thinsp;\u0026times;\u0026thinsp;V\u003csub\u003ed\u003c/sub\u003e)\u003c/p\u003e \u003cp\u003eHere C is the concentration in the plasma, k\u003csub\u003e0\u003c/sub\u003e is the infusion rate, t\u003csub\u003e1/2\u003c/sub\u003e is the half life, V\u003csub\u003ed\u003c/sub\u003e is the volume of distribution, and d is the proportion of DNA from cell death present in the plasma. By making reasonable assumptions for each of these values, we were able to calculate the theoretical maximum concentration of MN DNA within plasma cfDNA as a function of the time period over which the DNA was released, i.e. disease duration. Using the values for a 70\u0026nbsp;kg, 20\u0026ndash;25-year-old man, which have historically been used as the standard, the volume of plasma is 3.0\u0026nbsp;L [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. 
In the absence of a ground-truth for the proportion of DNA released from dying MN that reaches plasma cfDNA, we used observed maximum and minimum proportions for other cell-types: from 3% for megakaryocytes and endothelial cells to 0.003% for erythrocyte progenitors [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. 
Infusion rate is given by the rate of cell death, and converted to weight of DNA using the conversion 1 diploid genome\u0026thinsp;=\u0026thinsp;6.46pg [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. The total number of lower MN has been estimated at ~\u0026thinsp;500,000 [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] and we estimate a constant rate of loss over the disease course based on the observation that neurofilament levels, a biomarker of neuronal death, rise prior to disease onset then reach a stable concentration that is proportional to speed of progression [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. The half life of plasma cfDNA has been measured using a variety of means, including the decrease in foetal cfDNA following pregnancy, the decrease in tumour cfDNA following surgery, and the increase and decrease in cfDNA following exercise [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. A key point is to distinguish between the distribution half life and steady state half life. As shown by experiments with radiolabeled double stranded DNA [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e], following an infusion DNA is taken up by soft tissues causing its concentration in the plasma to decrease rapidly until an equilibrium is reached with equal movement of DNA between the soft tissues and plasma. Following this the concentration of DNA will reach a steady state where its concentration is determined by the infusion rate and the steady state half life. We use 114 minutes as our estimate for the steady state half life as this is based on the fall in circulating tumour DNA following complete resection of the tumour [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. 
cfDNA from the tumour would have reached a steady state prior to the surgery, and its decrease following surgery would be in line with the steady state half life. When estimating the proportion of cfDNA we use 297\u0026nbsp;pg/\u0026micro;l as the expected concentration of plasma cfDNA, as this was the average concentration in controls who were age- and sex-matched to ALS patients [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eALS is currently an incurable and invariably fatal neurodegenerative disease [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. Biomarkers are crucial for translational medicine and the recent development of serum NfL as a biomarker for ALS [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] has been key to the development of new treatments [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. However, a key deficiency of NfL measurement is that it is not specific to MN [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], the primary degenerating cell in ALS. We and others have hypothesised that detection of cell-specific methylation of DNA within plasma cfDNA might provide an alternative and more specific biomarker for ALS. Here we show theoretically and experimentally that this goal is potentially not achievable using WGBS of plasma cfDNA, at least under the experimental conditions we encountered. 
Alternative approaches are needed, which may include alternative biofluids or detection methods.\u003c/p\u003e \u003cp\u003eWe have developed a MN-specific set of hypomethylated genomic regions using WGBS in iPSC-derived MN from neurologically normal individuals, together with an atlas of tissue-specific methylation [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
We demonstrate that these regions are associated with genes which are key to MN function but not significantly enriched with ALS genetic risk. Our regions are likely to be useful for future work aiming to detect DNA derived from MN using different detection methods.\u003c/p\u003e \u003cp\u003eOur simulations and our measurements suggest that the sensitivity of WGBS is limited to 1% of plasma cfDNA, which is several orders of magnitude greater than the theoretical maximum proportion of plasma cfDNA derived from rapidly degenerating MN, which we determine to be 1.6\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;5\u003c/sup\u003e%. This is due to the relatively small number of MN compared to the ongoing turnover of other cell-types. It is conceivable that MN-derived DNA could be detected at this level, but targeted amplification together with more sensitive detection will be necessary.\u003c/p\u003e \u003cp\u003eAn important limitation of our work, and of the majority of deconvolution algorithms, is the assumption that the sequenced DNA fragments are randomly distributed across the genome, which is not correct. It is known that the formation of cell-free DNA from genomic DNA leads to preferential preservation of nucleosome-bound DNA, so cell-free DNA from different cell types or tissues produces fragmentation patterns with greater depth at sites bound to nucleosomes [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. Enriching the MN-specific methylation blocks used for detection for nucleosome-bound genomic regions could potentially improve the performance of detection.\u003c/p\u003e \u003cp\u003eIt is possible that use of an alternative biofluid might enable detection of MN-specific DNA. CSF is the obvious choice given that, unlike blood, it is not separated from MN by the blood brain barrier (BBB). 
However, the extremely low concentration of cfDNA in CSF \u0026ndash; 0.4ng/mL versus 7.7ng/mL in plasma [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e] \u0026ndash; may again be prohibitive. Our preliminary analysis suggests that neuronal but not MN-derived DNA is detectable within CSF cfDNA via WGBS, but this did not include sequencing data from ALS patients.\u003c/p\u003e \u003cp\u003eOur study has contributed WGBS data from iPSC-derived MN (encodeproject.org, \u003cb\u003eMethods\u003c/b\u003e) and the identification of MN-specific hypomethylated genomic regions. We have not achieved a new biomarker for ALS but we have delineated the challenge for this approach through both theoretical calculations and experimental measurements. We have shown that WGBS of cfDNA derived from plasma is not likely to lead to a new biomarker for ALS and that future research should focus on developing our MN-specific regions with a more sensitive detection method.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eEthics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eThe study was approved by the South Sheffield Research Ethics Committee. Also, this study followed study protocols approved by Medical Ethical Committees for each of the participating institutions. Written informed consent was obtained from all participating individuals. 
All methods were performed in accordance with relevant national and international guidelines and regulations.\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eWritten informed consent was obtained from all participating individuals.\u003c/p\u003e\n\u003cp\u003eAvailability of data and material\u003c/p\u003e\n\u003cp\u003eWGBS data are available at encodeproject.org with the following accession numbers: ENCSR734EFX, ENCSR509LMK, ENCSR978LOX.\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis work was supported by the National Institutes of Health (CEGS 5P50HG00773504, 1P50HL083800, 1R01HL101388, 1R01-HL122939, S10OD025212, P30DK116074, and UM1HG009442 to MPS), the Wellcome Trust (216596/Z/19/Z to JCK), and NIHR (NF-SI-0617-10077 to PJS). CH/JCK are supported by the MNDA (899-792). We also acknowledge support from a Kingsland fellowship (T.M.), and the NIHR Sheffield Biomedical Research Centre for Translational Neuroscience (IS-BRC-1215-20017) and the NIHR Sheffield Clinical Research Facility.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCH, PJS, MPS, SZ, JM, EH and JCK conceived and designed the study. CH, AN, AMB and JCK performed statistical analyses. AKW, CDSS, LF and TM carried out experiments. CH, EH, JM, SZ, JCK, KK, CC, and NZ interpreted the data with assistance from all other authors. JCK, JM, PJS, and MPS supervised the work. CH, EH and JCK wrote the manuscript with feedback from all other authors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe are very grateful to the ALS patients and control subjects who generously donated biosamples. We acknowledge transcriptomic data provided by the AnswerALS Consortium. 
Figures were created using \u003ca href=\"http://biorender.com\"\u003eBioRender.com\u003c/a\u003e.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eLu C-H, Macdonald-Wallis C, Gray E, Pearce N, Petzold A, Norgren N, et al. Neurofilament light chain: A prognostic biomarker in amyotrophic lateral sclerosis. Neurology. 2015;84:2247\u0026ndash;57.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYuan A, Rao MV, Veeranna, Nixon RA. Neurofilaments and Neurofilament Proteins in Health and Disease. Cold Spring Harb Perspect Biol. 2017;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVerde F, Steinacker P, Weishaupt JH, Kassubek J, Oeckl P, Halbgebauer S, et al. Neurofilament light chain in serum for the diagnosis of amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry. 2019;90:157\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDavies JC, Dharmadasa T, Thompson AG, Edmond EC, Yoganathan K, Gao J, et al. Limited value of serum neurofilament light chain in diagnosing amyotrophic lateral sclerosis. Brain Commun. 2023;5:fcad163.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLoyfer N, Magenheim J, Peretz A, Cann G, Bredno J, Klochendler A, et al. A DNA methylation atlas of normal human cell types. Nature. 2023;613:355\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi Y, Pan X, Roberts ML, Liu P, Kotchen TA, Cowley AW Jr, et al. Stability of global methylation profiles of whole blood and extracted DNA under different storage durations and conditions. Epigenomics. 2018;10:797\u0026ndash;811.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKustanovich A, Schwartz R, Peretz T, Grinshpun A. Life and death of circulating cell-free DNA. Cancer Biol Ther. 2019;20:1057\u0026ndash;67.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBronkhorst AJ, Ungerer V, Holdenrieder S. 
The emerging role of cell-free DNA as a molecular marker for cancer management. Biomol Detect Quantif. 2019;17:100087.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWarren JD, Xiong W, Bunker AM, Vaughn CP, Furtado LV, Roberts WL, et al. Septin 9 methylated DNA is a sensitive and specific blood test for colorectal cancer. BMC Med. 2011;9:133.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang S, Cooper-Knock J, Weimer AK, Shi M, Moll T, Marshall JNG, et al. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron. 2022;110:992\u0026ndash;1008.e11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSances S, Bruijn LI, Chandran S, Eggan K, Ho R, Klim JR, et al. Modeling ALS with motor neurons derived from human induced pluripotent stem cells. Nat Neurosci. 2016;19:542\u0026ndash;53.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCaggiano C, Celona B, Garton F, Mefford J, Black BL, Henderson R, et al. Comprehensive cell type decomposition of circulating cell-free DNA with CelFiE. Nat Commun. 2021;12:2717.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStamenova EK, Aiden EL, Lander ES, Engreitz JM. Activity-by-contact model of enhancer\u0026ndash;promoter regulation from thousands of CRISPR perturbations. Nature. 2019.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699\u0026ndash;710.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWiench M, John S, Baek S, Johnson TA, Sung M-H, Escobar T, et al. DNA methylation status predicts cell type-specific enhancer activity. EMBO J. 2011;30:3028\u0026ndash;39.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, et al. 
Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun. 2018;9:1366.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, et al. Gene Set Knowledge Discovery with Enrichr. Curr Protoc. 2021;1:e90.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNizzardo M, Taiana M, Rizzo F, Aguila Benitez J, Nijssen J, Allodi I, et al. Synaptotagmin 13 is neuroprotective across motor neuron diseases. Acta Neuropathol. 2020;139:837\u0026ndash;53.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFinucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47:1228\u0026ndash;35.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Rheenen W, van der Spek RAA, Bakker MK, van Vugt JJFA, Hop PJ, Zwamborn RAJ, et al. Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology. Nat Genet. 2021;53:1636\u0026ndash;48.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChatterton Z, Mendelev N, Chen S, Carr W, Kamimori GH, Ge Y, et al. Bisulfite Amplicon Sequencing Can Detect Glia and Neuron Cell-Free DNA in Blood Plasma. Front Mol Neurosci. 2021;14:672614.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLehmann-Werman R, Neiman D, Zemmour H, Moss J, Magenheim J, Vaknin-Dembinsky A, et al. Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc Natl Acad Sci U S A. 2016;113:E1826\u0026ndash;34.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi S, Zeng W, Ni X, Liu Q, Li W, Stackpole ML, et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc Natl Acad Sci U S A. 
2023;120:e2305236120.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYe Z, Chatterton Z, Pflueger J, Damiano JA, McQuillan L, Harvey AS, et al. Cerebrospinal fluid liquid biopsy for detecting somatic mosaicism in brain. Brain Commun. 2021;3:fcaa235.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSender R, Noor E, Milo R, Dor Y. What fraction of cellular DNA turnover becomes cfDNA? bioRxiv. 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVoytek B. Are there really as many neurons in the human brain as stars in the Milky Way. Scitable, Nature Education.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGautier O, Blum JA, Maksymetz J, Chen D, Schweingruber C, Mei I et al. Human motor neurons are rare and can be transcriptomically divided into known subtypes. bioRxiv. 2023;:2023.04.05.535689.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen K, Zhao H, Yang F, Hui B, Wang T, Wang LT, et al. Dynamic changes of circulating tumour DNA in surgical lung cancer patients: protocol for a prospective observational study. BMJ Open. 2018;8:e019012.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032\u0026ndash;4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi J, Zhao S, Lee M, Yin Y, Li J, Zhou Y et al. Reliable tumor detection by whole-genome methylation sequencing of cell-free DNA in cerebrospinal fluid of pediatric medulloblastoma. Sci Adv. 2020;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eICRP. ICRP Publication 89: Basic Anatomical and Physiological Data for Use in Radiological Protection: Reference Values. SAGE Publications Limited; 2003.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePiovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L. 
On the length, weight and GC content of the human genome. BMC Res Notes. 2019;12:106.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenatar M, Wuu J, Andersen PM, Lombardi V, Malaspina A. Neurofilament light: A candidate biomarker of presymptomatic amyotrophic lateral sclerosis and phenoconversion. Ann Neurol. 2018;84:130\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhier S, Lohan L. Kinetics of circulating cell-free DNA for biomedical applications: critical appraisal of the literature. Future Sci OA. 2018;4:FSO295.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEmlen W, Mannik M. Effect of DNA size and strandedness on the in vivo clearance and organ localization of DNA. Clin Exp Immunol. 1984;56:185\u0026ndash;92.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDiehl F, Schmidt K, Choti MA, Romans K, Goodman S, Li M, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med. 2008;14:985\u0026ndash;90.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCooper-Knock J, Jenkins T, Shaw PJ. Clinical and Molecular Aspects of Motor Neuron Disease. Colloquium Ser Genomic Mol Med. 2013;2:1\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiller TM, Cudkowicz ME, Genge A, Shaw PJ, Sobue G, Bucelli RC, et al. Trial of Antisense Oligonucleotide Tofersen for SOD1 ALS. N Engl J Med. 2022;387:1099\u0026ndash;110.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSnyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016;164:57\u0026ndash;68.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu J, Liu Z, Huang T, Wang Y, Song MM, Song T, et al. Cerebrospinal fluid circulating tumor DNA depicts profiling of brain metastasis in NSCLC. Mol Oncol. 
2023;17:810\u0026ndash;24.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"mgnm","sideBox":"Learn more about [BMC Medical Genomics](http://bmcmedgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/mgnm/default.aspx","title":"BMC Medical Genomics","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-5397445/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5397445/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAmyotrophic lateral sclerosis (ALS) lacks a specific biomarker, but is defined by relatively selective toxicity to motor neurons (MN). As others have highlighted, this offers an opportunity to develop a sensitive and specific biomarker based on detection of DNA released from dying MN within accessible biofluids. Here we have performed whole genome bisulfite sequencing (WGBS) of iPSC-derived MN from neurologically normal individuals. By comparing MN methylation with an atlas of tissue methylation we have derived a MN-specific signature of hypomethylated genomic regions, which accords with genes important for MN function. Through simulation we have optimised the selection of regions for biomarker detection in plasma and CSF cell-free DNA (cfDNA). However, we show that MN-derived DNA is not detectable via WGBS in plasma cfDNA. In support of our experimental finding, we show theoretically that the relative sparsity of lower MN sets a limit on the proportion of plasma cfDNA derived from MN which is below the threshold for detection of WGBS. Our findings are important for the ongoing development of ALS biomarkers. 
The MN-specific hypomethylated genomic regions we have derived could be usefully combined with more sensitive detection methods and perhaps with study of CSF instead of plasma. Indeed we demonstrate that neuronal-derived DNA is detectable in CSF. Our work is relevant for all diseases featuring death of rare cell-types.\u003c/p\u003e","manuscriptTitle":"Evaluation of a biomarker for amyotrophic lateral sclerosis derived from a hypomethylated DNA signature of human motor neurons","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-26 10:20:23","doi":"10.21203/rs.3.rs-5397445/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2024-11-08T11:43:53+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-11-08T10:03:08+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-11-08T10:01:28+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Genomics","date":"2024-11-05T17:43:36+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-genomics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"mgnm","sideBox":"Learn more about [BMC Medical Genomics](http://bmcmedgenomics.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/mgnm/default.aspx","title":"BMC Medical Genomics","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"edb6e7f8-7814-413c-8345-972c1571eae2","owner":[],"postedDate":"November 26th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-01-20T16:06:52+00:00","versionOfRecord":{"articleIdentity":"rs-5397445","link":"https://doi.org/10.1186/s12920-025-02084-w","journal":{"identity":"bmc-medical-genomics","isVorOnly":false,"title":"BMC Medical Genomics"},"publishedOn":"2025-01-14 15:57:49","publishedOnDateReadable":"January 14th, 2025"},"versionCreatedAt":"2024-11-26 10:20:23","video":"","vorDoi":"10.1186/s12920-025-02084-w","vorDoiUrl":"https://doi.org/10.1186/s12920-025-02084-w","workflowStages":[]},"version":"v1","identity":"rs-5397445","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5397445","identity":"rs-5397445","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}