Cross-Tissue Epigenetic Age Prediction with Compact CpG Panels

Research Article (Research Square preprint)

Suresh Kaulagi, Hariram Chavan

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8928610/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Epigenetic age estimators based on DNA methylation provide powerful biomarkers of aging, but most clocks are tissue-specific and rely on large CpG panels. Here we develop compact, interpretable machine learning models that capture age-related DNA methylation patterns in human brain and blood, and we evaluate their cross-tissue behavior using public Illumina 450K datasets. Using frontal cortex methylation profiles from GSE41826, we constructed an age-group classifier (child vs adult/older) based on XGBoost and compared its performance with penalized logistic regression and random forests.
After addressing class imbalance by up-sampling, the brain XGBoost model achieved high accuracy and balanced precision–recall. SHAP (SHapley Additive exPlanations) analysis identified a small panel of CpG sites with strong influence on age classification, several of which map to genes previously implicated in development and aging, and overlap with CpGs from established epigenetic clocks. We then applied the brain-trained model to a large peripheral blood dataset (GSE40279) to test cross-tissue generalization, using only the CpGs shared between tissues. Despite limited CpG overlap, the model reliably distinguished child-like from adult-like methylation patterns in blood and highlighted a subset of older donors with "youthful" methylation signatures. Finally, we built a blood-specific three-class age classifier (young adult, middle-aged, older adult) and compared tree-based models with a TabTransformer architecture, finding that gradient-boosted trees combined with SHAP provided a favorable balance of accuracy and interpretability. These results demonstrate that compact, biologically interpretable CpG panels can illuminate conserved genotype–phenotype relationships in mammalian aging, revealing cross-tissue methylation signatures with potential relevance for disease pathways and precision health applications.

Keywords: DNA methylation; Epigenetic age prediction; Epigenetic clock models; Cross-tissue epigenetics; Machine learning in genomics; Brain methylation signatures; Blood methylation biomarkers; Comparative mammalian epigenetics

Introduction

Epigenetic clocks built around DNA methylation have become trusted markers for tracking biological aging and the risk of age-related diseases in humans. Most popular clocks, like Horvath's models, use regression models trained on large sets of CpG sites, sometimes within a single tissue, other times across many (Horvath 2013; Horvath and Raj 2018).
These clocks perform well but are often treated as black boxes, and their behavior when applied across tissues is still not fully understood. To move the field forward, both for basic research and for real-world applications, we need to pinpoint which CpG sites and genes drive age predictions, assess their conservation across mammalian tissues, and evaluate how these epigenetic signals inform genotype–phenotype relationships and biological pathways relevant to health and disease (Bell et al. 2019; Jain et al. 2024).

The large number of Illumina 450K DNA methylation datasets now available lets researchers study cross-tissue aging patterns using reproducible workflows (Marioni et al. 2015; Mendonça et al. 2024). In addition, recent progress in machine learning, such as gradient-boosted trees and SHAP (SHapley Additive exPlanations), makes it possible to build smaller, more understandable CpG panels and to measure how much each site contributes to a model's prediction. Comparing classic tree-based algorithms with newer deep learning architectures such as TabTransformer shows which tools work best for modeling epigenetic age, especially when sample sizes are modest and model interpretability matters (Huang et al. 2020; Duran and Tsurumi 2025).

In this study, we used publicly available brain and blood methylation data to build and interpret cross-tissue epigenetic age models. First, we constructed an age-group classifier using frontal cortex methylation profiles from GSE41826 and evaluated multiple algorithms, including penalized logistic regression, random forests, XGBoost, and TabTransformer, with and without class balancing. We used SHAP to identify a small set of CpG sites with strong influence on age-group predictions and annotated these CpGs using a 450K manifest with gene and transcript information, noting overlaps with established clock CpGs.
Next, we tested cross-tissue generalization by applying the brain-trained XGBoost model to the large peripheral blood dataset GSE40279, restricting to CpG sites shared across both arrays. Finally, we built a blood-specific three-class age classifier to characterize age-related methylation patterns within blood alone and to compare tree-based models with TabTransformer in a high-dimensional setting. Our goals were to (i) assess whether a compact CpG panel learned in brain captures developmental and adult age patterns in blood, (ii) identify genes and pathways associated with these cross-tissue CpGs, and (iii) evaluate the relative performance and interpretability of modern machine learning approaches for epigenetic age modeling using public data. This study therefore contributes to mammalian systems biology by linking compact CpG panels to developmental and aging phenotypes, highlighting conserved pathways that may underpin age-related disease risk and precision medicine strategies.

Materials and methods

Data sources

Brain DNA methylation data were obtained from GEO accession GSE41826 (human frontal cortex, Illumina HumanMethylation450 BeadChip). We downloaded the series using GEOparse and extracted individual GSM sample tables to construct a unified beta-value matrix. Peripheral blood DNA methylation data were obtained from GSE40279, which includes whole-blood methylation profiles from several hundred individuals across the adult lifespan, measured on the same array platform (Marioni et al. 2015). Sample key and average beta matrices were downloaded from the GEO supplementary files and merged to obtain per-sample beta values with GSM identifiers.

To annotate CpG sites, we used a GENCODE-based 450K manifest (HM450.hg19.manifest.gencode.v26lift37.tsv.gz), which provides genomic coordinates, probe IDs, and associated gene symbols and transcript types (Rayevskiy et al. 2023).
This manifest was used to map top CpG features to genes and to identify CpG overlap with previously described epigenetic clock panels.

Preprocessing and feature matrices

For GSE41826, we first built a "cleaned" methylation matrix by iterating over GSM tables, retaining only rows with both probe ID (ID_REF) and beta value (VALUE) columns and verifying consistent ordering of CpG IDs across samples. Probe IDs were set as row names and sample IDs as columns, and the matrix was transposed so that rows correspond to samples and columns to CpG sites. Sample-level metadata, including age, health status, tissue, and other characteristics, were extracted from GSM annotations and merged with the methylation matrix to create a combined brain dataset.

For GSE40279, we read the gzipped average beta matrix and transposed it so that rows correspond to samples and columns to CpG sites. The accompanying sample key file was parsed to map numeric identifiers to Illumina sample IDs. After cleaning the identifiers, we merged the key with the beta matrix to obtain a final blood methylation matrix indexed by sample ID. Only CpG sites with valid beta values across samples were retained.

To align CpG features across tissues, we intersected the sets of CpG IDs present in the cleaned brain and blood matrices. The cleaned brain matrix had dimensions 145 samples × 20 CpGs, while the blood matrix had 689 samples × 20 CpGs after cleaning; the intersection yielded 20 shared CpG sites for cross-tissue analysis. Because the number of shared CpGs was limited, we focused cross-tissue analyses on this intersecting set, while within-tissue models could use the full CpG set.

Age groups and phenotype definitions

In the brain dataset, donor age was extracted from metadata and converted to integer years. We defined three age categories: "child" (< 20 years), "adult" (20–59 years), and "older" (≥ 60 years).
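As an illustration of the age-category cut-points just given, the following minimal Python sketch maps integer ages to the three categories and to a collapsed child vs not_child label. The function names and the example ages are hypothetical and are not taken from the authors' code.

```python
def age_group(age_years: int) -> str:
    """Map an integer age in years to the three brain age categories."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 20:
        return "child"
    if age_years < 60:
        return "adult"
    return "older"


def binary_outcome(age_years: int) -> str:
    """Collapse adult and older into a single not_child class."""
    return "child" if age_group(age_years) == "child" else "not_child"


ages = [4, 19, 20, 59, 60, 83]  # hypothetical donor ages
print([age_group(a) for a in ages])
print([binary_outcome(a) for a in ages])
```

Keeping the cut-points in one function makes the boundary behavior explicit: 19 falls in "child", 20 in "adult", and 60 in "older".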
For initial classification, we focused on a binary outcome, "child" vs "not_child" (adult or older), to enrich for strong developmental contrasts and to ensure adequate sample sizes per class. An additional multi-class definition (child, adult, older) was used in exploratory analyses.

In the blood dataset, ages supplied with the series matrix were used to define adult age categories. Based on the age distribution, we created three classes: young adult, middle-aged, and older adult, using approximate cut-points that yielded reasonably balanced groups. These labels were used as the outcome for blood-specific age modeling.

Machine learning models

For the brain cohort, we used the CpG beta matrix as predictors and the age group as the outcome. We evaluated three main classifiers:

- Penalized logistic regression with L2 regularization (Ridge logistic regression).
- Random forest classifier with 100 trees.
- XGBoost gradient-boosted decision trees with 100 estimators.

Because the brain dataset was imbalanced (fewer child samples than adult/not_child), we first trained models on the original data and then constructed a balanced dataset by up-sampling the minority "child" class via random resampling with replacement to match the number of not_child samples. Data were split into training and test sets using stratified train–test splits, with a held-out test proportion of 30%. Continuous CpG values were standardized for logistic regression using a standard scaler; tree-based models were trained on unscaled beta values. Model performance was assessed on the test set using accuracy, precision, recall, and F1-score for each class, along with macro- and weighted averages. For XGBoost, class labels were encoded as integers, and predicted labels were inverse-transformed to obtain human-readable classes.

For the blood-specific age model, we used a similar pipeline.
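The balancing and splitting steps described above can be sketched without external libraries as follows. This is a minimal illustration, not the authors' implementation: the function names, the random seed, and the toy sample IDs are assumptions, and a library routine (for example a stratified splitter from a machine learning package) would normally be used instead.

```python
import random
from collections import Counter, defaultdict


def upsample_minority(rows, labels, seed=0):
    """Resample smaller classes with replacement until all match the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def stratified_split(rows, labels, test_fraction=0.30, seed=0):
    """Hold out test_fraction of each class so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    def pick(indices):
        return [rows[i] for i in indices], [labels[i] for i in indices]

    return pick(train_idx), pick(test_idx)


sample_ids = [f"s{i}" for i in range(40)]          # hypothetical sample IDs
labels = ["child"] * 10 + ["not_child"] * 30       # hypothetical imbalance
bal_ids, bal_labels = upsample_minority(sample_ids, labels)
(train_ids, train_y), (test_ids, test_y) = stratified_split(bal_ids, bal_labels)
print(Counter(bal_labels), Counter(test_y))
```

The sketch follows the order in which the steps are described (balance, then split). If duplicated samples appearing in both training and test sets are a concern, the same two functions can be applied in the opposite order, up-sampling only the training portion.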
We trained penalized logistic regression, random forests, and XGBoost on the three-class age outcome (young adult, middle-aged, older adult), both on the original and class-balanced versions of the data. In addition, we trained a TabTransformer model to compare a deep learning architecture designed for tabular data with tree-based methods. Hyperparameters for the TabTransformer (number of layers, hidden dimension, dropout, learning rate) were chosen based on standard defaults and limited tuning due to computational constraints.

Model explainability and CpG annotation

To interpret tree-based models, we used SHapley Additive exPlanations (SHAP). A TreeExplainer was fitted to the trained XGBoost model, and SHAP values were computed for training samples. Summary plots and beeswarm plots were generated to rank CpG sites by their contribution to predictions and to visualize the distribution of SHAP values across samples.

We identified the top CpG sites by mean absolute SHAP value and extracted their genomic annotations from the 450K GENCODE manifest. For each top CpG, we recorded chromosome, genomic coordinate, associated genes, and transcript types. We also checked whether these CpGs overlapped with a list of CpGs from a representative epigenetic clock panel, noting shared sites and commenting on their known functions where relevant.

Cross-tissue application of the brain model

To test cross-tissue generalization, we applied the brain-trained XGBoost classifier to the blood dataset. Only CpG sites common to both brain and blood matrices were used, and the test feature matrix was ordered to match the feature order expected by the brain model. Predicted labels (child vs not_child) and class probabilities were obtained for each blood sample. We then examined the distribution of predicted age groups across chronological ages in the blood cohort.
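The feature-alignment step in this subsection (restricting to shared CpGs and ordering them as the brain model expects) can be sketched as below. The helper names and the beta values are hypothetical; only the CpG identifiers come from the tables in this article.

```python
def shared_cpgs(model_features, available_cpgs):
    """Keep only CpGs present in the new dataset, preserving the model's order."""
    available = set(available_cpgs)
    return [cpg for cpg in model_features if cpg in available]


def align_sample(sample_betas, model_features):
    """Return one sample's beta values in the order the model was trained on."""
    missing = [cpg for cpg in model_features if cpg not in sample_betas]
    if missing:
        raise KeyError(f"CpGs absent from sample: {missing}")
    return [sample_betas[cpg] for cpg in model_features]


# Order of features as seen at training time (illustrative subset).
model_features = ["cg00000714", "cg00000363", "cg00000807"]

# One blood sample keyed by CpG ID; beta values are made up for illustration.
blood_sample = {
    "cg00000807": 0.41,
    "cg00000029": 0.55,
    "cg00000714": 0.12,
    "cg00000363": 0.78,
}

features = shared_cpgs(model_features, blood_sample)
row = align_sample(blood_sample, features)
print(features, row)
```

Failing loudly on a missing CpG, rather than silently filling a value, avoids feeding a tree model columns that are shifted relative to training.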
In particular, we identified blood samples from middle-aged and older adults that were predicted as "child" with high confidence, interpreting these as candidates with "epigenetically youthful" methylation profiles at the shared CpG sites. Summary tables and plots were used to visualize the relationship between predicted class, confidence score, and chronological age.

Results

Brain age-group classification and class balancing

In our first round of brain analyses, we worked with the imbalanced dataset. Logistic regression, random forests, and XGBoost all achieved reasonable overall performance on the test set but struggled with the minority "child" class. Random forests, for example, classified the not_child group well but recovered very few child samples. XGBoost handled both classes better, with stronger precision and recall, yet the imbalance still biased its predictions toward the majority class.

Once we balanced the dataset by up-sampling child samples, performance improved for all models. Table 1 reports the test-set classification results for each model trained on the balanced brain methylation data (test n = 62; 31 child, 31 not_child). Random forests had the highest accuracy (85%, macro F1 0.85), with XGBoost close behind (84%, macro F1 0.84). Penalized logistic regression still trailed (73% accuracy), although it too improved after balancing.

Table 1 Classification performance on balanced brain dataset (child vs not_child)

| Model | Accuracy | Child F1 | Not_child F1 | Macro F1 |
|---|---|---|---|---|
| Ridge Logistic | 0.73 | 0.75 | 0.69 | 0.72 |
| Random Forest | 0.85 | 0.86 | 0.85 | 0.85 |
| XGBoost | 0.84 | 0.86 | 0.81 | 0.84 |

The balanced XGBoost model achieved F1-scores of 0.86 for the child class and 0.81 for the not_child class. Random forests edged XGBoost by one percentage point on this split, but we found XGBoost more robust across splits, and it is the model used for the SHAP and cross-tissue analyses.
These results indicate that strong age-group discrimination is present in frontal cortex methylation profiles and that appropriate handling of class imbalance is important for capturing developmental signals.

SHAP-derived CpG panel and functional annotation

SHAP analysis of the balanced brain XGBoost model revealed a compact set of CpG sites with large contributions to age-group predictions. Figure 1 shows the top 10 CpG sites ranked by mean absolute SHAP value, with cg00000714 and cg00000363 emerging as the strongest predictors of child vs not_child status.

The top-ranked CpGs included probes located near genes such as RBL2, VDAC3, ATP2A1, PGBD5, NIPA2, TSEN34, CARMIL1, DDX55, and KLHL29. Many of these genes have roles in cell cycle regulation, metabolism, neuronal function, or RNA processing, providing plausible links to developmental and aging processes (Linsenfelder et al. 2025). Some CpGs, especially those linked to VDAC3 and TSEN34, also appear in the classic epigenetic clock reference lists (Horvath 2013; Pipek and Csabai 2022).

Table 2 lists the top ten annotated CpGs, including their genomic positions and neighboring genes. Four of them (cg00000165, cg00000236, cg00000714, and cg00000721) overlap with sites from Horvath's original 353-CpG clock panel. This overlap is consistent with conservation of age-related methylation signals. It suggests that even a compact panel derived from the SHAP-interpreted model can pick up methylation patterns linked to age, despite being trained on just one dataset with a simplified age-group outcome.
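The ranking by mean absolute SHAP value used for Figure 1 and Table 2 can be sketched as follows. In the workflow described in the methods, the SHAP matrix would come from a TreeExplainer fitted to the XGBoost model; here a small hand-written matrix stands in for it, so the values and the function name are purely illustrative.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by the mean of |SHAP| over samples, largest first.

    shap_values: list of rows (one per sample), each with one value per feature.
    """
    if not shap_values:
        raise ValueError("need at least one sample")
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n_samples = len(shap_values)
    scores = [(name, total / n_samples) for name, total in zip(feature_names, totals)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical SHAP values: 3 samples x 3 CpGs.
features = ["cg00000363", "cg00000807", "cg00000714"]
toy_shap = [
    [-0.25, 0.0, 0.5],
    [0.25, 0.25, -0.5],
    [-0.25, 0.0, 0.5],
]
for cpg, score in rank_by_mean_abs_shap(toy_shap, features):
    print(cpg, round(score, 3))
```

Taking absolute values before averaging matters: a CpG whose contributions are large but of opposite sign in children and adults would otherwise average toward zero and drop out of the panel.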
Table 2 Top 10 SHAP CpGs from brain model with gene annotations

| Rank | CpG ID | Chr | Position | Gene(s) |
|---|---|---|---|---|
| 1 | cg00000714 | chr19 | 54,695,677 | TSEN34 |
| 2 | cg00000363 | chr1 | 230,560,792 | PGBD5, RP4-553F17.1 |
| 3 | cg00000807 | chr2 | 23,913,413 | KLHL29 |
| 4 | cg00000165 | chr1 | 91,194,673 | - |
| 5 | cg00000622 | chr15 | 23,034,446 | NIPA2 |
| 6 | cg00000236 | chr8 | 42,263,293 | VDAC3 |
| 7 | cg00000292 | chr16 | 28,890,099 | ATP2A1 |
| 8 | cg00000721 | chr6 | 25,282,778 | CARMIL1 |
| 9 | cg00000769 | chr12 | 124,086,476 | DDX55 |
| 10 | cg00000029 | chr16 | 53,468,111 | RBL2 |

Cross-tissue predictions in blood

When the brain-trained XGBoost model was applied to the blood methylation dataset using the shared CpG subset, it produced coherent age-group predictions. Of the 20 CpG sites used in the brain model, all were present in the blood dataset (shared CpGs: 20). Among 656 blood samples, 642 were predicted as not_child and only 14 as child. Table 3 shows example cross-tissue predictions for GSE40279 blood samples.

Most young adult samples were classified as not_child, consistent with the adult-like methylation pattern learned in brain. Notably, a small subset of middle-aged and older adult blood samples were predicted as "child" with relatively high confidence, indicating that their methylation patterns at the shared CpG sites resembled the child-like brain profiles. Inspection of these individuals showed that they were dispersed across the adult age range rather than concentrated at a single age, and their predicted "youthful" status arose from coordinated methylation patterns at multiple CpGs rather than outliers at a single site.
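The selection of adult blood samples with "youthful" predictions can be sketched as a simple filter over predicted label, confidence, and chronological age. The article does not state the thresholds used, so `min_age` and `min_confidence` below are assumptions, as are the sample IDs, ages, and the `Prediction` record itself.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    sample_id: str
    age: float          # chronological age in years
    label: str          # "child" or "not_child"
    confidence: float   # probability of the predicted label


def flag_youthful(predictions, min_age=40, min_confidence=0.8):
    """Return adults predicted as child-like with confidence above a threshold.

    Both thresholds are illustrative defaults, not values from the article.
    """
    return [
        p for p in predictions
        if p.label == "child" and p.age >= min_age and p.confidence >= min_confidence
    ]


# Hypothetical predictions for four blood samples.
preds = [
    Prediction("S1", 67, "not_child", 0.98),
    Prediction("S2", 72, "child", 0.91),
    Prediction("S3", 45, "child", 0.57),
    Prediction("S4", 23, "child", 0.88),
]
for p in flag_youthful(preds):
    print(p.sample_id, p.age, p.confidence)
```

With these toy inputs only S2 is flagged: S3 falls below the confidence threshold and S4 below the age threshold.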
Table 3 Cross-tissue prediction on GSE40279 blood samples

| Sample_id | Predicted_label | Confidence |
|---|---|---|
| 5815284001_R01C01 | not_child | 0.979352 |
| 5815284001_R02C01 | not_child | 0.956589 |
| 5815284001_R03C01 | not_child | 0.905958 |
| 5815284001_R04C01 | not_child | 0.822913 |
| 5815284001_R05C01 | not_child | 0.972419 |
| 5815284001_R06C01 | not_child | 0.805522 |
| 5815284001_R01C02 | child | 0.574875 |
| 5815284001_R02C02 | not_child | 0.956168 |
| 5815284001_R03C02 | not_child | 0.974501 |
| 5815284001_R04C02 | not_child | 0.979962 |

Although the cross-tissue model uses only a small number of overlapping CpGs, this result suggests that conserved age-related methylation features across brain and blood exemplify genotype–phenotype links at the systems level, offering a framework for comparative mammalian epigenetics and potential biomarkers for precision medicine (Harris et al. 2020; Mendonça et al. 2024).

Figure 2 shows mean ± SEM beta-value differences across the top shared CpGs for blood samples predicted as "youthful/child-like" (orange) versus "typical older adults" (blue). Several CpGs show statistically significant differences (p < 0.05), with youthful samples exhibiting methylation patterns more similar to the brain child reference at key developmental loci.

Blood-specific three-class age model

In the blood-only analysis, class imbalance across young adult, middle-aged, and older adult groups initially led to uneven performance, with better predictions for the larger class. After applying resampling to balance the three age groups, XGBoost achieved high overall accuracy and macro-averaged F1-scores, indicating effective discrimination between adjacent adult age categories using genome-wide methylation profiles.

The balanced XGBoost model achieved 88% accuracy on the test set (n = 369; 123 per class), with a macro F1-score of 0.88 across young adult, middle-aged, and older adult classes. Table 4 summarizes the results of XGBoost.
Table 4 XGBoost blood three-class age model performance (balanced)

| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Young adult | 0.82 | 0.84 | 0.83 |
| Middle age | 0.83 | 0.81 | 0.82 |
| Older adult | 0.99 | 1.00 | 1.00 |
| Macro avg | 0.88 | 0.88 | 0.88 |
| Weighted avg | 0.88 | 0.88 | 0.88 |

Figure 3 shows the confusion matrix for the blood XGBoost three-class model. The heatmap shows perfect recall for older adults and strong performance across all classes.

Table 5 compares all three models on the balanced blood three-class task. XGBoost achieved 88% accuracy with macro F1 = 0.88, ahead of random forests (85% accuracy, macro F1 = 0.85). Penalized logistic regression lagged with 73% accuracy and lower F1-scores across all age groups, consistent with an advantage for tree-based methods on this high-dimensional methylation task (Huang et al. 2020; Duran and Tsurumi 2025).

Table 5 Comparison of blood three-class age models (balanced dataset, test n = 369)

| Model | Accuracy | Young F1 | Middle F1 | Older F1 | Macro F1 |
|---|---|---|---|---|---|
| Ridge Logistic | 0.73 | 0.72 | 0.70 | 0.75 | 0.72 |
| Random Forest | 0.85 | 0.84 | 0.82 | 0.87 | 0.85 |
| XGBoost | 0.88 | 0.83 | 0.82 | 1.00 | 0.88 |

The per-class F1-scores (Table 5) show that XGBoost's advantage is concentrated in the older-adult class (F1 1.00 vs 0.87 for random forests); the young-adult and middle-aged F1-scores of the two tree-based models are nearly identical (0.83 vs 0.84 and 0.82 vs 0.82). Logistic regression is 11 to 25 points lower than XGBoost in every class.

Table 6 directly compares all four models on the balanced blood three-class task. The TabTransformer achieved only 52% accuracy with macro F1 = 0.51, substantially underperforming tree-based methods due to poor young adult recall (0.28). XGBoost's 88% accuracy and performance across all classes make it the strongest single model for this high-dimensional methylation data, while it also offers training efficiency and SHAP compatibility.
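The macro and weighted averages reported in Tables 4 and 5 can be reproduced from the per-class values with a short calculation. The sketch below uses the XGBoost and Ridge per-class F1-scores from Table 5 and the class supports stated for the balanced test set (123 per class); the function names are illustrative.

```python
def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes weighted by the number of test samples per class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


supports = [123, 123, 123]          # young adult, middle-aged, older adult
xgb_f1 = [0.83, 0.82, 1.00]         # XGBoost row of Table 5
ridge_f1 = [0.72, 0.70, 0.75]       # Ridge Logistic row of Table 5

print(round(macro_average(xgb_f1), 2))
print(round(weighted_average(xgb_f1, supports), 2))
print(round(macro_average(ridge_f1), 2))
```

Because the balanced test set has equal support in every class, the weighted average reduces to the macro average, which is why the "Macro avg" and "Weighted avg" rows of Table 4 coincide.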
Table 6 Complete model comparison (blood three-class age model, test n = 369)

| Model | Accuracy | Young F1 | Middle F1 | Older F1 | Macro F1 |
|---|---|---|---|---|---|
| Ridge Logistic | 0.73 | 0.72 | 0.70 | 0.75 | 0.72 |
| Random Forest | 0.85 | 0.84 | 0.82 | 0.87 | 0.85 |
| XGBoost | 0.88 | 0.83 | 0.82 | 1.00 | 0.88 |
| TabTransformer | 0.52 | 0.34 | 0.60 | 0.59 | 0.51 |

We also evaluated a soft-voting ensemble combining TabTransformer and XGBoost probabilities. This ensemble matched XGBoost's accuracy (88%) while providing more balanced precision across age groups (Table 7), suggesting complementary strengths between deep learning feature extraction and tree-based decision boundaries.

Table 7 Soft-voting ensemble performance (TabTransformer + XGBoost)

| Class | Precision | Recall | F1 Score |
|---|---|---|---|
| Young adult | 0.86 | 0.83 | 0.84 |
| Middle age | 0.84 | 0.80 | 0.82 |
| Older adult | 0.93 | 1.00 | 0.96 |
| Macro avg | 0.88 | 0.88 | 0.88 |
| Weighted avg | 0.88 | 0.88 | 0.88 |

Figure 4 shows the confusion matrix for the soft-voting ensemble (XGBoost + TabTransformer). The model achieved 88% accuracy with perfect older adult recall (1.00) and balanced performance across all adult age groups.

SHAP analysis of the blood XGBoost model identified a distinct CpG panel dominated by cg00000714 (TSEN34) and cg00000807 (KLHL29), with only partial overlap (4/10 sites) with the brain-derived panel (Kaulagi and Chavan 2026). Several top blood CpGs mapped to genes with immune/hematopoietic functions, including DDX55 (RNA helicase) and CARMIL1 (actin remodeling), consistent with the blood tissue context and suggesting tissue-specific aging mechanisms despite shared developmental CpG signals.

Table 8 Top 5 blood vs brain SHAP CpGs (comparative)

| Rank | Blood CpG | Gene | Brain CpG | Gene |
|---|---|---|---|---|
| 1 | cg00000714 | TSEN34 | cg00000714 | TSEN34 |
| 2 | cg00000807 | KLHL29 | cg00000363 | PGBD5 |
| 3 | cg00000769 | DDX55 | cg00000165 | - |
| 4 | cg00000236 | VDAC3 | cg00000622 | NIPA2 |
| 5 | cg00000721 | CARMIL1 | cg00000292 | ATP2A1 |

Figure 5 shows the top age-linked GO terms enriched in SHAP-identified CpG-associated genes.
Gene Ontology analysis of genes mapped to top SHAP CpGs revealed enrichment for aging-related processes including actin filament network formation (−log10 p = 1.7), pancreatic hyperplasia (1.5), placental mesenchymal dysplasia (1.4), and viral carcinogenesis (1.3); these scores correspond to p-values of roughly 0.02 to 0.05. These pathways link the epigenetic age classifier to established developmental and oncogenic aging mechanisms.

Figure 6 shows GO enrichment for enhancer-linked, age-associated CpGs identified by SHAP. It is a bar plot of Gene Ontology (GO) biological process terms enriched among genes mapped to enhancer-linked CpGs with high SHAP importance in the brain and blood age-classification models. The x-axis shows the enrichment score, −log10(p-value), and the y-axis lists representative age-relevant processes, with developmental and stimulus-response terms among the most significant.

Figure 7 compares GO terms for all genes versus enhancer-linked genes. The enhancer-linked gene set is enriched for classic developmental and stress-responsive pathways: terms such as "developmental process", "multicellular organism development", and "cell development" dominate the signal. Enhancer methylation may therefore be tagging genes that maintain plasticity or that belong to fetal programs reactivated with age. The "cellular response to stimulus" cluster suggests links to immune surveillance, oxidative stress, or environmental sensing, all of which intensify or become deregulated with aging. That these terms surface more strongly among enhancer-linked CpG genes suggests a distinct regulatory subnetwork that is not visible when all CpGs are considered together.

To visualize how specific enhancer-linked CpGs connect to their target genes and downstream biological processes, we constructed a Sankey diagram summarizing these relationships (Fig. 8).
This network view highlights that a subset of age-associated CpGs converge on a small group of genes that in turn feed into shared GO categories. The diagram links age-associated enhancer-linked CpG sites (left) to their mapped genes (middle) and representative enriched Gene Ontology (GO) terms (right). Each flow width is proportional to the number of CpGs contributing to a given gene or GO category. Several CpGs converge on genes such as ELOVL1, CDK10, VMP1, and ROCK2, which in turn map to broad functional annotations including 'protein binding' and 'cytoplasm', illustrating how a compact enhancer-linked CpG set aggregates into shared molecular functions and cellular components. This representation supports the idea of a focused regulatory subnetwork underlying the age-linked methylation signal. The presence of C3orf35, ETV6, and ROCK2 suggests a blend of developmental signaling and possibly chromatin-relevant factors, consistent with age-related regulatory shifts.

In addition, we created a second Sankey diagram (Fig. 9) to connect CpG and gene information to the Gene Ontology (GO) terms found within the neuron ATAC-accessible set. The diagram shows how the functional information for the neuron-accessible CpG set converges at the pathway level. The brain-specific CpG set was created by intersecting CpGs with brain-specific ATAC peaks and using the geneNames annotation for the target genes. The three-tiered Sankey diagram shows the connection between (i) CpG loci, (ii) the target genes, and (iii) the biological themes obtained with the g:Profiler enrichment tool.
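The flow widths of such a three-tier Sankey diagram can be derived by counting links, as in the minimal sketch below. The CpG identifiers and the CpG-to-gene and gene-to-term assignments are hypothetical placeholders chosen for illustration (only the gene and term names appear in the text); the function name is likewise an assumption.

```python
from collections import Counter


def sankey_flows(cpg_gene_pairs, gene_terms):
    """Count CpG->gene and gene->term links; counts become flow widths.

    cpg_gene_pairs: iterable of (cpg_id, gene) tuples.
    gene_terms: dict mapping gene -> list of GO terms.
    """
    cpg_to_gene = Counter()
    gene_to_term = Counter()
    for cpg, gene in cpg_gene_pairs:
        cpg_to_gene[(cpg, gene)] += 1
        # Each CpG mapped to a gene adds one unit to every term of that gene.
        for term in gene_terms.get(gene, []):
            gene_to_term[(gene, term)] += 1
    return cpg_to_gene, gene_to_term


# Hypothetical mappings for illustration only.
pairs = [("cpg_a", "ROCK2"), ("cpg_b", "ROCK2"), ("cpg_c", "ELOVL1")]
terms = {"ROCK2": ["protein binding", "cytoplasm"], "ELOVL1": ["protein binding"]}

cpg_to_gene, gene_to_term = sankey_flows(pairs, terms)
for (gene, term), width in sorted(gene_to_term.items()):
    print(f"{gene} -> {term}: width {width}")
```

The two counters map directly onto the source, target, and value lists that common Sankey plotting tools expect.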
For example, the CpG regions chr11:2720462–2720464 and chr17:48929686–48929688 map to the target genes KCNQ1, TSEN34, and CARMIL1, which play roles in cilia-associated signaling, RNA processing, and cytoskeletal remodeling, respectively. This points to possible epigenetic regulation of neuronal structure and function in the aging brain. The width of each connecting edge represents the number of CpG regions supporting that connection.

To validate the functional relevance of these CpG-associated genes, we examined their expression patterns across GTEx tissues. Figure 10 shows median TPM expression for NIPA2 and FAM81A (among our top CpG-mapped genes) across multiple brain regions and whole blood. Both genes show substantially higher expression in brain tissues than in blood (TPM ~ 20–30 in cortex vs < 5 in blood), consistent with the brain-specific developmental signals captured by the cross-tissue model, while blood shows complementary hematopoietic expression patterns.

Discussion

We analyzed public DNA methylation datasets and found that simple, interpretable machine learning models can pick up age-related methylation patterns in both human brain and blood. When we trained a model on brain data, the set of CpG sites it relied on still carried useful information when applied to peripheral blood samples.

The XGBoost classifier trained on brain data, explained with SHAP, highlighted a handful of CpGs that stood out for distinguishing children from adults and older individuals. These CpGs connect to genes known for their roles in development and aging, and many overlap with CpGs from established epigenetic clocks, which adds weight to their biological significance. We also saw that enhancer-linked CpGs, which have been tied to gene regulation during development and age-related changes, line up with recent findings about enhancer plasticity in aging tissues (Bell et al. 2019; Linsenfelder et al. 2025).
When we applied the brain model to blood using the subset of shared CpGs, we observed that most adult blood samples were classified as not_child, but a fraction of middle‑aged and older adults were assigned child‑like labels with high confidence. While this cross‑tissue classification is based on a limited overlapping CpG panel and does not constitute a full epigenetic age estimate, it suggests that shared methylation features can highlight individuals whose blood profiles retain brain‑like developmental signatures. This observation is consistent with the idea that certain age‑associated CpG changes are coordinated across tissues, although the strength and direction of these changes may vary by locus. Our blood‑specific three‑class model further demonstrates that tree‑based methods, particularly XGBoost, are well suited for high‑dimensional epigenetic age modeling when combined with appropriate class balancing and post hoc interpretability tools. The TabTransformer architecture, although attractive conceptually for tabular data, did not consistently outperform gradient‑boosted trees in this setting and required more careful tuning. For many practical applications involving 450K or EPIC arrays, tree‑based models with SHAP explanations may provide a robust and transparent baseline. This study has several limitations. First, we relied on two publicly available datasets and did not include independent validation cohorts, which may limit generalizability. Second, the number of shared CpGs between brain and blood was relatively small, constraining cross‑tissue analyses. Third, we used coarse age categories rather than continuous age predictions, which may obscure finer‑grained epigenetic age acceleration effects. Future work could extend this framework by integrating additional tissues and cohorts, using continuous age regression models, and performing systematic comparisons with published epigenetic clocks on overlapping CpG panels (Levine et al. 2018 ; Rayevskiy et al. 
2023).

Despite these limitations, our results highlight a practical, reproducible pipeline for cross‑tissue epigenetic age modeling using public data and modern interpretable machine learning (Hu et al. 2024). By focusing on compact CpG panels with clear functional annotation and cross‑tissue behavior, this approach may complement existing clocks and support the development of targeted assays for aging research and personalized risk assessment.

Our findings illustrate how compact CpG panels, derived through interpretable machine learning, can serve as methodological innovations for mammalian genomics. By bridging tissues, these models highlight biological networks that connect developmental and aging processes to disease pathways. Such cross-tissue epigenetic classifiers may ultimately support precision medicine by identifying individuals with youthful or accelerated methylation phenotypes, informing risk stratification and therapeutic interventions.

Declarations
Funding: Self-funded research; no external funding was received.
Conflicts of Interest: The authors declare no conflicts of interest.
Ethics Approval: Not applicable.
Data Availability: Public dataset GSE40279 (NCBI GEO).
Clinical Trial Registration: This study does not involve a clinical trial, so trial registration details are not applicable.
Consent to Publish declaration: Not applicable.
Consent to Participate declaration: Not applicable.

Acknowledgements
We thank the investigators who generated and deposited the GSE41826 and GSE40279 DNA methylation datasets in public repositories, and the maintainers of the GENCODE‑based 450K manifest used for CpG annotation. We also acknowledge the developers of the open‑source software libraries used in this work. This manuscript reflects the author's original research, writing, and design efforts.
AI tools were used sparingly to assist with polishing language and formatting visuals, but all scientific ideas, analyses, and interpretations were developed and validated by the author.

Competing interests
The authors declare no competing interests.

Author contributions
SRK conceived and designed the study; performed data preprocessing, survival modeling, and feature attribution analyses; developed the modular AI framework; prepared figures, tables, and visualizations; drafted and revised the manuscript. HC provided supervision and guidance on study design and methodology, and reviewed and refined the manuscript for scientific accuracy and clarity.

Data Availability
All data used in this study are publicly available from NCBI GEO under accession numbers GSE41826 and GSE40279. Processed feature matrices and code used for analysis are available upon reasonable request.

References
Bell CG et al (2019) DNA methylation aging clocks: challenges and recommendations. Genome Biol 20:249
Duran I, Tsurumi A (2025) Evaluating transcriptional alterations associated with ageing and developing age prediction models based on the human blood transcriptome. Biogerontology 26:86
Harris CJ et al (2020) Age-associated DNA methylation patterns are shared between the hippocampus and peripheral blood cells. Front Genet 11:111
Horvath S (2013) DNA methylation age of human tissues and cell types. Genome Biol 14:R115
Horvath S, Raj K (2018) DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat Rev Genet 19:371–384
Huang X et al (2020) TabTransformer: tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678
Hu C et al (2024) BS-clock: advancing epigenetic age prediction with high-resolution DNA methylation bisulfite sequencing data. Bioinformatics 40:btae656
Jain N et al (2024) DNA methylation correlates of chronological age in diverse human tissue types. Epigenetics Chromatin 17:25
Kaulagi SR, Chavan H (2026) CpG traceability and pathway mapping in epigenetic aging with explainable AI. Sciety Labs (in press)
Levine ME et al (2018) An epigenetic biomarker of aging for lifespan and healthspan. Aging 10:573–591
Linsenfelder S et al (2025) Epigenetic editing at individual age-associated CpGs affects the genome-wide epigenetic aging landscape. Nat Aging 5:997–1009
Marioni RE et al (2015) DNA methylation age of blood predicts all-cause mortality. Genome Biol 16:25
Mendonça V et al (2024) Exploring cross-tissue DNA methylation patterns: blood–brain CpGs as potential neurodegenerative disease biomarkers. Commun Biol 7:6591
Pipek OA, Csabai I (2022) A revised multi-tissue, multi-platform epigenetic clock model for methylation array data. J Math Chem 61:376–388
Rayevskiy S et al (2023) EpigeneticAgePipeline: an R package for comprehensive assessment of epigenetic age metrics from methylation microarrays. bioRxiv 10.1101/2023
\u003cp\u003echr15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e23,034,446\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNIPA2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000236\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003echr8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e42,263,293\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eVDAC3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000292\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003echr16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e28,890,099\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eATP2A1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000721\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003echr6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25,282,778\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCARMIL1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003ecg00000769\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003echr12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e124,086,476\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDDX55\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000029\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003echr16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e53,468,111\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRBL2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eCross‑tissue predictions in blood\u003c/h2\u003e \u003cp\u003eWhen the brain‑trained XGBoost model was applied to the blood methylation dataset using the shared CpG subset, it produced coherent age‑group predictions. Of the 20 CpG sites used in the brain model, all were present in the blood dataset (shared CpGs: 20). Among 656 blood samples, 642 were predicted as not_child and only 14 as child.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows cross‑tissue predictions for GSE40279 blood samples. Most young adult samples were classified as not_child, consistent with the adult‑like methylation pattern learned in brain. 
Notably, a small subset of middle‑aged and older adult blood samples were predicted as \u0026ldquo;child\u0026rdquo; with relatively high confidence, indicating that their methylation patterns at the shared CpG sites resembled the child‑like brain profiles. Inspection of these individuals showed that they were dispersed across the adult age range rather than concentrated at a single age, and their predicted \u0026ldquo;youthful\u0026rdquo; status arose from coordinated methylation patterns at multiple CpGs rather than outliers at a single site.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCross-tissue prediction on GSE40279 blood samples\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSample_id\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePredicted_label\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eConfidence\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R01C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.979352\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R02C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.956589\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R03C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.905958\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R04C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.822913\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R05C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.972419\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R06C01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.805522\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R01C02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echild\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.574875\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R02C02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.956168\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R03C02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.974501\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5815284001_R04C02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003enot_child\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.979962\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAlthough the cross‑tissue model uses only a small number of overlapping CpGs, this result suggests that conserved age-related methylation features across brain and blood exemplify genotype\u0026ndash;phenotype links at the systems level, offering a framework for comparative mammalian epigenetics and potential biomarkers for precision medicine (Harris et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Mendon\u0026ccedil;a et al. 
\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e below shows mean\u0026thinsp;\u0026plusmn;\u0026thinsp;SEM beta-value differences across the top shared CpGs for blood samples predicted as \"youthful/child-like\" (orange) versus \"typical older adults\" (blue). Several CpGs show statistically significant differences (p\u0026thinsp;\u0026lt;\u0026thinsp;0.05), with youthful samples exhibiting methylation patterns more similar to the brain child reference at key developmental loci.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eBlood‑specific three‑class age model\u003c/h2\u003e \u003cp\u003eIn the blood‑only analysis, class imbalance across young adult, middle‑aged, and older adult groups initially led to uneven performance, with better predictions for the larger class. After applying resampling to balance the three age groups, XGBoost achieved high overall accuracy and macro‑averaged F1‑scores, indicating effective discrimination between adjacent adult age categories using genome‑wide methylation profiles. The balanced XGBoost model achieved 88% accuracy on the test set (n\u0026thinsp;=\u0026thinsp;369; 123 per class), with a macro F1‑score of 0.88 across young adult, middle‑aged, and older adult classes. 
Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e summarizes the results of XGBoost.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eXGBoost Blood three‑class age model performance (balanced)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1 Score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYoung adult\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMiddle age\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOlder adult\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the confusion matrix for the blood XGBoost three‑class model. 
The heatmap shows perfect recall for older adults and strong performance across all classes.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e compares all three models on the balanced blood three-class task. While XGBoost achieved 88% accuracy with balanced macro F1\u0026thinsp;=\u0026thinsp;0.88, random forests showed slightly lower precision for middle-aged samples (0.83 vs 0.88 for XGBoost) and more variable recall across classes. Penalized logistic regression lagged with 73% accuracy and lower F1-scores across all age groups, confirming tree-based methods' superiority for this high-dimensional methylation task (Huang et al. 2020; Duran and Tsurumi \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of blood three-class age models (balanced dataset, test n\u0026thinsp;=\u0026thinsp;369)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYoung F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMiddle F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eOlder F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMacro F1\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRidge Logistic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe per-class F1-scores (Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) show that XGBoost's advantage over random forest is concentrated in the older adult class (F1 1.00 vs 0.87), whereas young adult and middle-aged F1-scores are comparable between the two tree-based models (0.83 vs 0.84 and 0.82 vs 0.82). Logistic regression scored at least 0.10 lower than XGBoost in every class.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e directly compares all four models on the balanced blood three-class task. The TabTransformer achieved only 52% accuracy with macro F1\u0026thinsp;=\u0026thinsp;0.51, substantially underperforming tree-based methods due to poor young adult recall (0.28). 
XGBoost's 88% accuracy and balanced performance across all classes demonstrate clear superiority for high-dimensional methylation data, while also providing training efficiency and SHAP compatibility.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComplete model comparison (blood three-class age model, test n\u0026thinsp;=\u0026thinsp;369)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eYoung F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMiddle F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eOlder F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMacro F1\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRidge Logistic\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.75\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.85\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eTabTransformer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.51\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWe also evaluated a soft-voting ensemble combining TabTransformer and XGBoost probabilities. This ensemble matched XGBoost's accuracy while providing more balanced precision across age groups, suggesting complementary strengths between deep learning feature extraction and tree-based decision boundaries.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSoft-voting ensemble performance (TabTransformer\u0026thinsp;+\u0026thinsp;XGBoost)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eClass\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1 Score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYoung adult\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMiddle age\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOlder adult\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMacro avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWeighted avg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the confusion matrix for the soft-voting ensemble (XGBoost\u0026thinsp;+\u0026thinsp;TabTransformer). The ensemble achieved 88% accuracy with near-perfect older adult classification (recall\u0026thinsp;=\u0026thinsp;1.00) and balanced performance across all adult age groups.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSHAP analysis of the blood XGBoost model identified a distinct CpG panel dominated by cg00000714 (TSEN34) and cg00000807 (KLHL29), with only partial overlap (4/10 sites) with the brain-derived panel (Kaulagi and Chavan 2026). 
Several blood top CpGs mapped to genes with immune/hematopoietic functions including DDX55 (RNA helicase) and CARMIL1 (actin remodeling), consistent with blood tissue context and suggesting tissue-specific aging mechanisms despite shared developmental CpG signals.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab8\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTop 5 blood vs brain SHAP CpGs (comparative)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRank\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBlood CpG\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGene\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBrain CpG\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eGene\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000714\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTSEN34\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ecg00000714\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTSEN34\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000807\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKLHL29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ecg00000363\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePGBD5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000769\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDDX55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ecg00000165\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000236\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eVDAC3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ecg00000622\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNIPA2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecg00000721\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eCARMIL1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ecg00000292\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eATP2A1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e shows the top age-linked GO terms enriched among genes associated with SHAP-identified CpGs. Gene Ontology analysis of genes mapped to top SHAP CpGs revealed significant enrichment (log10 p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) for aging-related processes including actin filament network formation (log10 p\u0026thinsp;=\u0026thinsp;1.7), pancreatic hyperplasia (1.5), placental mesenchymal dysplasia (1.4), and viral carcinogenesis (1.3). These terms link the epigenetic age classifier to established developmental and oncogenic aging mechanisms.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e shows GO enrichment for enhancer‑linked, age‑associated CpGs identified by SHAP. It is a bar plot of Gene Ontology (GO) biological process terms enriched among genes mapped to enhancer‑linked CpGs with high SHAP importance in the brain and blood age‑classification models. The x‑axis shows the enrichment score, \u0026minus;log10(p-value), and the y‑axis lists representative age‑relevant processes, with developmental and stimulus‑response terms among the most significant.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e compares GO terms enriched among all CpG-mapped genes with those enriched among enhancer-linked genes. The enhancer-linked gene set is enriched for classic developmental and stress-responsive pathways. 
Terms such as \u0026ldquo;developmental process\u0026rdquo;, \u0026ldquo;multicellular organism development\u0026rdquo;, and \u0026ldquo;cell development\u0026rdquo; dominate the signal. One interpretation is that enhancer methylation marks genes that maintain plasticity or that belong to fetal programs reactivated with age.\u003c/p\u003e \u003cp\u003eThe \u0026ldquo;cellular response to stimulus\u0026rdquo; cluster suggests links to immune surveillance, oxidative stress, or environmental sensing, all of which intensify or become dysregulated with aging.\u003c/p\u003e \u003cp\u003eThat these terms are more strongly enriched among enhancer-linked CpG genes suggests a distinct regulatory subnetwork that is not apparent when all CpGs are analyzed together.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo visualize how specific enhancer‑linked CpGs connect to their target genes and downstream biological processes, we constructed a Sankey diagram summarizing these relationships (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e). This network view highlights that a subset of age‑associated CpGs converges on a small group of genes that in turn feed into shared GO categories.\u003c/p\u003e \u003cp\u003eThis diagram links age‑associated enhancer‑linked CpG sites (left) to their mapped genes (middle) and representative enriched Gene Ontology (GO) terms (right). Each flow width is proportional to the number of CpGs contributing to a given gene or GO category. 
The diagram shows that several CpGs converge on genes such as ELOVL1, CDK10, VMP1, and ROCK2, which in turn map to broad functional annotations including \u0026lsquo;protein binding\u0026rsquo; and \u0026lsquo;cytoplasm\u0026rsquo;, illustrating how a compact enhancer‑linked CpG set aggregates into shared molecular functions and cellular components.\u003c/p\u003e \u003cp\u003eThis network representation underscores that a limited set of enhancer‑linked CpGs can channel into common effector genes and GO categories, supporting the idea of a focused regulatory subnetwork underlying the age‑linked methylation signal.\u003c/p\u003e \u003cp\u003eThe diagram thus captures how regulatory methylation links to functional genes and biological processes. The presence of C3orf35, ETV6, and ROCK2 suggests a blend of developmental signaling and possibly chromatin-related factors, consistent with age-related regulatory shifts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn addition, we created a Sankey diagram (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e) connecting CpGs and genes to the Gene Ontology (GO) terms enriched within the neuron ATAC-accessible set. The diagram shows that functional annotations for the neuron-accessible CpG set converge at the pathway level. The brain-specific CpG set was defined by intersection with brain-specific ATAC peaks, and target genes were assigned using the geneNames annotation. The three-tiered Sankey diagram connects (i) CpG loci, (ii) their target genes, and (iii) biological themes identified with the g:Profiler enrichment tool. 
For example, CpG regions including chr11:2720462\u0026ndash;2720464 and chr17:48929686\u0026ndash;48929688 map to target genes such as KCNQ1, TSEN34, and CARMIL1, which play roles in cilia-associated signaling, RNA processing, and cytoskeletal remodeling, respectively. This suggests possible epigenetic regulation of neuronal structure and function in the aging brain. The width of each connecting edge represents the number of CpG regions supporting that connection.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo assess the functional relevance of these CpG-associated genes, we examined their expression patterns across GTEx tissues. Figure\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003e shows median TPM expression for NIPA2 and FAM81A (among our top CpG-mapped genes) across multiple brain regions and whole blood. Both genes show substantially higher expression in brain tissues than in blood (TPM\u0026thinsp;~\u0026thinsp;20\u0026ndash;30 in cortex vs\u0026thinsp;\u0026lt;\u0026thinsp;5 in blood), consistent with the brain-specific developmental signals captured by the cross-tissue model, whereas blood shows complementary hematopoietic expression patterns.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe analyzed public DNA methylation datasets and found that simple, interpretable machine learning models can capture age-related methylation patterns in both the human brain and blood. Notably, the CpG sites used by the brain-trained model remained informative when the model was applied to peripheral blood samples. The XGBoost classifier trained on brain data, interpreted with SHAP, highlighted a small set of CpGs that were most influential in distinguishing children from adults and older individuals. 
These CpGs connect to genes known for their roles in development and aging, and many overlap with CpGs from established epigenetic clocks. That overlap adds weight to their biological significance. We also saw that enhancer-linked CpGs, which researchers have tied to gene regulation during development and age-related changes, line up with recent findings about enhancer plasticity in aging tissues (Bell et al. 2019; Linsenfelder et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWhen we applied the brain model to blood using the subset of shared CpGs, we observed that most adult blood samples were classified as not_child, but a fraction of middle‑aged and older adults were assigned child‑like labels with high confidence. While this cross‑tissue classification is based on a limited overlapping CpG panel and does not constitute a full epigenetic age estimate, it suggests that shared methylation features can highlight individuals whose blood profiles retain brain‑like developmental signatures. This observation is consistent with the idea that certain age‑associated CpG changes are coordinated across tissues, although the strength and direction of these changes may vary by locus.\u003c/p\u003e \u003cp\u003eOur blood‑specific three‑class model further demonstrates that tree‑based methods, particularly XGBoost, are well suited for high‑dimensional epigenetic age modeling when combined with appropriate class balancing and post hoc interpretability tools. The TabTransformer architecture, although attractive conceptually for tabular data, did not consistently outperform gradient‑boosted trees in this setting and required more careful tuning. For many practical applications involving 450K or EPIC arrays, tree‑based models with SHAP explanations may provide a robust and transparent baseline.\u003c/p\u003e \u003cp\u003eThis study has several limitations. 
First, we relied on two publicly available datasets and did not include independent validation cohorts, which may limit generalizability. Second, the number of shared CpGs between brain and blood was relatively small, constraining cross‑tissue analyses. Third, we used coarse age categories rather than continuous age predictions, which may obscure finer‑grained epigenetic age acceleration effects. Future work could extend this framework by integrating additional tissues and cohorts, using continuous age regression models, and performing systematic comparisons with published epigenetic clocks on overlapping CpG panels (Levine et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Rayevskiy et al. 2023).\u003c/p\u003e \u003cp\u003eDespite these limitations, our results highlight a practical, reproducible pipeline for cross‑tissue epigenetic age modeling using public data and modern interpretable machine learning (Hu et al. 2024). By focusing on compact CpG panels with clear functional annotation and cross‑tissue behavior, this approach may complement existing clocks and support the development of targeted assays for aging research and personalized risk assessment.\u003c/p\u003e \u003cp\u003eOur findings illustrate how compact CpG panels, derived through interpretable machine learning, can serve as methodological innovations for mammalian genomics. 
By bridging tissues, these models highlight biological networks that connect developmental and aging processes to disease pathways.\u003c/p\u003e \u003cp\u003eSuch cross-tissue epigenetic classifiers may ultimately support precision medicine by identifying individuals with youthful or accelerated methylation phenotypes, informing risk stratification and therapeutic interventions.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e Self-funded research. No external funding was received.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eConflicts of Interest:\u003c/strong\u003e The authors declare no conflicts of interest.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eEthics Approval:\u003c/strong\u003e Not applicable.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eData Availability:\u003c/strong\u003e Public datasets GSE41826 and GSE40279 (NCBI GEO).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eClinical Trial Registration:\u003c/strong\u003e This study does not involve a clinical trial, so trial registration is not applicable.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eConsent to Publish declaration:\u003c/strong\u003e Not applicable.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eConsent to Participate declaration:\u003c/strong\u003e Not applicable.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch2\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eWe thank the investigators who generated and deposited the GSE41826 and GSE40279 DNA methylation datasets in public repositories, and the maintainers of the GENCODE‑based 450K manifest used for CpG annotation. We also acknowledge the developers of the open‑source software libraries used in this work.\u003c/p\u003e\n\u003cp\u003eThis manuscript reflects the author\u0026apos;s original research, writing, and design efforts. 
AI tools were used sparingly to assist with polishing language and formatting visuals, but all scientific ideas, analyses, and interpretations were developed and validated by the author.\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eSRK conceived and designed the study; performed data preprocessing, survival modeling, and feature attribution analyses; developed the modular AI framework; prepared figures, tables, and visualizations; drafted and revised the manuscript.\u003c/p\u003e\n\u003cp\u003eHC provided supervision and guidance on study design and methodology, reviewed and refined the manuscript for scientific accuracy and clarity.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll data used in this study are publicly available from NCBI GEO under accession numbers GSE41826 and GSE40279. Processed feature matrices and code used for analysis are available upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBell CG et al 2019 DNA methylation aging clocks: challenges and recommendations. Genome Biol 20, 249\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuran I, Tsurumi A (2025) Evaluating transcriptional alterations associated with ageing and developing age prediction models based on the human blood transcriptome. Biogerontology 26:86\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarris CJ et al (2020) Age\u0026ndash;associated DNA methylation patterns are shared between the hippocampus and peripheral blood cells. Front Genet 11:111\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHorvath S 2013 DNA methylation age of human tissues and cell types. 
Genome Biol 14, R115\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHorvath S, Raj K 2018 DNA methylation\u0026ndash;based biomarkers and the epigenetic clock theory of ageing. Nat Rev Genet 19, 371\u0026ndash;384\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang X et al 2020 TabTransformer: tabular data modeling using contextual embeddings. arXiv preprint arXiv:2012.06678.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu C et al 2024 BS\u0026ndash;clock: advancing epigenetic age prediction with high\u0026ndash;resolution DNA methylation bisulfite sequencing data. Bioinformatics 40, btae656\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJain N et al 2024 DNA methylation correlates of chronological age in diverse human tissue types. Epigenetics Chromatin 17, 25\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaulagi SR, Chavan H 2026 CpG traceability and pathway mapping in epigenetic aging with explainable AI. \u003cem\u003eSciety Labs\u003c/em\u003e (in press)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLevine ME et al (2018) An epigenetic biomarker of aging for lifespan and healthspan. Aging 10:573\u0026ndash;591\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLinsenfelder S et al (2025) Epigenetic editing at individual age\u0026ndash;associated CpGs affects the genome\u0026ndash;wide epigenetic aging landscape. Nat Aging 5:997\u0026ndash;1009\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarioni RE et al 2015 DNA methylation age of blood predicts all\u0026ndash;cause mortality. Genome Biol 16, 25\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMendon\u0026ccedil;a V et al (2024) Exploring cross\u0026ndash;tissue DNA methylation patterns: blood\u0026ndash;brain CpGs as potential neurodegenerative disease biomarkers. 
Commun Biol 7:6591\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePipek OA, Csabai I (2022) A revised multi\u0026ndash;tissue, multi\u0026ndash;platform epigenetic clock model for methylation array data. J Math Chem 61:376\u0026ndash;388\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRayevskiy S et al 2023 EpigeneticAgePipeline: an R package for comprehensive assessment of epigenetic age metrics from methylation microarrays. bioRxiv \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/2023\u003c/span\u003e\u003cspan address=\"10.1101/2023\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"DNA methylation, Epigenetic age prediction, Epigenetic clock models, Cross-tissue epigenetics, Machine learning in genomics, Brain methylation signatures, Blood methylation biomarkers, Comparative mammalian epigenetics","lastPublishedDoi":"10.21203/rs.3.rs-8928610/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8928610/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEpigenetic age estimators based on DNA methylation provide powerful biomarkers of aging, but most clocks are tissue‑specific and rely on large CpG panels. Here we develop compact, interpretable machine learning models that capture age‑related DNA methylation patterns in human brain and blood, and we evaluate their cross‑tissue behavior using public Illumina 450K datasets. Using frontal cortex methylation profiles from GSE41826, we constructed an age‑group classifier (child vs adult/older) based on XGBoost and compared its performance with penalized logistic regression and random forests. After addressing class imbalance by up‑sampling, the brain XGBoost model achieved high accuracy and balanced precision\u0026ndash;recall. SHAP (SHapley Additive exPlanations) analysis identified a small panel of CpG sites with strong influence on age classification, several of which map to genes previously implicated in development and aging, and overlap with CpGs from established epigenetic clocks. 
We then applied the brain‑trained model to a large peripheral blood dataset (GSE40279) to test cross‑tissue generalization, using only the CpGs shared between tissues. Despite limited CpG overlap, the model reliably distinguished child‑like from adult‑like methylation patterns in blood and highlighted a subset of older donors with \u0026ldquo;youthful\u0026rdquo; methylation signatures. Finally, we built a blood‑specific three‑class age classifier (young adult, middle‑aged, older adult) and compared tree‑based models with a TabTransformer architecture, finding that gradient‑boosted trees combined with SHAP provided a favorable balance of accuracy and interpretability.\u003c/p\u003e \u003cp\u003eThese results demonstrate that compact, biologically interpretable CpG panels can illuminate conserved genotype\u0026ndash;phenotype relationships in mammalian aging, revealing cross-tissue methylation signatures with potential relevance for disease pathways and precision health applications.\u003c/p\u003e","manuscriptTitle":"Cross-Tissue Epigenetic Age Prediction with Compact CpG Panels","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-25 17:06:21","doi":"10.21203/rs.3.rs-8928610/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0c8546b8-a362-4563-ab40-48d177ae3744","owner":[],"postedDate":"February 25th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-03-07T14:24:51+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-25 17:06:21","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8928610","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8928610","identity":"rs-8928610","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}