Abstract
21
Whether distinct visible aging traits, e.g., wrinkling, pigmentation, and inflammation, reflect 22
shared or independent epigenetic programs remains unknown; existing clocks compress aging 23
into a single chronological axis, leaving the phenotype -specific architecture of cutaneous aging 24
uncharacterized. Here, we integrate AI -derived facial phenotypes with skin DNA methylation 25
profiles from 706 individuals to develop EpiVision, a panel of 21 epigenetic predictors spanning 26
structural, pigmentary, inflammatory, and textural aging traits. Predictors reveal shared and trait-27
specific pathways, including developmental patterning, epithelial remodeling, hormonal signaling, 28
and UV damage responses, and capture environmentally induced acceleration in sun -exposed 29
skin alongside lifestyle and topical treatment–associated variation. These findings establish that 30
visible skin aging comprises molecularly distinct axes with shared regulatory substrates and trait-31
specific drivers, providing a scalable epigenetic framework for intervention evaluation and aging 32
biology research. 33
34
35
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Introduction
36
37
Human skin aging is a highly visible and clinically relevant phenotype shaped by intrinsic biology 38
and extrinsic exposures. Yet current approaches to measuring skin aging remain fragmented. 39
Imaging-based assessments capture visible structural features, whereas biophysical and 40
molecular assays often require invasive sampling and typically focus on a limited set of endpoints 41
such as elasticity, hydration, collagen content, or barrier function. As a result, existing measures 42
tend to quantify isolated aspects of skin aging rather than providing a comprehensive, 43
multidimensional representation of how skin ages across structural, pigmentary, inflammatory, 44
and textural domains. Moreover, few studies integrate visible phenotypes with molecular readouts 45
or link these measures to longitudinal outcomes. 46
47
Recent computer -vision systems trained on large dermatology -labeled image corpora enable 48
standardized, high-throughput extraction of continuous facial skin phenotypes from photographs, 49
offering an opportunity to study skin aging as a multidimensional, quantitative trait at scale 1. 50
However, these have not been linked to molecular biomarkers, leaving a necessary step for 51
translational research for mechanisms of skin aging. 52
DNA methylation (DNAm) has emerged as one of the most reproducible molecular readouts of 53
aging across human populations. Early epigenetic clock models demonstrated that DNAm 54
patterns can predict chronological age with high accuracy, establishing DNAm as an integrative 55
biomarker that captures cumulative cellular history across tissues and environmental contexts. 56
However, as DNAm clocks became widely used, it became clear that chronological -age training 57
alone does not guarantee optimal sensitivity to physiology, disease risk, or intervention response. 58
Because first-generation clocks are optimized to minimize error against calendar age, then came 59
the development of second-generation epigenetic clocks trained not on chronological age, but on 60
phenotypes intended to reflect morbidity and mortality risk. DNAm PhenoAge was trained on a 61
composite “phenotypic age” derived from clinical biomarkers and chronological age, and DNAm 62
GrimAge incorporated DNAm surrogates for plasma proteins and smoking exposure to infer 63
biologically meaningful variation that is independent of chronological time, such as heterogeneity 64
in healthspan, tissue -specific decline, or responsiveness to therapeutic intervention. Together, 65
these advances underscored that phenotype -relevant training, rather than chronological age 66
minimization, is necessary to capture biologically meaningful epigenetic variation; a principle that 67
remains largely unapplied in the context of skin aging 2–5. 68
Despite rapid progress in blood-based epigenetic predictors, analogous phenotype-trained DNAm 69
clocks for skin aging remain comparatively underdeveloped. Most widely used skin DNAm clocks 70
remain anchored to chronological time rather than to skin -relevant functional phenotypes. In 71
parallel, the field has begun to develop skin -specific and skin-compatible approaches that lower 72
barriers to sampling and increase sensitivity to cutaneous biology. “MolClock” introduced a skin-73
focused methylation approach aimed at quantifying skin aging dynamics and treatment effects in 74
a skin-relevant context 6. More recently, non-invasive epidermal DNAm profiling from facial tape 75
strips coupled with enzymatic methyl -seq enabled highly accurate chronological age prediction 76
directly from epidermal DNA and demonstrated sensitivity to experimental rejuvenation signals, 77
highlighting the feasibility and translational promise of non -invasive skin epigenomics 7,8. 78
Complementing these approaches, newer work has also explored DNAm clocks tied to skin -79
relevant phenotypes (for example, visual -age progression and wrinkle -grade modeling), 80
underscoring growing interest in phenotype-forward skin aging biomarkers 9. 81
Existing epigenetic clocks vary widely in tissue specificity, training strategy, sample size, and CpG 82
coverage. Early first -generation clocks such as Horvath’s multi -tissue clock (~8,000 samples; 83
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
450K array) 10 and the Horvath Skin and Blood clock (~896 samples) 11 were trained primarily on 84
chronological age. Subsequent multi-tissue models, including AltumAge (~16,000 samples) 12 and 85
RetroClock (~18,000 samples), 13 expanded training cohorts and probe coverage but remained 86
largely time-based. Skin-focused approaches such as MolClock (~508 samples) 14, Mitra et al. 87
(epidermal EM -seq; ~4 million CpGs) 15, and commercial second -generation clocks (e.g., 88
Beiersdorf AG) differ substantially in platform depth (450K to >1M CpGs), training size (~300–800 89
samples), Cohort demographics (only women) and phenotype focus. Collectively, these models 90
highlight rapid technical progress but also underscore the absence of large -scale, phenotype-91
trained, imaging-linked skin methylation frameworks grounded explicitly in multidimensional facial 92
aging traits and capable of supporting evidence -based, personalized approaches to long -term 93
skin health. 94
A key opportunity, and persistent gap, is to connect visible facial aging phenotypes to molecular 95
aging signatures in skin using scalable, non -invasive sampling. “Perceived age” from facial 96
images is not merely cosmetic: human -rated perceived age predicts short -term mortality in 97
longitudinal cohorts, indicating that facial appearance can encode health -relevant information 98
beyond chronological age 16,17. Consistent with this, blood DNAm has been associated with 99
perceived facial aging, supporting the hypothesis that methylation markers can reflect processes 100
linked to facial aging phenotypes. Advances in deep learning further suggest that facial 101
photographs can support automated estimation of age- and outcome-relevant signals, motivating 102
rigorous efforts to anchor image-derived phenotypes to biological substrates. 103
A central unresolved question in aging biology is whether the distinct visible features of facial 104
aging (e.g., wrinkling, pigmentation change, inflammation, loss of uniformity) reflect a single 105
shared biological process or independent molecular programs with trait -specific regulation. We 106
address this question directly by integrating AI -derived facial aging phenotypes from HAUT 107
imaging with matched DNA methylation profiles from minimally invasive curette skin samples 108
across 706 individuals. Using this framework, we developed EpiVision, a panel of 21 epigenetic 109
predictors spanning perceived age, structural features, pigmentation, inflammation, and texture, 110
trained on quantitative skin -relevant outcomes rather than chronological time. Alongside these 111
predictors, we performed independent epigenome-wide association analyses for each phenotype 112
to characterize the biological architecture connecting methylation variation to visible aging traits 113
and to quantify the degree of shared versus distinct epigenetic structure across aging axes. 114
Together, these analyses ask not only whether methylation encodes phenotypic information, but 115
which molecular pathways mediate that encoding, and whether different faces of aging are written 116
in the same or different parts of the epigenome. 117
Results
118
119
Study collection and design 120
Facial skin exhibits diverse aging phenotypes that reflect both intrinsic biology and cumulative 121
environmental exposures. To capture DNA methylation signatures underlying these visible 122
features, we developed EpiVision, a methylation-based framework that estimates facial skin aging 123
phenotypes from dermal samples paired with standardized facial imaging. We profiled 744 dermal 124
curette samples collected from the cheekbone region using the Illumina EPICv2 BeadChip, 125
yielding 911,580 CpG sites (97.4% autosomal, 2.6% sex chromosomes) per sample after quality 126
control. In parallel, we analyzed 1,463 standardized facial image submissions and quantified 19 127
facial aging-related phenotypes using the HAUT artificial intelligence platform. After linkage of 128
methylation and imaging phenotypes, 706 individuals had matched data and were randomly 129
divided into a training set (N=406–564, 80%) and an independent testing set (N=102–142, 20%) 130
with comparable age and Fitzpatrick skin type distributions ( Fig. 1, Supplementary File 1 ). 131
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
EpiVision generated a panel of epigenetic predictors that were evaluated in the held-out test set 132
using mean absolute error and Spearman correlation. To examine lifestyle and medical 133
correlates, we linked survey data from 282 participants (33 exposures) and tested exposure –134
predictor associations using age/sex -adjusted epigenetic age acceleration where applicable. 135
Separately, we performed epigenome -wide association analyses within the matched cohort 136
(N=706) to identify CpG sites associated with specific facial aging features, adjusting for cell-type 137
heterogeneity estimated using EpiSCORE. 138
139
HAUT Imaging provides baseline for skin health and aging 140
To establish the phenotypic baseline for downstream methylation analyses, we characterized 19 141
AI-derived facial aging and skin -quality traits across 1,413 individuals from 1,463 imaging 142
submissions using the HAUT platform. The subset with paired methylation data (n = 706) showed 143
highly similar phenotype distributions to the full imaging cohort, indicating representativeness 144
(Fig. 2A). Among participants with available demographic data, mean chronological age was 40.3 145
± 16.3 years (n = 247), and Fitzpatrick skin types among 481 participants with available data were 146
most commonly Type II (n = 154) and Type III (n = 143) ( Fig. 2B–C). As expected, structural 147
aging features correlated with chronological age (n = 191 –246), with perceived age (r = 0.866) 148
and perceived eye age (r = 0.826) showing the strongest positive associations, while overall 149
uniformness (r = −0.529), lines (r = −0.409), and sagging (r = −0.322) declined with age (Fig. 2D). 150
Associations with Fitzpatrick type were also observed (n = 364 –468), with irritation (ρ = 0.488) 151
and erythema (HAUT redness score; ρ = 0.470) positively correlated with higher phototypes, and 152
sun spots (ρ = −0.221) and pigmentation score (dyschromia and uneven skin tone; ρ = −0.175) 153
inversely correlated (Fig. 2E). 154
155
Skin Cell Deconvolution of methylation data 156
Cell-type fractions were estimated for 744 samples using EpiSCORE (Figure S1). Differentiated 157
keratinocytes represented the largest component (median ≈ 0.32), followed by endothelial cells 158
(median ≈ 0.25) and undifferentiated keratinocytes (median ≈ 0.21). Macrophage and T -cell 159
fractions were present with substantial inter -individual variability, whereas fibroblast and 160
melanocyte fractions were generally low. Overall, the inferred composition, dominated by 161
keratinocytes with a notable endothelial contribution, is consistent with superficial sampling that 162
captures predominantly epidermal cell populations. These cell-type estimates were incorporated 163
as covariates in downstream methylation analyses to account for cellular heterogeneity across 164
samples. 165
166
Development and Performance of Skin Trait Predictors 167
Skin DNA methylation encodes multidimensional facial aging phenotypes with strong predictive 168
fidelity, demonstrating that the epigenome captures meaningful visible aging information beyond 169
chronological time. To characterize this systematically, we developed epigenetic predictors for 21 170
skin-related phenotypes using genome-wide DNA methylation profiles from the Illumina EPICv2 171
platform. Predictors were trained on unadjusted phenotype scores using the full CpG array 172
including sex-linked probes (median 4.8% of selected features per model, range 0.5–33.0%), with 173
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
sex and age adjustment reserved for downstream association analyses (see Methods). Models 174
for inflammatory traits such as irritation showed the greatest enrichment for sex -linked features, 175
consistent with known sex differences in cutaneous immune regulation. Models were trained in 176
an 80/20 split framework and evaluated in held -out test sets using Spearman correlation and 177
mean absolute error (Table 1). Multiple machine learning approaches were assessed, and tree-178
based ensemble methods consistently outperformed linear models, with LightGBM selected for 179
17 of 21 predictors (81%) and Random Forest for the remaining four. Age -related predictors 180
showed the strongest performance, including the sun-exposure–informed age model (ρ = 0.836; 181
MAE = 5.30 years), perceived chronological age (ρ = 0.819), and perceived eye age (ρ = 0.765). 182
Several non-age phenotypes also achieved strong correlations, including irritation (ρ = 0.738), 183
uniformness of the left cheek (ρ = 0.709), and redness (ρ = 0.705), demonstrating that methylation 184
profiles capture multidimensional aspects of visible skin biology beyond chronological time. 185
To evaluate the contribution of expanded probe coverage, we compared EPICv2-specific models 186
with models restricted to CpGs shared between EPICv1 and EPICv2 (Fig. 3A–D). EPICv2 models 187
demonstrated superior overall performance, with an average 10.0% increase in Spearman 188
correlation (0.546 vs. 0.601) and 4.6% reduction in MAE (9.83 vs. 9.38) ( Fig. 3A,B). Applying a 189
≥5% threshold for meaningful change, EPICv2 models showed meaningful MAE improvement in 190
45% of targets with no meaningful degradation in any target, and meaningful Spearman 191
improvement in 40% of targets with meaningful degradation in 20% ( Fig. 3C,D). Improvements 192
were most pronounced for texture-related traits, including lacrimal grooves (ρ = 0.344 to 0.625), 193
overall uniformness, and pores. Although a small subset of predictors (4/20) showed ≥5% 194
decreases in correlation, no models exhibited meaningful increases in MAE. Complete 195
performance metrics for all EPICv2 and V1 –V2 overlap models are provided in Supplementary 196
File 2. 197
Exposure and Clinical Correlates of EpiVision Predictors 198
Building on the performance and biological validity of the EpiVision predictors, we examined 199
whether methylation-derived facial aging measures captured expected associations with self -200
reported lifestyle and clinical factors ( Fig. 4A –B). Across 282 participants with survey data, 201
exposures were evaluated against 21 methylation -based predictors using age/sex -adjusted 202
regression models to isolate exposure -specific effects from demographic variation captured by 203
the unadjusted predictors, with age -related outcomes analyzed as epigenetic age acceleration 204
(EAA). 205
Lifestyle exposures showed phenotype-specific rather than global aging associations, suggesting 206
that behavioral factors influence distinct dimensions of visible skin biology. Greater water intake 207
was associated with reduced dark circles (β = −0.206, p = 0.013), and higher self-reported diet 208
quality was associated with fewer acneiform breakouts (HAUT breakouts score) in the unadjusted 209
model (β = −0.204, p = 0.001), with a consistent direction after age/sex adjustment (β = −0.152, 210
p = 0.065). Sleep quality was significantly associated with fine lines after adjustment (β = 0.194, 211
p = 0.011), with differing effect patterns across line subtypes, indicating that even a single 212
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
behavioral exposure maps onto structurally distinct aspects of facial aging. Tobacco use 213
demonstrated nominal correlations with pigmentation and line traits in smaller subgroups but did 214
not survive covariate adjustment. The specificity of these associations across phenotypically 215
distinct predictors supports the discriminant validity of the EpiVision framework and highlights the 216
value of decomposing skin aging into trait -level components rather than relying solely on 217
composite aging measures. 218
Among clinical and medication variables, hyaluronic acid use was the only exposure significantly 219
associated with epigenetic age acceleration, showing slower perceived -age acceleration across 220
both the standard (β = −0.198, p = 0.015) and sun -exposure-informed (β = −0.161, p = 0.048) 221
perceived-age models (Fig. 4B). At the phenotype level, retinoid use was associated with reduced 222
pore size appearance (β = −0.184, p = 0.022), spironolactone use with reduced deep line 223
appearance (β = −0.174, p = 0.027), consistent with its known anti-androgenic effects on dermal 224
remodeling, and testosterone use with increased sagging skin (β = 0.119, p = 0.028), an 225
unexpected finding that may reflect residual confounding by age or hormonal context and 226
warrants prospective replication . Notably, unadjusted age analyses revealed numerous 227
significant associations with age -related predictors, including testosterone, hormone therapy, 228
diabetes, and cancer history (all p < 0.005), but these attenuated after EAA adjustment, consistent 229
with confounding by chronological age and reinforcing the necessity of acceleration -based 230
modeling to isolate exposure -specific effects. Many nominal associations attenuated after 231
covariate adjustment, underscoring the influence of age and exposure imbalance, and several 232
large apparent effects were driven by single exposed individuals and are therefore considered 233
exploratory. All association results can be found in Supplementary File 3. 234
To assess the impact of chronic sun exposure on biological aging, we analyzed paired sun -235
exposed and sun -protected skin samples from 20 participants. The EpiVision perceived -age 236
predictor showed a mean 1.30-year acceleration in sun-exposed tissue relative to sun-protected 237
tissue (paired t -test; Cohen's d_z = 0.50, p = 0.042; Fig. 4D ), providing molecular evidence 238
consistent with chronic UV exposure advancing the biological age of skin. The eyes-age predictor 239
showed a directionally consistent and nominally larger acceleration (+1.61 years, p = 0.013; Fig. 240
4C); although trained on periorbital aging phenotypes, the methylation features underlying this 241
predictor are derived from skin broadly and likely capture UV -responsive epigenetic programs 242
shared across cutaneous sites rather than periorbital-specific anatomy. 243
Beyond epigenetic acceleration, sun exposure broadly altered visible skin phenotypes 244
(Supplementary File 4). Nine of 18 non-age traits (50%) differed significantly between sites, with 245
the largest effects observed for erythema, sagging skin, pigmentary change, and lines, all 246
demonstrating large effect sizes (|d| > 0.8). Acneiform breakouts were more prevalent in sun -247
exposed skin, consistent with UV -induced disruption of barrier integrity, increased sebaceous 248
activity, and follicular occlusion in photodamaged tissue. The convergence of structural, 249
pigmentary, and inflammatory changes across sites indicates that chronic UV exposure perturbs 250
multiple biological axes simultaneously rather than isolated features. Together, these paired -251
sample analyses demonstrate that chronic sun exposure accelerates both molecular epigenetic 252
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
aging and a broad spectrum of visible skin phenotypes, and establish the EpiVision perceived -253
age predictor as a biologically grounded, skin -appropriate molecular readout of environmentally 254
induced aging. 255
Identification of differentially methylated positions associated with facial skin features 256
Having established that skin methylation encodes phenotype -level aging information, we next 257
asked what biological processes underlie this connection and whether distinct visible aging traits 258
share a common epigenetic substrate or arise from independent molecular programs. To address 259
this, we conducted a parallel, independently designed epigenome-wide association study (EWAS) 260
for each of the 19 HAUT phenotypes. Critically, this analysis was structured to adjust for 261
confounders that predictive models absorb implicitly but do not decompose: technical batch 262
effects (Illumina BeadChip), inter -individual variation in skin cell -type composition (EpiSCORE; 263
six fractions), and biological sex. This covariate-adjusted design isolates trait-specific epigenetic 264
signals from artifacts of cellular heterogeneity and demographic imbalance. The number of 265
significant DMPs varied substantially across traits, ranging from 3,050 for lacrimal grooves (0.33% 266
of tested sites) to 367,506 for acneiform breakouts (40.3%), reflecting marked differences in 267
epigenetic architecture across phenotypes. 268
To characterize the shared epigenetic architecture underlying these phenotypes, we performed a 269
cross-trait pleiotropy analysis across all 19 EWAS traits (Supplementary File 5; two traits from the 270
original 21-trait panel, perceived age standard error and quality front, were excluded from the 271
EWAS owing to insufficient phenotypic variance after quality filtering). Of 586,859 unique CpG 272
sites reaching significance (FDR < 0.05) in at least one trait, 43.5% were trait -specific, 48.4% 273
were significant in two to four traits, and 7.8% in five to nine traits. Only 1,483 CpGs (0.25%) were 274
significant in ten or more traits, a distribution that departs dramatically from the expectation under 275
independence (χ² = 2,118,347, df = 3, P < 10 ⁻³⁰⁰). This small set of highly pleiotropic loci, 276
significant across more than half the phenotype panel, represents a shared epigenetic substrate 277
of facial aging. Of these, 55.0% mapped to intergenic regions, suggesting that long -range 278
regulatory elements play a substantial role in coordinating multi -trait aging biology. Pairwise 279
Jaccard similarity of significant CpG sets revealed biologically coherent clustering: perceived age 280
and perceived eye age shared the greatest overlap (J = 0.78), the three uniformness sub-regions 281
formed a tight cluster (J = 0.41 –0.62), pigmentation and sun spots (solar hyperpigmentation) 282
grouped together (J = 0.47), lines and sagging co-clustered (J = 0.37), and irritation and erythema 283
formed a distinct inflammatory module (J = 0.31). Nearly all highly pleiotropic CpGs (99.8%) 284
showed mixed directionality, with hypermethylation in some traits and hypomethylation in others, 285
rather than uniform effects. Because all models adjusted for six EpiSCORE cell-type fractions and 286
technical batch, this mixed directionality is unlikely to reflect compositional confounding and 287
instead suggests that pleiotropic loci participate in genuinely distinct, trait -specific regulatory 288
programs. 289
These multi-trait loci represent candidate targets for future functional and intervention studies: 290
modulating a pleiotropic locus could, in principle, exert coordinated effects across structural, 291
pigmentary, and inflammatory aging axes simultaneously. The most broadly associated CpG was 292
cg16867657 at the ELOVL2 locus, significant in 15 of 19 traits spanning deep lines, fine lines, 293
perceived age, pigmentary change, erythema, sagging, and all four uniformness measures, 294
among others. Other highly pleiotropic loci included cg08870112 (RANBP17; 14 traits), 295
cg07553761 (TRIM59; 14 traits), and cg23571812 (FOXG1; 13 traits), all established epigenetic 296
aging biomarkers. Recurrent gene annotations among the 1,483 highly pleiotropic CpGs included 297
ZIC1 (13 CpGs), PRRT1 (11 CpGs), ZNF154 (7 CpGs), and FOXG1 (6 CpGs), implicating 298
developmental transcription factors and zinc -finger proteins in the shared architecture of skin 299
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
aging. The direction of effect also differed systematically by phenotype category: structural aging 300
features such as lines and sagging showed predominantly hypermethylated DMPs, consistent 301
with progressive locus-specific gain of methylation with aging, while inflammatory traits including 302
irritation and acneiform breakouts showed predominantly hypomethylated positions, suggesting 303
loss of epigenetic regulation at immune-relevant loci. Full DMP results for all 19 traits are provided 304
in Supplementary File 5. 305
Perceived age and perceived eye age. 306
For perceived age (ρ = 0.82), enrichment was dominated by neuronal development and 307
morphogenesis terms including axonogenesis (FDR < 10⁻²²), alongside epithelial–mesenchymal 308
transition (EMT; FDR = 3.43×10 ⁻⁴), apical junction, NOTCH signaling, and estrogen response 309
programs; this implicates coordinated tissue -architectural remodeling rather than isolated 310
molecular damage. Many enriched neuronal guidance genes (SLIT2, ROBO1/2, SEMA3A/F, 311
NCAM1) also have established roles in cutaneous innervation and epithelial organization, 312
underscoring shared developmental programs between neural and skin tissues. Perceived eye 313
age (ρ = 0.765) showed a largely overlapping architecture, with developmental patterning and 314
axonogenesis (FDR < 10⁻²⁴), EMT, and estrogen response as top enriched terms, with additional 315
myogenesis enrichment consistent with periorbital muscle –skin structural interactions. The 316
substantial overlap between these two aging phenotypes, corroborated by their exceptionally high 317
Jaccard similarity (J = 0.78), indicates that shared developmental and structural epigenetic 318
programs underlie aging across anatomically distinct facial regions. 319
Additional traits (ρ > 0.7). Irritation (ρ = 0.74) highlighted innate immune activation, 320
cytokine production, and neurogenesis regulation pathways, pointing to epigenetic encoding of 321
inflammatory skin states. Redness (ρ = 0.70) showed enrichment for neurogenesis regulation and 322
vascular processes including angiogenesis and vascular permeability, reflecting the 323
neurovascular biology underlying cutaneous erythema. Uniformness (left cheek; ρ = 0.71) 324
implicated epithelial organization and keratinocyte turnover programs. Traits with weaker 325
epigenetic–phenotypic correlations (ρ < 0.70), including uniformness overall (ρ = 0.68), pores (ρ 326
= 0.68), lacrimal grooves (tear trough areas; ρ = 0.63), and pigmentation (ρ = 0.55), showed 327
analogous but attenuated enrichment themes involving developmental signaling and epithelial 328
architecture. For pigmentation, despite the more modest predictive correlation, pathway 329
enrichment revealed UV -response and melanocyte morphogenesis programs consistent with 330
photoaging-driven dysregulation of pigment biology, including both hyper- and hypopigmentation. 331
Across all traits, developmental signaling, epithelial junction biology, estrogen response, 332
and UV-associated programs recurred as shared substrates of facial aging, while trait -specific 333
signals, such as UV response in pigmentary change, metabolic stress in irritation, and vascular 334
programs in erythema, indicate layered biological specialization atop this common aging 335
architecture. CpG loci and pathways enriched across multiple phenotypes, including the 336
developmental and epithelial -junction programs shared by perceived age, uniformness, and 337
erythema, may represent particularly actionable intervention targets, as modulating them would 338
be expected to influence multiple aging dimensions simultaneously. Complete pathway 339
enrichment results across GO, KEGG, and Hallmark databases (28,583 enriched terms across 340
19 traits) are provided in Supplementary File 6. 341
Discussion
342
Visible human facial aging is not a single biological process. The findings presented here 343
demonstrate that the skin epigenome encodes distinct molecular programs underlying structural 344
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
aging, pigmentary change, inflammation, and textural change, programs that are partially shared 345
across traits but that also carry trait-specific biological signatures. By integrating AI-derived facial 346
phenotypes with matched skin DNA methylation profiles across 706 individuals, EpiVision 347
establishes that methylation variation in superficial skin samples captures multidimensional aging 348
information with biological coherence, environmental sensitivity, and responsiveness to clinical 349
exposures. Three primary advances emerge: skin methylation accurately predicts specific facial 350
aging dimensions well beyond what chronological age alone explains; trait -specific epigenome-351
wide analyses reveal biologically coherent and partially independent molecular architectures 352
underlying each visible aging axis; and EpiVision predictors capture expected relationships with 353
UV exposure, lifestyle factors, and dermatologic interventions, validating both their biological 354
grounding and practical utility. 355
A particularly compelling observation concerns the relationship between EpiVision's trait 356
architecture and clinically recognized photoaging subtypes. Sachs et al. characterized two 357
divergent forms of UV-induced skin aging: hypertrophic photoaging, marked by coarse wrinkling, 358
skin thickening, and sallowness, and atrophic photoaging, characterized by fine wrinkling, 359
erythema, telangiectasia, and elevated keratinocyte cancer risk 18. Subsequent histological work 360
confirmed distinct structural features underlying these phenotypes, including differential solar 361
elastosis and collagen VII distribution 19. EpiVision's EWAS enrichment signatures map onto this 362
clinical axis with notable specificity. The erythema and irritation predictors showed neurogenesis 363
regulation among their most significantly enriched processes, alongside innate immune 364
activation, vascular processes, and angiogenesis. This profile mirrors the neurogenic and 365
vascular inflammatory biology described as hallmarks of atrophic photoaging. Conversely, the 366
lines, sagging, and deep lines predictors were enriched for extracellular matrix organization, 367
collagen fibril organization, and fibroblast migration, reflecting the structural ECM remodeling and 368
elastotic thickening that define the hypertrophic phenotype. Although the current cohort lacks 369
formal AP/HP clinical classification, the convergence of these orthogonal enrichment signatures 370
suggests that EpiVision's trait architecture may already capture the molecular underpinnings of 371
this clinical dichotomy. Prospective integration of AP/HP classification with EpiVision outputs 372
would provide a framework for testing whether distinct epigenetic signatures predict photoaging 373
subtype or modulate responsiveness to subtype-specific interventions. 374
A key insight from this work is that skin DNA methylation encodes biologically meaningful 375
information about visible aging phenotypes beyond what is predicted by chronological age. This 376
connection has not previously been demonstrated at this phenotypic resolution. Age -related 377
predictors achieved robust test -set correlations (sun -exposure–informed model, ρ = 0.836; 378
perceived age, ρ = 0.819; eye age, ρ = 0.765), while non -age phenotypes including irritation, 379
erythema, and regional uniformness exceeded ρ = 0.70, indicating that methylation captures 380
multidimensional cutaneous biology beyond chronological time. 381
The parallel EWAS analyses, performed independently of model training and adjusted for batch 382
effects, cell -type composition, and sex, provide biological grounding for the predictive 383
performance. For phenotypes with strong epigenetic coupling, enrichment analyses converged 384
on developmental and structural pathways. Perceived age, for instance, was enriched for 385
neuronal development alongside epithelial–mesenchymal transition, apical junction, and estrogen 386
response programs, implicating coordinated tissue -architectural remodeling rather than passive 387
damage accumulation. Several enriched neuronal guidance genes, including SLIT2, ROBO1/2, 388
SEMA3A/F, and NCAM1, also play established roles in cutaneous innervation and epithelial 389
organization, underscoring shared developmental programs between neural and skin tissues. 390
Trait-specific signals aligned with known dermatologic drivers, including UV -response pathways 391
for pigmentation and innate immune activation for irritation. Cross -trait pleiotropy analysis 392
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
identified shared "core" aging loci: cg16867657 (ELOVL2) was significant in 15 of 19 traits, 393
consistent with the established role of ELOVL2 methylation as a pan-tissue aging marker 20. The 394
direction of effect also differed systematically. Structural features showed predominantly 395
hypermethylated DMPs consistent with progressive locus -specific methylation gain, while 396
inflammatory traits showed predominantly hypomethylated positions suggesting loss of epigenetic 397
regulation at immune-relevant loci. 398
We validated EpiVision's biological sensitivity through paired sun-exposure analysis and survey-399
linked lifestyle associations. In 20 participants with matched sun-exposed and non–sun-exposed 400
samples, sun-exposed skin showed accelerated perceived age (+1.30 years, p = 0.042) alongside 401
deterioration in 9 of 18 phenotypic traits. In the lifestyle analyses, exposures showed phenotype-402
specific rather than global aging associations: water intake was linked to reduced dark circles, 403
sleep quality to fine line measures, and self reported topical hyaluronic acid use was the only 404
exposure significantly associated with epigenetic age acceleration across both perceived -age 405
models. This pattern, where lifestyle factors influence individual trait predictors rather than shifting 406
a composite clock, reinforces the value of decomposing skin aging into trait -level components 407
and positions EpiVision as a more informative endpoint than single -score approaches. Notably, 408
numerous raw associations with age -related predictors (testosterone, HRT, diabetes, cancer 409
history) attenuated after EAA adjustment, reinforcing the necessity of acceleration -based 410
modeling to isolate exposure-specific effects from chronological age confounding. 411
EpiVision extends the phenotype -forward direction introduced by Bienkowska et al., who 412
developed VisAgeX, a skin -specific second -generation epigenetic clock predicting visual age 413
progression from epidermal methylation profiles of female volunteers 21. While VisAgeX targeted 414
wrinkle grade and visual facial age using 450K -array data from a single -sex cohort, EpiVision 415
broadens both the phenotypic scope (21 predictors spanning structural, pigmentary, 416
inflammatory, and textural axes) and population diversity (both sexes, multiple Fitzpatrick types) 417
while leveraging EPICv2 coverage for enhanced regulatory CpG representation. EpiVision also 418
complements earlier skin -specific chronological -age clocks 22, senotherapeutic validation 419
platforms 23, and emerging non-invasive epidermal methylomics approaches, including tape-strip 420
whole-genome profiling 24 and sequencing-based epigenetic age predictors optimized for non -421
invasively collected epidermis 25. Together, these advances position skin methylation as an 422
increasingly tractable molecular layer for both aging research and intervention evaluation. 423
Several limitations warrant consideration. The cross-sectional design precludes causal inference 424
and assessment of within -person trajectories; longitudinal tracking will be essential to quantify 425
intervention responsiveness and determine whether EpiVision captures true biological change 426
over time. The cohort's ancestry and Fitzpatrick diversity is limited (Types V–VI comprised 6.5% 427
of the methylation cohort), and expansion to more diverse populations is needed to establish 428
generalizability. Integration with transcriptomic and proteomic layers would connect methylation 429
shifts to functional outputs and strengthen mechanistic interpretation. Continued refinement of 430
deconvolution and sampling protocols will help reduce residual confounding by cellular 431
heterogeneity. Biophysical measures such as transepidermal water loss (TEWL) and skin 432
hydration were not captured in the current study and would complement methylation -derived 433
readouts in future work. Future work should establish whether EpiVision predictors capture true 434
biological trajectories across diverse populations and serve as scalable endpoints for 435
mechanistically informed, data-driven skin-aging models. More broadly, the demonstration that 436
skin methylation encodes biologically distinct aging axes, rather than a single chronological signal, 437
opens a path toward understanding facial aging as a decomposable biological phenotype, with 438
implications for how epigenetic tools are designed, validated, and applied across aging research, 439
clinical dermatology, and regenerative dermatologic practice. 440
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Methods
441
442
Study Population and Sample Collection 443
Facial skin samples were collected from participants enrolled in a skin aging cohort, 444
including the TruDiagnostic cohort, Integrative Skin Research (led by Dr. Raja Sivamani), and 445
Boynton Beach Dermatology / Beauty Within (led by Dr. Joseph C. Gretzula), using sterile dermal 446
curettes (Integra Miltex, Integra LifeSciences), an instrument designation reflecting the tool rather 447
than the tissue depth, to obtain predominantly epidermal skin cells, as confirmed by EpiSCORE 448
deconvolution. Skin samples were collected from the cheekbone region and subsequently 449
transferred to collection vials containing DNA/RNA Shield stabilization buffer (Zymo Research, 450
Irvine, CA) and shipped to the laboratory for processing. DNA was extracted from collected skin 451
samples using standard protocols and quantified prior to bisulfite conversion. 452
Participants completed questionnaires, facial images, and provided informed consent prior 453
to participation. The questionnaires captured demographic information as well as relevant skin 454
history, health history, and potential environmental exposures. All study procedures were 455
approved by the relevant institutional review boards. The final analytic sample comprised 508 to 456
706 individuals per phenotype depending on data availability (see Results for phenotype-specific 457
sample sizes). 458
459
Facial Phenotyping Using HAUT Artificial Intelligence Platform 460
461
Image Acquisition. Participants submitted standardized facial photographs (selfies) using 462
their personal mobile devices. Images were captured in three standard orientations: frontal view, 463
left profile, and right profile. Participants were instructed to photograph themselves in well -lit 464
environments without makeup, with hair pulled back to expose the full face, and with neutral facial 465
expressions. Multiple submissions from the same participant were allowed over time to 466
accommodate technical failures or quality issues. 467
468
Automated Phenotype Extraction. All facial images were processed using HAUT Face 469
Analysis 3.0 software (Haut.AI, Tallinn, Estonia), an artificial intelligence-powered computer vision 470
platform designed for automated, objective assessment of skin aging and appearance 471
characteristics from digital photographs. The HAUT platform employs deep convolutional neural 472
networks trained on large -scale annotated datasets of facial images labeled by expert 473
dermatologists for a comprehensive array of skin features. 474
For each submitted image, the HAUT pipeline performs the following automated steps: (1) 475
facial landmark detection to identify anatomically defined regions of interest (forehead, periorbital 476
area, cheeks, perioral region, nose); (2) image quality assessment across multiple technical 477
dimensions including resolution, blur, exposure, illumination uniformity, proper facial positioning, 478
absence of occlusions (eyeglasses, hair, hands), and sufficient visible skin area; (3) feature -479
specific detection using specialized deep learning models for each phenotype (e.g., wrinkle 480
detection models identify line segments and quantify depth, length, and spatial distribution; 481
pigmentation models segment hyperpigmented lesions and compute density and color metrics; 482
inflammatory lesion models identify and count pimples, papules, and pustules); and (4) scoring 483
on continuous scales (0 -100 for most traits, with higher scores generally indicating better skin 484
quality or less severe manifestation; or estimated age in years for perceived aging phenotypes). 485
The HAUT platform generates the following phenotypic outputs that were used in the 486
current epigenome -wide association studies: (1) aging -related phenotypes including 487
chronological age (self-reported age in years) and perceived eye age (AI-estimated apparent age 488
of the periorbital region based on comparison to age -labeled reference datasets); (2) structural 489
aging phenotypes including general lines score (overall facial wrinkle burden), fine lines score 490
(shallow wrinkles 1mm depth), sagging 491
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
score (gravitational descent of facial soft tissues and jowl formation), and lacrimal grooves score 492
(infraorbital hollowing and tear trough deformity); (3) pigmentation phenotypes including overall 493
pigmentation score (dyschromia and uneven skin tone) and sun spots score (discrete solar 494
lentigines and hyperpigmented macules); (4) inflammatory and skin quality phenotypes including 495
breakouts score (acneiform lesions, pustules, papules), irritation score (erythema, scaling, 496
roughness), and redness score (vascular prominence and telangiectasia); (5) textural phenotypes 497
including pores score (visible pore size and density) and puffiness score (periorbital edema); (6) 498
additional localized features including dark circles score (infraorbital hyperpigmentation and 499
shadowing); and (7) skin tone uniformness assessed globally (overall uniformness score) and 500
regionally (forehead, left cheek, right cheek uniformness scores measuring color heterogeneity 501
and blotchiness within each facial zone). 502
503
Quality Control and Best-Row Selection 504
Embedded quality control algorithms within the HAUT platform assigned composite quality scores 505
to each image based on the aforementioned technical criteria. For participants with multiple image 506
submissions, we selected a single "best-row" image per individual defined as the submission with 507
the highest composite quality score, thereby maximizing measurement reliability while avoiding 508
pseudoreplication from repeated measures. Images failing to meet minimum quality thresholds 509
were excluded from analysis. This quality control process resulted in the retention of 1,413 to 510
1,463 individuals with usable HAUT phenotype data depending on the specific trait (see Results 511
for details). 512
513
DNA Methylation Profiling and Preprocessing 514
515
Array-Based Methylation Measurement. Genomic DNA extracted from skin curette 516
samples was bisulfite-converted using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA, 517
USA) according to the manufacturer's protocol. Bisulfite -converted DNA was hybridized to the 518
Illumina Infinium MethylationEPIC v2 BeadChip (Illumina, San Diego, CA, USA), which 519
interrogates DNA methylation at over 935,000 CpG sites across the human genome with 520
enhanced coverage of regulatory regions including gene promoters, CpG islands, enhancers, and 521
CTCF binding sites. Arrays were processed following the Illumina standard protocol, with samples 522
randomly distributed across multiple BeadChips (recorded as the "Beadchip" batch variable) to 523
mitigate batch effects. Fluorescence intensity data (IDAT files) were generated using the Illumina 524
iScan System. 525
Quality Control. Raw IDAT files were imported into R version 4.4.0 (R Core Team, 2024) 526
and processed using the minfi Bioconductor package (Aryee et al., 2014). Quality control was 527
performed using detection p -value filtering with the following criteria: (1) samples with mean 528
detection p-value >0.15 across all probes were excluded; (2) probes with mean detection p-value 529
>0.10 across all samples were removed (n = 19,078 probes excluded). Sex chromosome probes 530
were retained in the methylation feature matrix, as X - and Y-linked CpG methylation captures 531
biologically relevant sex-differential variation in skin biology (15,722 chrX probes, 46 chrY probes 532
retained). 533
Data Normalization. Normalization was performed using the ssNoob (single -sample 534
normal-exponential out -of-band) method implemented in the preprocessNoob function with 535
dyeMethod = "single" (Triche et al., 2013), which models background correction and dye -bias 536
normalization in a single unified framework using out -of-band probe intensities. The ssNoob 537
Method
performs within-array normalization, making it suitable for datasets where samples are 538
processed in multiple batches and providing robustness against batch effects. For epigenetic 539
clock prediction, missing values at required clock CpG sites were imputed using reference-based 540
imputation implemented in the SeSAMe package (Zhou et al., 2018). 541
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Beta extraction. Following quality control and normalization, 911,580 CpG sites remained 542
for downstream epigenome -wide association analyses, including both autosomal and sex 543
chromosome probes. Methylation levels were represented as beta values (β = M / [M + U + 100], 544
where M and U represent methylated and unmethylated signal intensities, respectively), ranging 545
from 0 (completely unmethylated) to 1 (completely methylated). Genomic coordinates are 546
reported relative to the hg38 human genome reference build (EPICv2 custom array annotation 547
version 20a1). 548
CpG Annotation. CpG sites were annotated using the 549
IlluminaHumanMethylationEPICv2anno.20a1.hg38 R package, which provides gene annotations 550
(mapping to UCSC gene symbols via proximity to transcription start sites), genomic locations 551
(chromosome and position), and regulatory context (CpG island, shore, shelf, or open sea 552
classification). Intergenic CpG sites (those not mapping to annotated gene bodies or promoters 553
within 1.5 kb of transcription start sites) were labeled as "intergenic" throughout the Results. 554
Cell Type Deconvolution for Adjustment of Cellular Heterogeneity. Skin is a 555
heterogeneous tissue comprising multiple cell types with distinct DNA methylation profiles, 556
including keratinocytes, fibroblasts, melanocytes, endothelial cells, and resident immune cells. To 557
distinguish true within -cell methylation changes from artifacts arising due to inter -individual 558
differences in skin cell -type composition, we estimated cell -type proportions from the DNA 559
methylation data using a reference-based deconvolution approach. 560
Cell type fractions were estimated using the EpiSCORE algorithm (Teschendorff et al., 561
2017) with the skin -specific methylation reference matrix mrefSkin.m, as implemented in the 562
EpiSCORE R package version 1.0. Although EpiSCORE was originally developed for blood, we 563
applied the skin-specific reference matrix mrefSkin.m because our samples are derived from skin 564
tissue and the skin-specific reference captures the relevant cell-type heterogeneity in our dataset. 565
The mrefSkin.m reference was derived from purified cell populations isolated from human skin 566
tissue and includes methylation signatures for six cell types: fibroblasts (Fib, dermal cells 567
responsible for extracellular matrix production), differentiated keratinocytes (Kera_diff, terminally 568
differentiated epidermal barrier cells), undifferentiated keratinocytes (Kera_undiff, progenitor 569
keratinocytes in the basal layer), macrophages (Macro, tissue -resident immune cells), 570
melanocytes (Mela, pigment-producing cells), and T cells (Tcell, adaptive immune cells). 571
The EpiSCORE deconvolution procedure consists of two steps. First, CpG -level 572
methylation data (beta values) are aggregated to gene-level methylation summaries by averaging 573
CpG sites within 200 base pairs of annotated transcription start sites (TSS) using the 574
constAvBetaTSS function with the "850k" array type specification (compatible with EPICv2 575
annotation). This gene-level aggregation reduces technical noise and increases the robustness 576
of deconvolution by focusing on regulatory regions most likely to differ across cell types. Second, 577
the gene -level methylation matrix is input to the weighted robust partial correlation (wRPC) 578
function, which estimates cell type proportions by comparing the sample methylation profile to the 579
Reference
cell type signatures while down -weighting unreliable genes (useW = TRUE, weight 580
threshold wth = 0.4). This approach is robust to batch effects and outliers. The resulting cell type 581
fraction estimates (proportion of each cell type per sample, summing to 1 across the six cell types) 582
were merged with phenotypic data and included as covariates in all epigenome-wide association 583
models. 584
585
Development of skin predictors 586
We developed a two -stage framework to identify robust and biologically interpretable DNA 587
methylation features and to train and validate predictive models for multiple biomarker targets 588
using complementary machine learning architectures with systematic hyperparameter 589
optimization. The workflow was designed to capture complex, potentially non -linear and non -590
monotonic relationships between CpG methylation and disease -relevant phenotypes while 591
preserving directional interpretability for selected features. 592
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
To quantify feature importance with a conservative selection threshold, we implemented 593
a hybrid feature -importance procedure that integrates a noise -based baseline, repeated 594
ensemble-derived importance estimation, and post hoc directionality inference. A synthetic 595
Gaussian noise feature was generated using the empirical distribution of the input data and 596
appended to the feature matrix to establish a minimum importance “floor.” During model fitting, 597
this noise feature was treated identically to biological features, enabling prioritization of CpG sites 598
whose importance exceeded that of the noise feature and reducing retention of spurious 599
predictors. Feature importance was estimated using random forest regression under repeated 600
instantiation to reduce variance due to stochastic training effects and mitigate overfitting during 601
feature selection. Specifically, the random forest model was refitted 100 times using independent 602
random seeds and aggregated importance values were used to identify features that were 603
consistently informative across runs. Because tree-based importance measures do not preserve 604
the direction of feature effects, directional relationships for retained features were inferred using 605
multiple linear regression fit to the selected feature set, and the sign of each regression coefficient 606
was used to assign positive or negative directionality relative to the modeled target. This 607
combined approach supports global feature evaluation rather than pairwise screening, captures 608
non-linear and non-monotonic relationships through ensemble modeling, and improves biological 609
interpretability by providing signed effect estimates that clarify how each feature contributes to 610
age estimation, disease prediction, or biomarker quantification. 611
To identify robust predictive signatures in DNA methylation data, we implemented a 612
comprehensive model development and validation framework that evaluates multiple regression 613
and classification algorithms under rigorous hyperparameter optimization. For each target, four 614
model families were trained and compared: ElasticNet, random forest regression, LightGBM 615
gradient boosting, and a multi -layer perceptron. This set was chosen to span regularized linear 616
models, bagged tree ensembles, gradient -boosted trees, and neural networks, reflecting prior 617
observations that optimal modeling architectures can vary across biomarkers as a function of 618
underlying data dispersion and distributional characteristics. 619
All models underwent systematic hyperparameter tuning using the Optuna framework to 620
efficiently navigate complex search spaces while limiting overfitting. ElasticNet models were 621
optimized across the regularization strength and L1 ratio to identify sparse yet stable methylation 622
signatures. Random forest regressors were tuned across the number of estimators, maximum 623
depth, minimum samples required to split, and feature selection settings. LightGBM models were 624
tuned across learning rate, number of leaves, tree depth, feature and bagging fractions, and 625
regularization parameters. The neural network architecture was optimized over the number of 626
hidden layers, nodes per layer, activation function, learning rate, and batch size. 627
628
Model performance was evaluated using a standard hold -out validation strategy with a random 629
80/20 train–test split. Models were trained on unadjusted HAUT phenotype scores to preserve 630
the full range of biologically meaningful variation, including sex -dependent and age -dependent 631
differences in skin aging; biological sex and chronological age were instead included as 632
covariates in all downstream association and EWAS analyses to isolate exposure - and trait -633
specific effects. Predictive performance was quantified using mean absolute error to assess 634
absolute accuracy and Spearman correlation to assess rank preservation between predicted and 635
observed values. Final model selection was based on minimizing a composite objective in which 636
absolute error was penalized based on Spearman’s rho, thereby balancing accuracy with 637
correlation strength and prioritizing models that were both accurate and rank -consistent for the 638
outcome of interest. 639
640
Epigenome-Wide Association Studies: Statistical Framework 641
642
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Data Preparation and Transformation. For each of the 19 HAUT -derived phenotypes, samples 643
with missing phenotype values were excluded from the analysis. The remaining samples were 644
subset from the full methylation matrix, and beta values were logit-transformed to M-values using 645
the formula M = log₂(β / [1 - β]) to better satisfy the assumptions of linear regression, as M-values 646
are more homoscedastic (constant variance) across the methylation range compared to beta 647
values (Du et al., 2010). CpG sites with any infinite or missing M-values (which can occur if beta 648
values are exactly 0 or 1 due to technical artifacts) were excluded from the analysis. 649
650
Differential Methylation Analysis. Differential methylation analysis was performed using the limma 651
package version 3.62.1 (Ritchie et al., 2015), which implements linear models for each CpG site 652
with empirical Bayes moderation of standard errors to improve statistical power, particularly for 653
small to moderate sample sizes. For each phenotype, we fit the following linear model at each 654
CpG site: 655
656
M value ~ phenotype + Beadchip + Fib + Kera_diff + Kera_undiff + Macro + Mela 657
658
where M represents the M-value (logit-transformed methylation: log₂[β/(1-β)]) at a given CpG site; 659
phenotype is the continuous HAUT -derived score (0 –100 scale), representing the primary 660
independent variable of interest; Beadchip is a categorical variable (factor) representing the 661
Illumina BeadChip identifier on which the sample was processed, included to adjust for technical 662
batch effects; and EC, Fib, Kera_diff, Kera_undiff, Macro, and Mela are continuous numeric 663
variables representing the estimated cell type fractions (proportions) for endothelial cells, 664
fibroblasts, differentiated keratinocytes, undifferentiated keratinocytes, macrophages, and 665
melanocytes, respectively, included to adjust for confounding due to inter -individual variation in 666
cellular composition. All models used treatment (dummy) coding for categorical variables, as 667
specified by options(contrasts = c("contr.treatment", "contr.treatment")) in R. Note: Tcell (T cells) 668
was initially included but was non-estimable across all phenotypes due to perfect collinearity with 669
other cell type fractions and was automatically dropped by the model fitting algorithm. 670
The linear model was fit using the `lmFit` function, which efficiently estimates regression 671
coefficients across all CpG sites simultaneously via matrix operations. Empirical Bayes shrinkage 672
of residual variances was then applied using the `eBayes` function, which borrows information 673
across CpG sites to stabilize variance estimates and compute moderated t-statistics. Differential 674
methylation statistics for the phenotype coefficient (logFC, t -statistic, unadjusted P-value) were 675
extracted using the `topTable` function with `coef = "phenotype"`. 676
677
Functional Enrichment analysis. To identify biological pathways associated with differentially 678
methylated positions, we performed functional enrichment analysis using three complementary 679
approaches. Gene Ontology (GO) enrichment analysis (Biological Process, Molecular Function, 680
and Cellular Component) was conducted using clusterProfiler (v4.0) with gene -centric mapping 681
of CpG sites to their nearest genes. KEGG pathway enrichment was performed using 682
clusterProfiler with the same gene mapping strategy. MSigDB Hallmark gene set enrichment was 683
conducted using missMethyl (v1.26.0), which accounts for the varying number of CpG probes per 684
gene on the Illumina EPIC array. All enrichment analyses were performed independently for each 685
of the 19 facial aging traits using differentially methylated positions at FDR < 0.05 as input. 686
Enriched pathways were considered significant at FDR < 0.05 after Benjamini -Hochberg 687
correction for multiple testing. 688
689
Interpretation of Effect Sizes. The logFC (log fold -change) coefficient from the limma model 690
represents the estimated change in M -value per unit increase in the phenotype. For continuous 691
HAUT scores (0-100 scale), logFC represents the M-value change per 1-point increase in score 692
(e.g., from score 50 to score 51). For age (years), logFC represents the M -value change per 1-693
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
year increase in age. Positive logFC values indicate hypermethylation (increased methylation 694
associated with higher phenotype values), while negative logFC values indicate hypomethylation 695
(decreased methylation associated with higher phenotype values). Because HAUT scores use an 696
inverse convention for many traits (higher scores = better skin quality = less severe phenotype), 697
the biological interpretation of effect direction must account for this scoring system (see Results 698
for detailed interpretation). 699
700
Multiple Testing Correction. To account for multiple hypothesis testing across 911,580 CpG sites, 701
we applied the Benjamini -Hochberg false discovery rate (FDR) correction using the 702
`adjust.method = "fdr"` option in limma's `topTable` function. The FDR -adjusted P -values 703
(adj.P.Val) control the expected proportion of false positives among all CpG sites declared 704
significant. We used FDR < 0.05 as the primary significance threshold throughout this study, 705
meaning that among all CpG sites called significant, we expect fewer than 5% to be false positives 706
on average. Secondary significance thresholds of FDR < 0.10 and FDR < 0.01 were also 707
computed for sensitivity analyses. 708
709
Genomic Inflation Assessment. For each phenotype, we computed the genomic inflation factor 710
(lambda, λ) as the ratio of the median observed chi-squared test statistic to the expected median 711
under the null hypothesis (0.456), based on the conversion of P-values to chi-squared statistics. 712
Lambda values substantially greater than 1.0 can indicate either genuine epigenome-wide signal 713
(many true associations distributed across the genome) or systematic bias (e.g., residual 714
population stratification, uncorrected batch effects, or incomplete adjustment for confounders). 715
We examined quantile -quantile (QQ) plots of observed versus expected P -values for each 716
phenotype to distinguish between these scenarios: genuine polygenic signal manifests as 717
deviation from the null distribution primarily in the upper tail (most significant CpGs), while 718
systematic bias produces uniform inflation across the entire P-value distribution. 719
720
Cross-Trait Pleiotropy Analysis. To assess the extent of shared epigenetic architecture across 721
skin aging phenotypes, we identified all unique CpG sites reaching significance (FDR < 0.05) in 722
at least one phenotype. For each such CpG site, we counted the number of traits in which it was 723
significant, yielding a "trait count" ranging from 1 (trait -specific) up to 19 (significant in all 724
phenotypes). CpG sites significant in 10 or more traits were classified as "highly pleiotropic" and 725
were examined for evidence of directional consistency (i.e., whether the sign of the effect was 726
consistent across the traits in which the CpG was significant). Pairwise overlap between 727
phenotypes was quantified using the Jaccard similarity index, defined as the size of the 728
intersection (CpGs significant in both traits) divided by the size of the union (CpGs significant in 729
either trait), ranging from 0 (no overlap) to 1 (perfect overlap). 730
731
Visualization. QQ plots, Manhattan plots, and volcano plots were generated for each phenotype 732
using the ggplot2 package version 3.5.1. Manhattan plots display -log₁₀(P-value) for each CpG 733
site on the y -axis against genomic position (chromosome and position) on the x -axis, with 734
horizontal lines indicating genome -wide significance thresholds (FDR < 0.05). Volcano plots 735
display -log₁₀(P-value) on the y -axis against logFC on the x -axis, highlighting both statistical 736
significance and magnitude of effect. QQ plots display observed -log₁₀(P-values) against 737
expected values under the null hypothesis to assess calibration and identify inflation. All plots 738
were saved as high-resolution JPG images. 739
740
Software and Package Versions 741
All analyses were conducted in R version 4.4.0 (2024-04-24, "Puppy Cup"). Key R packages and 742
versions used include: 743
744
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
1. Methylation data processing: 745
a. SeSAMe (version 1.20.0): Preprocessing, quality control, normalization (ssNoob 746
method) 747
b. IlluminaHumanMethylationEPICv2manifest (version 0.99.0): Array manifest 748
c. IlluminaHumanMethylationEPICv2anno.20a1.hg38 (version 0.99.0): CpG 749
annotation 750
2. Cell type deconvolution: 751
a. EpiSCORE (version 1.0.0): Reference-based deconvolution with mrefSkin.m 752
3. Differential methylation analysis: 753
a. limma (version 3.58.1): Linear models and empirical Bayes moderation 754
b. DMRcate (version 3.0.0): Differentially methylated region identification (planned 755
for future analysis) 756
4. Data manipulation and visualization: 757
a. data.table (version 1.15.4): Efficient data import and manipulation 758
b. dplyr (version 1.1.4): Data wrangling 759
c. tidyr (version 1.3.1): Data tidying 760
d. ggplot2 (version 3.5.1): Publication -quality graphicsggrepel (version 0.9.5): Non -761
overlapping text labels in plots 762
763
Scripts implementing these analyses are available from the authors upon request. 764
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Acknowledgements. We would like to thank Chatlin Helm and Razelle Ann Piculados for their 765
valuable help with data management. We would also like to thank the many volunteers who 766
supported this study by providing skin samples which were used for the development of 767
predictors. 768
Data Availability. Due to privacy considerations and institutional review board restrictions, 769
individual-level methylation and phenotype data cannot be publicly shared. The HAUT Face 770
Analysis 3.0 software is proprietary and available under commercial license from Haut.AI 771
(https://haut.ai). Raw methylation array data (Illumina EPIC v2) will be available upon reasonable 772
request. The crosswalk linking anonymous subject identifiers to internal sample IDs is available 773
from the corresponding author under a data use agreement (DUA) to protect participant privacy. 774
Additional data supporting the findings of this study are available from the corresponding author 775
upon reasonable request. Full supplementary data and information is available on Zenodo at 776
https://zenodo.org/records/19006544. 777
Code Availability. Analysis code for the EWAS, cross -trait pleiotropy analysis, lifestyle 778
association analyses, and figure generation will be deposited on Zenodo (779
https://zenodo.org/records/19006544) upon acceptance and made publicly available at that time. 780
Author Contributions. Design and conceptualization: VBD, RS [Ryan Smith], LP. Methylation 781
data generation and processing: VBD. Statistical analyses: VBD, SH. Predictive model 782
development: SH, VBD. Clinical data generation and sample provision: RS [Sivamani], JG. Facial 783
imaging data generation and processing: AG, KK. Sample collection and processing: DS, JW. 784
Scientific discussion and interpretation: GMF, YC, KS. Manuscript drafting: VBD, SH. All authors 785
reviewed, edited, and approved the final manuscript. 786
Competing Interests. VBD, SH, KS, DS, JW, and RMS are employees of TruDiagnostic Inc. AG 787
and KK are employees of Haut.AI. LP is an employee of Olaplex Inc. GMF, YC, JG, and RS 788
declare no competing interests. 789
Consent for Publication. Facial images were collected for automated phenotyping via HAUT.ai 790
but are not reproduced in this article. All participants provided informed consent for the use of 791
their biological samples and imaging data in research. No identifiable participant data appear in 792
this publication. 793
794
795
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
TABLE 796
797
Table 1 | Performance of V2 Models for Skin Trait Prediction. Summary of the final selected 798
model for each of the 21 predictors, sorted by descending Spearman R correlation. MAE (Mean 799
Absolute Error), best Model Type (LightGBM or RandomForest), and sample sizes (total, train, 800
and test) are shown for each predictor. 801
802
803
Predictor Model Type MAE Spearman rho) Total Samples Train Samples Test Samples
age_age_se LightGBM 5.3 0.836 706 564 142
age_age LightGBM 5.14 0.819 706 564 142
eyes_age_eyes_age LightGBM 5.37 0.765 706 564 142
irritation_score LightGBM 6.8 0.738 584 467 117
uniformness_left_cheek_score LightGBM 10.31 0.709 679 543 136
redness_score LightGBM 9.55 0.705 584 467 117
uniformness_score LightGBM 8.18 0.682 706 564 142
pores_score LightGBM 5.7 0.681 706 564 142
uniformness_right_cheek_score LightGBM 9.81 0.645 678 542 136
lacrimal_grooves_score LightGBM 12.72 0.625 706 564 142
uniformness_forehead_score LightGBM 10.27 0.582 691 552 139
sun_spots_score LightGBM 6.09 0.566 627 501 126
puffiness_score RandomForest 8.65 0.563 706 564 142
pigmentation_score RandomForest 7.87 0.551 627 501 126
fine_lines_score LightGBM 11.52 0.545 508 406 102
sagging_score RandomForest 17.42 0.537 508 406 102
lines_score LightGBM 12.07 0.494 508 406 102
quality_front_score RandomForest 11.6 0.478 706 564 142
breakouts_score LightGBM 5.19 0.471 706 564 142
deep_lines_score LightGBM 13.69 0.44 508 406 102
dark_circles_score LightGBM 9.58 0.416 706 564 142
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Figures 804
805
806
807
Figure 1 | Study design and analytic workflow. Facial photographs were processed with HAUT 808
AI to derive 19 phenotypes (1,463 submissions; n = 1,413 individuals after QC). Facial skin 809
methylation was profiled on the Illumina EPICv2 array (n = 744; 911,580 CpGs after QC). A 810
matched subset (n = 706) was used for two parallel analyses: (i) cell -type-adjusted EWAS of 811
CpG–phenotype associations (EpiSCORE; 6 cell types) and (ii) EpiVision model development 812
using an 80:20 train –test split, evaluated by MAE and Spearman ρ. EpiVision comprises 21 813
predictors (3 age-related, 18 phenotype-specific). A survey-linked subset (n = 282; 33 exposures) 814
was used for association testing with both unadjusted and age/sex -adjusted models. Sample 815
sizes, algorithm selection, and model details are provided in Supplementary Files 1 and 2. 816
817
818
819
QC/Normalization
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
820
821
Figure 2 | HAUT imaging phenotypes in the full cohort and demographic correlations. (A) 822
Distributions of HAUT-derived phenotype scores across the full imaging cohort (blue; n = 1,413) 823
and the subset with paired DNA methylation data (orange; n = 706). Violin plots show score 824
distributions; boxplots indicate Q1 –Q3 with median, whiskers denote min –max, and diamonds 825
denote the mean. (B) Chronological age distribution for participants with age metadata (n = 247; 826
mean 40.3 ± 16.3 years). (C) Fitzpatrick skin type composition for participants with available data 827
(n = 481), with counts shown above bars. ( D) Pearson correlations between chronological age 828
and HAUT phenotypes (n = 191 –246 depending on phenotype). ( E) Spearman correlations 829
between Fitzpatrick skin type and HAUT phenotypes (n = 364 –468). Bar direction and color 830
indicate positive and negative correlations; values are labeled on each bar. 831
832
833
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
834
835
Figure 3 | Model performance using EPICv2 -only features versus the V1 –V2 overlapping 836
CpG set. Performance is shown for 20 targets (chronological age and 19 HAUT phenotypes) 837
trained with identical modeling procedures but different CpG feature spaces. (A) Mean absolute 838
error (MAE; lower is better) for each target. Dashed lines indicate across -target means (V1–V2 839
overlap: 9.83; V2 -only: 9.38). (B) Spearman correlation (ρ; higher is better) for each target. 840
Dashed lines indicate across-target means (V1–V2 overlap: 0.546; V2-only: 0.601). (C) Percent 841
change in performance for V2-only relative to V1–V2 overlap (negative MAE change and positive 842
Spearman change indicate improvement). (D) Summary of meaningful change (≥5% threshold): 843
9/20 targets (45%) showed meaningful MAE improvement with no meaningful degradation in any 844
target; 8/20 targets (40%) showed meaningful Spearman improvement, with 4/20 (20%) showing 845
meaningful degradation. 846
847
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
848
849
Figure 4 | Lifestyle, clinical, and sun -exposure correlates of EpiVision predictors. (A,B) 850
Forest plots showing standardized effect sizes (β) and 95% confidence intervals for associations 851
between self-reported exposures and EpiVision predictors in the survey-linked subset (n = 282). 852
Blue circles denote unadjusted estimates; orange squares denote covariate -adjusted estimates 853
(age/sex-adjusted for phenotype predictors; epigenetic age acceleration for age -related 854
predictors). Sample sizes for each test are shown as n (raw → adjusted), reflecting differences 855
due to covariate availability. (A) Lifestyle correlates. (B) Clinical and medication correlates. 856
Significance asterisks are based on the adjusted model (*P < 0.05, **P < 0.01, ***P < 0.001). (C–857
E) Paired comparisons of EpiVision age predictions between sun-protected and sun-exposed skin 858
from the same individuals (n = 20). Grey lines connect within-participant measurements. Boxplots 859
show median, interquartile range, and 1.5×IQR whiskers. (C) Eyes Age. (D) Perceived Age. (E) 860
Sun-exposure-informed Perceived Age (SE). Cohen's d_z and P values from paired t -tests are 861
shown above each panel. 862
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Supplementary Figure 863
864
865
866
Supplementary FIGURE S1 | Skin cell -type composition estimates across samples. 867
Boxplots summarize EpiSCORE-inferred skin cell-type fractions across 744 facial skin samples, 868
including endothelial cells (EC), fibroblasts (Fib), differentiated keratinocytes (Kera_diff), 869
undifferentiated keratinocytes (Kera_undiff), macrophages (Macro), melanocytes (Mela), and T 870
cells (Tcell). Fractions are expressed as percentages summing to ~100% per sample. 871
Differentiated keratinocytes comprised the largest component (median 32.82%), followed by 872
endothelial cells (26.01%) and undifferentiated keratinocytes (21.39%). Macrophage fractions 873
were variable (median 13.53%), whereas T cells were lower on average (median 0.83%). 874
Fibroblast (median 0.00%) and melanocyte (median 0.00%) fractions were generally minimal 875
across samples. Center lines indicate the median, boxes denote the interquartile range (Q1–Q3), 876
whiskers extend to 1.5×IQR, and points represent outliers. The predominance of keratinocytes 877
with a notable endothelial contribution is consistent with superficial, epidermis-enriched sampling, 878
and these estimates were included as covariates in downstream methylation analyses to account 879
for cellular heterogeneity. 880
881
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Supplementary File Legends 882
883
SUPPLEMENTARY FILE 1 | Cohort characteristics and demographic composition. 884
Summary of sample collection, data availability, and key demographics for the HAUT imaging 885
cohort, the Illumina EPIC v2 methylation cohort, the survey -linked subset, and the matched 886
HAUT–methylation subset used for model development and downstream analyses. Age statistics 887
are reported for participants with available metadata. Fitzpatrick skin type counts are shown for 888
the methylation cohort; “not reported” indicates missing Fitzpatrick data. ᵃSample size varies 889
across the 21 EpiVision predictors due to phenotype -specific completeness and covariate 890
availability. 891
892
SUPPLEMENTARY FILE 2 | Complete performance metrics for all EPICv2 (V2 -Only) and 893
V1–V2 overlap predictive models across skin trait targets. All metrics are reported on the 894
held-out test set (20% of total samples). ΔMAE and ΔSpearman R reflect the percent change of 895
the EPICv2 model relative to the V1–V2 overlap model (negative ΔMAE indicates improvement; 896
positive ΔSpearman R indicates improvement). Targets are sorted alphabetically. Probe counts: 897
EPICv2 probe set uses all stable EPICv2 probes; V1–V2 overlap probe set is restricted to probes 898
present on both EPICv1 and EPICv2 arrays. 899
900
SUPPLEMENTARY FILE 3 | Association results between self-reported lifestyle, clinical, and 901
medication exposures and EpiVision methylation-based facial aging predictors. Results are 902
reported for 33 exposures tested against 21 EpiVision predictors in the survey-linked subset (n = 903
282) across four models: unadjusted raw predicted age (Age – Raw); epigenetic age acceleration 904
(Age – EAA), in which predicted age is residualized on chronological age to isolate biological 905
aging effects independent of time; unadjusted phenotype scores (Phenotype – Raw); and 906
age/sex-adjusted phenotype scores (Phenotype – Model 2), restricted to participants ≥16 years. 907
Exposures include binary clinical and medication variables (e.g., retinoid use, hormone therapy, 908
spironolactone, testosterone) and ordinal lifestyle variables (e.g., sleep quality, diet quality, water 909
intake, tobacco status). Standardized beta coefficients are reported throughout to enable cross -910
exposure comparability. FDR correction applied within adjusted models using the Benjamini –911
Hochberg method. Associations highlighted in the main text reach p 1; associations with n_exposed = 1 are considered exploratory. 913
914
SUPPLEMENTARY FILE 4 | Paired comparison of HAUT -derived skin phenotype scores 915
between sun-protected and sun-exposed skin sites. Results from paired t-tests comparing 21 916
facial aging and skin quality phenotypes between sun -protected buttock skin and sun -exposed 917
forearm skin collected from the same individuals (n = 20 per trait). Traits are sorted by ascending 918
p-value. Mean Difference reflects forearm minus buttock values; negative values indicate lower 919
scores in sun-exposed skin (i.e., worse phenotype given HAUT's scoring convention where higher 920
= better quality), and positive values indicate higher scores in sun -exposed skin. Cohen's d_z is 921
the within-subject standardized mean difference, calculated as the mean paired difference divided 922
by the standard deviation of the paired differences, and reflects effect size for paired designs. The 923
Panel column corresponds to figure panel labels in the associated supplementary figure. 924
Statistical significance threshold: p < 0.05 (two -sided paired t -test, no correction for multiple 925
comparisons across traits). 926
927
SUPPLEMENTARY FILE 5 | Epigenome-Wide Association Study Results for 19 Skin Aging 928
and Appearance Traits. N, sample size; Lambda, genomic inflation factor; DMPs, differentially 929
methylated positions at FDR < 0.05; % Sig, percentage of tested CpGs (911,580 total) reaching 930
significance; % Hyper, percentage of significant DMPs showing hypermethylation (positive effect); 931
Top CpG, most significant DMP by P-value; logFC, log fold-change in M-values per unit increase 932
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
in phenotype. All models adjusted for technical batch (Illumina BeadChip) and six skin cell type 933
fractions (fibroblasts, differentiated keratinocytes, undifferentiated keratinocytes, macrophages, 934
melanocytes, T cells) estimated using EpiSCORE with skin-specific reference. 935
936
SUPPLEMENTARY FILE 6 | Functional enrichment analyses for differentially methylated 937
positions associated with 19 facial aging and skin quality traits. Pathway enrichment was 938
performed independently for each trait using genes mapped to significant DMPs (FDR < 0.05) 939
identified in the covariate-adjusted EWAS, across four complementary databases: Gene Ontology 940
Biological Process (GO_BP), Molecular Function (GO_MF), and Cellular Component (GO_CC); 941
KEGG Pathways; and MSigDB Hallmark gene sets. In total, 28,583 enriched pathways were 942
identified across all 19 traits, sorted by trait and ascending p-value. Significance threshold: FDR 943
< 0.05. 944
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
References
945
1. Bobrov, E. et al. PhotoAgeClock: deep learning algorithms for development of non -946
invasive visual biomarkers of aging. Aging 10, 3249–3259 (2018). 947
2. Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 948
10, 573–591 (2018). 949
3. Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 950
11, 303–327 (2019). 951
4. Lu, A. T. et al. DNA methylation GrimAge version 2. Aging 14, 9484–9549 (2022). 952
5. Chen, Q. et al. OMICmAge: An integrative multi-omics approach to quantify biological age 953
with electronic medical records. bioRxiv 2023.10.16.562114 (2023) 954
doi:10.1101/2023.10.16.562114. 955
6. Boroni, M. et al. Highly accurate skin-specific methylome analysis algorithm as a platform 956
to screen and validate therapeutics for healthy aging. Clin. Epigenetics 12, 105 (2020). 957
7. Bienkowska, A. et al. Development of an epigenetic clock to predict visual age progression 958
of human skin. Front. Aging 4, 1258183 (2024). 959
8. Menendez Vazquez, A. et al. Epigenetic age predictors for non -invasive assessment of 960
human skin. Npj Aging 12, 11 (2025). 961
9. Zonari, A. et al. Senotherapeutic peptide treatment reduces biological age and 962
senescence burden in human skin models. NPJ Aging 9, 10 (2023). 963
10. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, 3156 964
(2013). 965
11. Horvath, S. et al. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford 966
Progeria Syndrome and ex vivo studies. Aging 10, 1758–1775 (2018). 967
12. de Lima Camillo, L. P., Lapierre, L. R. & Singh, R. A pan -tissue DNA -methylation 968
epigenetic clock based on deep learning. Npj Aging 8, 4 (2022). 969
13. Ndhlovu, L. C. et al. Retro-age: A unique epigenetic biomarker of aging captured by DNA 970
methylation states of retroelements. Aging Cell 23, e14288 (2024). 971
14. Boroni, M. et al. Highly accurate skin-specific methylome analysis algorithm as a platform 972
to screen and validate therapeutics for healthy aging. Clin. Epigenetics 12, 105 (2020). 973
15. Menendez Vazquez, A. et al. Epigenetic age predictors for non -invasive assessment of 974
human skin. Npj Aging 12, 11 (2025). 975
16. Christensen, K. et al. Perceived age as clinically useful biomarker of ageing: cohort study. 976
BMJ 339, b5262 (2009). 977
17. Gunn, D. A., Larsen, L. A., Lall, J. S., Rexbye, H. & Christensen, K. Mortality is Written on 978
the Face. J. Gerontol. A. Biol. Sci. Med. Sci. 71, 72–77 (2016). 979
18. Sachs, D. L. et al. Atrophic and hypertrophic photoaging: Clinical, histologic, and 980
molecular features of 2 distinct phenotypes of photoaged skin. J. Am. Acad. Dermatol. 81, 480–981
488 (2019). 982
19. Langton, A. K. et al. Distinctive clinical and histological characteristics of atrophic and 983
hypertrophic facial photoageing. J. Eur. Acad. Dermatol. Venereol. 35, 762–768 (2021). 984
20. Garagnani, P. et al. Methylation of ELOVL2 gene as a new epigenetic marker of age. 985
Aging Cell 11, 1132–1134 (2012). 986
21. Bienkowska, A. et al. Development of an epigenetic clock to predict visual age progression 987
of human skin. Front. Aging 4, 1258183 (2024). 988
22. Boroni, M. et al. Highly accurate skin-specific methylome analysis algorithm as a platform 989
to screen and validate therapeutics for healthy aging. Clin. Epigenetics 12, 105 (2020). 990
23. Zonari, A. et al. Senotherapeutic peptide treatment reduces biological age and 991
senescence burden in human skin models. NPJ Aging 9, 10 (2023). 992
24. Banila, C. et al. A noninvasive method for whole-genome skin methylome profiling. Br. J. 993
Dermatol. 189, 750–759 (2023). 994
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
25. Menendez Vazquez, A. et al. Epigenetic age predictors for non -invasive assessment of 995
human skin. Npj Aging 12, 11 (2025). 996
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 16, 2026. ; https://doi.org/10.64898/2026.03.13.711642doi: bioRxiv preprint
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.