Comparing tangible retinal image characteristics with deep learning features reveals their complementarity for gene association and disease prediction

Research Square

David Presby, Michael Beyeler, Olga Trofimova, Dennis Bontempi, and 6 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6130721/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted

Abstract

Advances in AI, including deep learning (DL), are transforming medical image analysis by enabling automated disease risk predictions. However, DL’s outputs and latent space representations often lack interpretability, impeding clinical trust and biological insight.
In this study, we evaluated RETFound, a foundation model for retinal images, by comparing its predictive performance and genetic associations to those obtained using clinically interpretable tangible image features (TIFs). Our findings revealed that RETFound’s individual latent space variables poorly represent most TIFs but typically achieve higher accuracy when combined linearly. Fine-tuning RETFound to predict TIFs provided better, but far from perfect, surrogates, highlighting RETFound’s limited ability to fully characterise the retinal vasculature. We also found that RETFound’s latent space variables have many genetic associations, though these showed minimal overlap with the significant genes identified from measured or predicted TIFs. Notably, predicted TIFs demonstrated greater heritability and excelled in ocular disease prediction compared to their measured TIF counterparts. Comparing the predictive capacity of RETFound with that of TIFs, we found that RETFound’s features carry more predictive value for diabetes and ocular diseases, but the best models for predicting blood pressure and body mass index are those that combine tangible and deep features. Overall, these findings indicate that manually derived image features can complement foundation models, enhancing their interpretability and predictive capability. This study highlights the synergistic potential of integrating deep learning with classical feature extraction, advancing our understanding of retinal biology and disease mechanisms, and paving the way toward improved diagnostic and prognostic tools in ophthalmology.

Biological sciences/Genetics/Genetic association study/Genome-wide association studies
Health sciences/Medical research/Genetics research

Introduction

Medical imaging has become essential for diagnosing diseases and providing presymptomatic risk prognoses.
The use of computer-aided analysis, particularly through deep learning (DL), has made these processes not only more accurate but also scalable to large datasets. Recently, foundation models [1] have emerged as a powerful DL approach in which initial models are pre-trained on extensive image datasets using self-supervised learning. Such models can then be applied across a wide range of use cases, as they provide a robust and efficient latent space that captures image variability in a low-dimensional form. The basic idea is to use the “encoder” of the model, which transforms an image into latent variables (LVs), and then to use these LVs as features for disease identification or prediction with a relatively simple model architecture. In some cases, the training procedure also involves “fine-tuning” the weights of the encoder, effectively co-adapting the latent space to provide the most pertinent features for the end-point prediction. While there are ample examples showing the power of foundation models [1–3], they also have several disadvantages. First, the LVs are typically difficult to interpret. Although techniques exist to elucidate which regions of an image contribute most strongly to variation in a given LV, it is generally not obvious which exact properties of the image in these regions are captured. Second, the number of LVs is usually relatively large, typically between 2⁷ and 2¹⁰, and their potential correlation might further complicate interpretability. Finally, there is no guarantee that LVs that efficiently capture the information needed to reconstruct medical images also include the potentially very subtle signals that are most relevant for certain diseases.

In contrast to DL approaches, clinicians and researchers have been examining images for decades to determine tangible image features (TIFs), such as the size or shape of previously identified anatomical objects, and for many of these there is well-established medical evidence of their relevance for certain diseases. Recently, automated methods have been increasingly used for TIF extraction from medical images, facilitating large-scale analysis and enhancing diagnostic capabilities [4–10]. Among the various imaging modalities, colour fundus images (CFIs) have emerged as a particularly valuable tool due to their noninvasive nature and ability to predict a wide range of diseases. CFIs provide a cost-effective, in vivo assessment of the superficial inner retinal layer, making them instrumental in monitoring ocular diseases such as diabetic retinopathy, macular degeneration, and glaucoma. Furthermore, they enable early risk prediction for systemic conditions, including cardiovascular and cerebrovascular disease, chronic kidney disease, and diabetes [11]. Of note, we recently published a study of 17 different morphological vascular phenotypes extracted from over 130k fundus images of close to 72k UK Biobank subjects. We found that a substantial portion of the variability in these measured TIFs (mTIFs) can be attributed to genetic factors, with SNP-based heritability estimates ranging from 5% to 25%. These mTIFs are features of longstanding interest to ophthalmologists, including median vessel diameter, diameter variability, main temporal angles, vascular density, central retinal equivalents, the number of bifurcations, and tortuosity. Moreover, we observed a large number of genetic association signals with genes and pathways that plausibly modulate the mTIFs, and many significant correlations with both ocular and systemic diseases or their risk factors [10].

However, mTIFs are inherently limited by human-driven feature selection, which dictates what is considered relevant in disease. Additionally, mTIFs may suffer from poor generalizability in automated settings, as mathematical “one-size-fits-all” heuristics are often employed for feature extraction. Here we investigate how mTIFs and DL-based features compare with and complement each other in the context of genetics and disease by leveraging RETFound [2], a recently developed transformer-based [12] foundation model for retinal images. To better understand the potential limitations of the information embedded in RETFound’s latent space representation, we examine how well it can predict retinal vascular mTIFs. This exploration is driven by the fact that, unlike medical endpoints, mTIFs are entirely determined by the image itself. We use genomic analyses to assess the underlying genetic architectures, as these can indicate which physiological aspect of an image an LV represents. To determine whether TIFs outperform or complement foundation models in prediction tasks, we build models that include TIFs and LVs and assess their ability to predict disease. Lastly, as classical retinal features are established using heuristic approaches that may fail to generalize, we use RETFound’s prediction of mTIFs to derive “deep” TIFs (dTIFs), and compare their genetic architectures as well as their disease predictive capacities with those of mTIFs (see Fig. 1 for a graphical overview of our study). In summary, using RETFound and CFIs as an example, our study sheds light on (i) whether foundation models can detect known image features, (ii) whether their latent variables can gain interpretability through genomic analyses, and (iii) whether mTIFs and deep features complement each other with regard to their genetic signals and disease prediction.

Results

Phenotypic data generation in terms of vascular TIFs and LVs

Using the methodology described in our previous work [9], we measured 17 TIFs from CFIs of 41 527 genotyped participants from the UK Biobank. Each CFI was also characterised in terms of 1 024 LVs extracted with RETFound prior to any fine-tuning, using the weights of the network provided by the authors [2] (see Methods for details).

Modelling vascular TIFs with RETFound

By design, RETFound’s LVs capture much of the information contained in CFIs, including but not limited to the retinal vasculature. Thus, in principle, these LVs should also be informative of our vascular TIFs. To evaluate to what extent any individual TIF can be represented by a single LV, we computed all pairwise correlations (shown as a bi-clustered cross-correlation matrix in Fig. 2a). For each TIF, we then picked the LV with the largest squared correlation, which corresponds to the explained variance of a Simple Linear Regression (SLR) model (\(R^2_{\mathrm{SLR}}\); top bars in Fig. 2b). Our analysis indicates that individual LVs can represent a sizable portion of the variability of relatively simple vascular measures, like the vascular densities (\(\max R^2_{\mathrm{SLR}} = 0.36\)). However, this is not the case for vascular TIFs whose measurement is more intricate, where we observe poor correlations between an individual LV and the arterial central retinal equivalent (\(\max R^2_{\mathrm{SLR}} = 0.09\)), tortuosity (\(\max R^2_{\mathrm{SLR}} = 0.01\)), and tortuosity artery/vein (A/V) ratios (\(\max R^2_{\mathrm{SLR}} < 0.01\)).

Next, we investigated to what extent combining multiple LVs to predict TIFs can increase the explained variance. As a first step, we trained Multiple Linear Regression (MLR) models with all LVs as features using five-fold cross-validation. The corresponding out-of-sample estimates of the explained variance \(R^2_{\mathrm{MLR}}\) for each TIF (middle bars in Fig. 2b) are consistently higher than those from the best SLR model. We observed similar patterns as for SLR, with simple TIFs (such as vascular densities) having more than 90% of their variance explained, and much less for more intricate TIFs that require explicit identification of vessels and their type. Notably, LVs failed to predict A/V ratios for tortuosity (\(R^2_{\mathrm{MLR}} = 0.09\)) and central retinal equivalents (\(R^2_{\mathrm{MLR}} = 0.06\)). In a second step, we directly trained RETFound to predict TIFs, resulting in 17 fine-tuned RETFound models providing TIF estimates as their output, which we call “deep TIFs” (dTIFs). For most of these dTIFs, the out-of-sample estimates \(R^2_{\mathrm{deep}}\) of the explained variances (bottom bars in Fig. 2b) are larger than those of the corresponding MLR models. Specifically, the fine-tuned RETFound models outperformed the MLR models for 11 out of 17 TIFs, with a mean improvement in \(R^2\) of 0.05 (95% CI = [0.01, 0.08], t(16) = 2.45, p = 0.03). We observed the most substantial increases for the temporal angles (arterial: \(R^2_{\mathrm{deep}} = 0.34,\ \Delta R^2 = 0.14\); venous: \(R^2_{\mathrm{deep}} = 0.61,\ \Delta R^2 = 0.22\)) and arterial tortuosity (\(R^2_{\mathrm{deep}} = 0.43,\ \Delta R^2 = 0.16\)), while the improvements for the other TIFs are marginal or even absent.

Comparing the genetic association signals of TIFs and LVs

We previously showed that many of our TIFs have a sizable genetic component, with heritability estimates ranging from 5% for median vessel diameter to > 25% for arterial tortuosity [9]. Our genome-wide association study (GWAS) further revealed many plausible genes and pathways for the vast majority of TIFs. These genetic association signals indicate that our TIFs are likely to capture some processes related to vascular maintenance that are partially modulated by genetic variants.
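
The single-LV analysis described under “Modelling vascular TIFs with RETFound” amounts to picking, for each TIF, the LV with the largest squared Pearson correlation, which equals the explained variance of a simple linear regression. The following is a minimal sketch using only the Python standard library; the function names and the toy data are hypothetical and are not taken from the study’s code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable explains no variance
    return sxy / sqrt(sxx * syy)


def best_single_lv(tif, lvs):
    """Return (index, R^2) of the LV with the largest squared correlation.

    `tif` holds one TIF value per image; `lvs` is a list of LV columns,
    each holding one value per image. For simple linear regression the
    explained variance equals the squared Pearson correlation.
    """
    r2 = [pearson_r(lv, tif) ** 2 for lv in lvs]
    best = max(range(len(r2)), key=r2.__getitem__)
    return best, r2[best]


# Toy data (hypothetical): five images, two latent variables.
tif = [1.0, 2.0, 3.0, 4.0, 5.0]
lvs = [
    [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with the TIF
    [1.0, 0.0, 1.0, 0.0, 1.0],   # uncorrelated with the TIF
]
idx, r2 = best_single_lv(tif, lvs)
```

In this toy example the first LV is selected. With real data, the same selection would be repeated for each of the 17 TIFs across all 1 024 LVs.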

As we do not understand what information is captured by LVs (other than that they can only partially represent TIFs), we sought to elucidate their genetic architectures, as these genetic signals can potentially shed light on the image-based features represented by the LVs. To this end, we performed GWAS, heritability estimation, and gene analyses for all 1 024 LVs of the RETFound model (before any fine-tuning) and our 17 dTIFs. At the SNP level (Fig. 3a–c), the number of significantly associated variants was higher for mTIFs (5 949 SNPs at a Bonferroni-adjusted threshold of 5 × 10⁻⁸/17) than for LVs (4 731 SNPs at 5 × 10⁻⁸/1 024). Since LVs are not independent, we also performed a GWAS on their 17 leading principal components (PCs), which explain 66% of their variance (Suppl. Figure 1); this resulted in n = 2 130 SNPs with p < 5 × 10⁻⁸/17 (Suppl. Figure 2). Notably, a subset of LVs showed the strongest associations, with an effect size of β = 0.45 and a nominal significance of p < 10⁻³⁰⁰. A key driver of this signal was rs12913832, which has previously been shown to modulate up to 50% of blue eye colour, with the blue-eye allele decreasing expression levels of OCA2 [13]. In contrast, 50.7% of loci were shared between mTIFs and dTIFs, with the numbers of exclusive hits being 1 708 for mTIFs, 1 967 for dTIFs, and 3 928 for LVs. Gene-level aggregation using the PascalX tool [14, 15] confirmed that the genetic architecture of dTIFs more closely resembles that of mTIFs than that of LVs (Fig. 3d), and that dTIFs generally exhibit stronger genetic signals than LVs (Fig. 3e). Consistently, the overlap of significantly associated genes was 69.0% between mTIFs and dTIFs, but only 19.2% between mTIFs and LVs (Fig. 3f). The most significant gene-level association was observed for the HERC2/OCA2 locus (p = 7.7 × 10⁻²⁹⁷ in LV 810), which also influences some TIFs, albeit less strongly (p = 1.7 × 10⁻²¹ in venous vascular density).
Finally, heritability estimates derived from Linkage Disequilibrium (LD) Score Regression (LDSR) indicated that most LVs have a modest genetic contribution (h² 0.21 (Fig. 3g). A similar pattern emerged at the gene level, with LVs showing a less polygenic architecture compared with mTIFs and dTIFs (Fig. 3h). Moreover, while genes linked to TIFs were generally specific to only one or two of the 17 phenotypes, those associated with LVs tended to influence multiple traits (Fig. 3i).

TIF-level analyses (Fig. 4), comparing dTIFs and mTIFs in terms of heritabilities and genetic associations, revealed that heritability estimates for dTIFs were generally comparable to those of the corresponding mTIFs and, in some cases, slightly higher (Fig. 4a). A notable exception was the arterial temporal angle, where the “deep” version showed a substantially higher heritability (h² = 0.22, \(\Delta h^2 = 0.13\)). We made a consistent observation at the gene level (Fig. 4b), where dTIFs and mTIFs showed similar numbers of associated genes and large overlaps. With the exception of the ratio of arterial and venous vascular density and of venous tortuosity, dTIFs had more exclusively associated genes than their corresponding mTIFs. Interestingly, “deep” arterial tortuosity gained 25 gene discoveries (+28%), and did so with slightly decreased heritability (\(\Delta h^2 = -0.02\)), whereas its venous counterpart lost 11 discoveries (-32%).

Associating disease risk with measured and deep TIFs

Next, we investigated to what extent risk factors, ocular and general diseases, and clinical events can be explained by mTIFs or dTIFs. We used linear models that incorporated covariates alongside a single TIF as explanatory variables. We assessed the strength of associations using standardised effect sizes and statistical significance, and quantified the total number of unique significant associations across diseases and TIFs to summarise the findings (Fig. 5).
Additionally, we applied Fisher’s exact test to assess whether the ratio of significant associations differed between mTIFs and dTIFs. Although not statistically significant, the largest difference in the number of significantly associated diseases was found for the venous temporal angle, with five diseases being significantly associated uniquely with dTIFs (OR = 2.74, p = 0.21). While also not significant, diabetes-related eye disease (“diabetes-eye”) displayed the largest difference between TIF types, with five more dTIFs than mTIFs (OR = 3.43, p = 0.17). When grouping the diseases into categories, we observed that ocular diseases had significantly more associations only with dTIFs than only with mTIFs (OR = 1.75, p = 0.02). No such pattern was observed across risk factors (OR = 1.06, p = 0.84) or general diseases and events (OR = 0.92, p = 0.81).

Finally, we aimed to determine whether models that combine multiple image feature sets increase predictive power over models restricted to just covariates or one feature set at a time. We used regularised linear models with eight sets of explanatory features: (i) a baseline model including only covariates (Covar), which were also included in each subsequent model; (ii) all 17 mTIFs; (iii) all 17 dTIFs; (iv) the 1 024 RETFound LVs; (v) all mTIFs and dTIFs; (vi) all mTIFs and the 1 024 RETFound LVs; (vii) all dTIFs and the 1 024 RETFound LVs; and (viii) all features. We found that: 1) models that combined TIFs with LVs performed best when predicting blood pressure and BMI (Fig. 6a and c); 2) when the outcome was an ocular disease, either there was no additional gain in predictive power over the baseline model, or the LVs tended to outperform the TIFs (Fig. 6b and Suppl. Figure 3); and 3) both the TIFs and the LVs added predictive power over the baseline model for most risk factors and diseases (Fig. 6a–c and Suppl. Figure 3).
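
To make the comparison of mTIF and dTIF association counts concrete, a two-sided Fisher’s exact test on a 2×2 table can be sketched as follows. This is a standard-library illustration, not the study’s code: the function name is hypothetical, the counts in the example are made up, and the odds ratio returned is the simple sample odds ratio, which may differ from the estimator used for the values reported above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(
        p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Hypothetical counts: rows = (dTIF, mTIF),
# columns = (significant, not significant).
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

The small relative tolerance in the comparison guards against floating-point ties between equally likely tables.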
Associated statistics for comparing feature predictive capacities can be found in Suppl. Table 1.

Discussion

Breakthroughs in computer-aided analysis are revolutionizing the use of medical images for diagnosis and risk assessment. These advances range from supporting physicians in segmenting anatomical structures and extracting precise measurements to making fully automated disease risk predictions using end-to-end machine learning pipelines. DL algorithms typically hold internal “latent space” representations of the images for this task, but their nonlinear entanglement makes them appear as “black boxes” lacking straightforward explanations. This is problematic in the biomedical domain, where trust by the clinician is essential [16–19]. Furthermore, interpretability is key for deciphering the underlying biological or physiological mechanisms, which is pivotal for advancing treatment [20]. Visual interpretability methods, such as attention maps [21–23] and counterfactual explanations [24], have been developed to address this, but often fall short of clarifying exactly how latent representations drive predictions (see [19] for recent advances in tackling this issue).

To shed some light on what latent variables (LVs) might represent, here we investigated a recently proposed foundation model for retinal images and applied it to a large collection of colour fundus images (CFIs) which we had previously characterised in terms of “tangible image features” (TIFs) of high clinical interpretability [10]. Specifically, we asked to what extent LVs can predict TIFs, individually or jointly. Moreover, the fact that these images stem from genotyped subjects with extensive clinical phenotyping allowed us to directly compare the associations of LVs with genotypes and disease states or risk factors with those of TIFs.

Our analysis revealed that, prior to fine-tuning, individual LVs of RETFound do not strongly correlate with any of our TIFs.
Vascular densities and the number of bifurcations seem to have the closest LV representation, with a single LV explaining close to 40% of their variance, while it is less than 20% for the TIFs whose measurement is more intricate. Linear combinations of multiple LVs have substantially higher predictive power for TIFs, in particular for the vascular densities, explaining close to 90% of their variance. Strikingly, fine-tuning RETFound to predict TIFs yielded marginal or no improvements compared to the LVs (cf. [25–27] for similar negative results on fine-tuning in LLMs), with the notable exception of the temporal angles between the main vascular branches and, to a lesser degree, vessel tortuosity. Thus, even the advanced transformer DL architecture of RETFound, with foundation model parameters trained on more than 900 000 CFIs, fails to provide a close approximation of well-established classical ophthalmological parameters of the retinal vasculature. This is remarkable because, in contrast to clinical endpoints, TIFs are fully determined by the images, i.e., their extraction is implemented in terms of deterministic algorithmic procedures based on conventional image processing.

Our analysis of genetic associations elucidates to what extent DL features are modulated genetically and how their association signals overlap with those of the TIFs. We argue that genetic associations support physiological relevance because they point to a causal chain from a DNA difference (i.e., a genetic variant) via a molecular variation (such as differential gene expression) to a (typically minute) phenotypic variability. Heritability can be seen as a global summary statistic of such genetic effects, and our results indicate that dTIFs are at least as heritable as their measured counterparts, supporting similar if not higher physiological pertinence.
Gene-scoring analyses revealed that among RETFound’s LVs the strongest association is observed with the HERC2/OCA2 locus, a well-known, strong modulator of pigmentation that has previously been found as a top hit in a GWAS of LVs from self-supervised deep phenotyping of CFIs [28]. This indicates that some of the baseline LVs predominantly capture information related to the pigmentation of the retina. In contrast, dTIFs obtained by fine-tuning RETFound demonstrate a genetic architecture that more closely mirrors that of mTIFs. Notably, fine-tuning yielded 87 additional genetic discoveries exclusive to dTIFs, nearly doubling the number of unique associations observed for mTIFs. Furthermore, traits such as arterial temporal angles exhibited significantly higher heritability and a greater number of gene discoveries compared to their measured counterparts, suggesting that fine-tuning enhances the capacity of some TIFs to capture subtle yet physiologically pertinent vascular signals. We thus hypothesize that directing RETFound’s focus toward TIFs augments, to some extent, the ability of these features to reflect the underlying vascular biology.

A complementary means of evaluating the pertinence of vascular features is in terms of their capacity to facilitate disease prediction. In this study, we contrasted mTIFs and dTIFs by comparing their standardised effects in logistic and linear models with diseases or risk factors as response variables. We found that the number of significantly associated diseases and risk factors was largely similar between mTIFs and dTIFs, but dTIFs showed more significant associations with ocular diseases than mTIFs. This outcome may be attributed to RETFound’s initial training on a diverse set of retinal images [2, 29]. We hypothesise that its fine-tuned weights, while focused on predicting vascular morphology, may retain some of RETFound’s broader representation of retinal features.
This may allow the model to capture subtle patterns or relationships between vascular morphology and other retinal structures that are relevant to ocular pathologies but not explicitly measured in traditional vascular assessments. The enhanced performance in ocular disease prediction using our fine-tuned models suggests that AI-derived vascular morphology metrics may provide additional informative features for ocular condition detection compared to conventional morphological measurements alone.

Using models that integrate multiple feature types (i.e., baseline, mTIFs, dTIFs, and PCs of LVs), we provide evidence that their integration enhances the predictive power for disease. Notably, although the LVs are representative of the entire fundus image, including the vasculature, models that included both LVs from RETFound and TIFs (both measured and deep) performed best when predicting blood pressure or BMI, suggesting that TIFs provide complementary information not fully captured by RETFound’s LVs. One potential reason why the RETFound model may not capture the information provided by TIFs is its training procedure: it was pre-trained using masked autoencoding [30], which may not assign high importance to vascular structures when reconstructing the image. As a result, some critical vasculature information may be excluded from its base weights. Indeed, the original authors of RETFound qualitatively report that the model fails to reconstruct much of the vasculature present in CFIs. Our results further support this, as we find that fine-tuning RETFound to predict vascular TIFs only captures a relatively small portion of their variance, suggesting that the model is unable to relearn detailed vascular and vessel-type-specific information.
Similar findings regarding hybrid models that combine manually derived and DL-derived features have been reported in other domains, where manually engineered features capture domain-specific knowledge that DL models may overlook when sample sizes are small [31] or when training and test set distributions differ [32, 33]. These results emphasize the importance of leveraging both foundation-model-derived and expert-derived features to optimize predictive performance.

In summary, our findings provide key insights into the interplay between DL-derived features and classical retinal image features in the context of genetics and disease prediction. While RETFound can predict certain TIFs, such as vessel densities, with reasonable accuracy, it provides poor estimates of more intricate features, like measures related to artery-to-vein ratios. Despite these limitations, dTIFs exhibit heritability comparable to or greater than that of their measured counterparts, with a notably large increase in heritability for the temporal angles of the arteries, suggesting that dTIFs may enhance physiologically relevant signal extraction. Furthermore, the genetic and disease associations of mTIFs and dTIFs show both overlap and complementarity, reinforcing the idea that DL-derived features do not simply replicate classical measurements but may provide additional, distinct insights. Most notably, integrating LVs with TIFs leads to superior predictive performance compared to models using either feature set alone, highlighting the potential of hybrid approaches in disease modeling. Overall, this work highlights the synergy between DL and classical feature extraction for advancing our understanding of retinal biology and disease mechanisms, potentially leading to improved diagnostic and prognostic tools in ophthalmology.

Methods

Data and quality control

The UK Biobank is a population-based cohort of ~488k subjects with rich, longitudinal phenotypic data, including medical history, and a median 10-year follow-up [34].
Standard retinal 45° CFIs were captured using a Topcon 3D-OCT 1000 Mark II, resulting in a total of 173 814 CFIs from 84 813 individuals. For this study we used 41 527 of these images from the right eye, for which we were able to measure all of our 17 TIFs using our previously described analysis pipeline [10]. 96% of these images passed the quality control threshold we had used in our previous work [10] (see Suppl. Figure 4). This effectively removed CFIs for which our algorithm failed to identify the optic disk or enough major blood vessels to measure the temporal angles. Since this happens more often for CFIs from older and diseased subjects, the cohort of the present study appears to be more “healthy” with respect to its baseline risk factors (see Suppl. Table 2 for a comparison to our previous study). Since some of the LVs produced by RETFound encoded information specific to eye laterality (left vs. right), we opted to analyze a single eye rather than averaging across both eyes. Genotyping was performed on Axiom arrays for a total of 805 426 markers, from which ~96 million genotypes were imputed. We used the subset of 15 599 830 SNPs that had been assigned an rsID.

Measured phenotypes

The 17 mTIFs comprise tortuosity, temporal angles, vessel diameters and diameter variability, vascular density, and the number of bifurcations. With the exception of the number of bifurcations, all of these measurements were performed independently for arteries and veins. To explore asymmetries between arteries and veins, the arterio-venous ratios of diameters, tortuosities, and vascular densities were also included. Complete algorithmic descriptions of how these measurements were extracted are given in our recent publication [10].

Deep learning model

We utilised RETFound [2], a novel foundation model that leverages Google’s vision transformer large patch 16 architecture [35].
Briefly, RETFound uses a masked autoencoder (MAE) approach [30] to optimise the model towards reconstructing retinal images. For building RETFound, 75% of the image patches were masked and the model was trained to reconstruct the missing patches, thereby learning effective representations of the images provided as training data. We used the encoder portion of this model to extract LVs and make predictions.

Extracting Latent Variables

To generate LVs, we first resized each image to 224×224 pixels and normalised it using the means and standard deviations of ImageNet’s RGB channels. After this preprocessing, we propagated each image through the encoder portion of the pre-trained RETFound model and averaged the 1 024-dimensional patch embeddings, yielding 1 024 LVs for each image. We further derived the leading PCs of these LVs for both genetic analyses and disease associations.

Predicting tangible image features

For predicting TIFs, we used the encoder portion of RETFound and added a multi-layer perceptron to the head of the model. We utilised the base hyperparameters established in the original RETFound manuscript [2] and trained the model over 50 epochs, with a “warm-up” in the first 10 epochs (with the learning rate r monotonically increasing from 0 to 5 × 10⁻⁴), followed by a cosine annealing schedule (where r decreases from 5 × 10⁻⁴ to 1 × 10⁻⁶). We set the batch size to 16, the dropout rate to 0.2, and the image size to 224×224 pixels. After each epoch, we evaluated the model on a validation dataset and kept the model that had the lowest mean absolute error across all epochs. Individual models were built for each of the 17 TIFs using five-fold cross-validation.

Covariate corrections

Genetic analysis and disease prediction models were corrected for a set of covariates known to confound phenotypic variability in general, or in the eye specifically.
They comprise sex, age, age-squared, sex-by-age, sex-by-age-squared, spherical power, spherical power-squared, cylindrical power, cylindrical power-squared, imaging instance, assessment centre, genotype measurement batch, and the first 20 genomic PCs (see Suppl. Figure 5).

Genomic analyses
For GWAS, raw phenotypes were transformed using rank-based inverse normal transformation. GWAS were conducted using BGENIE [36]. SNP-based heritabilities and genetic correlations between traits were calculated using LDSR [37], which fits an ordinary least squares regression of SNP LD scores against their average chi-squared statistics. Gene and pathway scores were computed using PascalX [14, 15, 38]. Both protein-coding genes and lincRNAs were scored using the approximate “saddle” method, incorporating all SNPs within a 50 kb window surrounding each gene. All pathways from MSigDB v7.2 [39–41] were scored using PascalX’s ranking mode, merging and rescoring co-occurring genes separated by less than 100 kb. PascalX utilises LD structure for accurate computation of gene scores, which in these analyses was derived from the UK10K (hg19) reference panel [42]. For annotating genes in the Manhattan plots, the top ten genes for each feature set (i.e., mTIFs, dTIFs, or LVs) were selected within a 1 000 SNP window.

Disease associations
Three distinct groups of disease traits were included in this work: first, a set of general systemic risk factors; second, a set of ocular diseases; and third, a set of general diseases and events (see Suppl. Figure 6). All disease data were collected from the UK Biobank (see Ref. [10] for how we mapped the data field identifiers to each disease included in this study). To extract beta coefficients for the continuous risk factors, linear regression was used to estimate standardised effects. For both ocular and general diseases, time-to-event data were binarised, and logistic regression was applied.
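For illustration, the rank-based inverse normal transformation applied to the raw phenotypes before GWAS (see Genomic analyses) can be sketched as follows. This is a minimal sketch and not the code used in this study: the function name `rank_inverse_normal`, the Blom offset of 3/8, and the use of average ranks for ties are assumptions, since the text does not specify an offset or a tie-handling rule.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=3.0 / 8.0):
    """Map values to standard-normal quantiles of their ranks.

    Hypothetical helper: the Blom offset (3/8) and average ranks for
    ties are assumed choices, not taken from the manuscript.
    """
    n = len(values)
    # Indices of the observations sorted by value (stable sort).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    denom = n - 2.0 * offset + 1.0
    # Convert each rank to a probability in (0, 1), then to a normal quantile.
    return [std_normal.inv_cdf((r - offset) / denom) for r in ranks]


if __name__ == "__main__":
    print(rank_inverse_normal([0.8, 2.5, 1.1, 1.1, 9.7]))
```

The transformed phenotype keeps the rank order of the raw measurements while removing skew and outliers, which is the usual motivation for applying it before linear association testing.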
When examining the predictive capacity of different feature sets, linear regression was used for continuous risk factors and logistic regression was used for disease prediction. To assist with model generalisation and reduce overfitting, five-fold cross-validation was employed. Both linear and logistic regression models were regularised using elastic net penalties, with optimal values for alpha and the L1 ratio determined by grid search within each fold. We used the coefficient of determination (R²) as the performance metric for continuous risk factors and the area under the receiver operating characteristic curve (AUC) as the performance metric for diseases. As AUC can suffer from imbalanced datasets when used for prediction tasks, and most of the diseases investigated had many more controls than cases, we downsampled the controls to the same sample size as the cases. Downsampling was performed by matching each case with the control that had the most similar covariate profile using optimal bipartite matching (Hungarian algorithm), ensuring that no control was used more than once. To prevent poor feature-to-sample-size ratios, diseases and events were required to have at least 1 000 samples (cases and controls combined) to be analysed. For all regression analyses, traits were corrected for the covariates listed above. To determine the significance of regression coefficients and prediction power, p-values were computed, and the Benjamini-Hochberg procedure was applied using a false discovery rate of 0.05.

Declarations

Data availability
GWAS summary statistics, gene and pathway scores, and all data underlying figures will be made available on Zenodo upon publication. GWAS summary statistics will further be made available on the GWAS Catalog. Raw UK Biobank data are protected and not open access; however, they can be obtained upon project creation and acceptance via their platform (https://www.ukbiobank.ac.uk/).
mTIFs [10] are already available via this channel; dTIFs and baseline LVs will be added to the platform upon publication.

Code availability
Code is going to be made available on Zenodo upon publication.

References
1. Azad B, Azad R, Eskandari S, Bozorgpour A, Kazerouni A, Rekik I, et al. Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision. 2023. Available: http://arxiv.org/abs/2310.18689
2. Zhou Y, Chia MA, Wagner SK, Ayhan MS, Williamson DJ, Struyven RR, et al. A foundation model for generalizable disease detection from retinal images. Nature. 2023;622: 156–163.
3. Parres D, Paredes R. Fine-Tuning Vision Encoder–Decoder Transformers for Handwriting Text Recognition on Historical Documents. Document Analysis and Recognition - ICDAR 2023. 2023; 253–268.
4. Fhima J, Van Eijgen J, Billen Moulin-Romsée M-I, Brackenier H, Kulenovic H, Debeuf V, et al. LUNet: deep learning for the segmentation of arterioles and venules in high resolution fundus images. Physiol Meas. 2024;45. doi:10.1088/1361-6579/ad3d28
5. Quiros JV, Liefers B, van Garderen K, Vermeulen J, Center ER, Consortium S, et al. VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images. 2024. Available: http://arxiv.org/abs/2409.16016
6. Tomasoni M, Beyeler MJ, Vela SO, Mounier N, Porcu E, Corre T, et al. Genome-wide Association Studies of Retinal Vessel Tortuosity Identify Numerous Novel Loci Revealing Genes and Pathways Associated With Ocular and Cardiometabolic Diseases. Ophthalmology Science. 2023;3: 100288.
7. Zhou Y, Wagner SK, Chia MA, Zhao A, Woodward-Court P, Xu M, et al. AutoMorph: Automated Retinal Vascular Morphology Quantification Via a Deep Learning Pipeline. Trans Vis Sci Tech. 2022;11: 12–12.
8. Van Eijgen J, Fhima J, Billen Moulin-Romsée M-I, Behar JA, Christinaki E, Stalmans I. Leuven-Haifa High-Resolution Fundus Image Dataset for Retinal Blood Vessel Segmentation and Glaucoma Diagnosis. Sci Data. 2024;11: 257.
9. Ortin Vela S, Beyeler MJ, Trofimova O, Tomasoni M. Phenotypic and Genetic Characteristics of Retinal Vascular Parameters and their Association with Diseases. medRxiv. 2023. Available: https://www.medrxiv.org/content/10.1101/2023.07.07.23292368.abstract
10. Ortín Vela S, Beyeler MJ, Trofimova O, Iuliani I, Vargas Quiros JD, de Vries VA, et al. Phenotypic and genetic characteristics of retinal vascular parameters and their association with diseases. Nat Commun. 2024;15: 9593.
11. Ho H, Cheung CY, Sabanayagam C, Yip W, Ikram MK, Ong PG, et al. Retinopathy Signs Improved Prediction and Reclassification of Cardiovascular Disease Risk in Diabetes: A prospective cohort study. Sci Rep. 2017;7: 41492.
12. Advances in medical image analysis with vision Transformers: A comprehensive review. Medical Image Analysis. 2024;91: 103000.
13. Eiberg H, Troelsen J, Nielsen M, Mikkelsen A, Mengel-From J, Kjaer KW, et al. Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Human Genetics. 2008;123: 177–187.
14. Krefl D, Bergmann S. Cross-GWAS coherence test at the gene and pathway level. PLoS Comput Biol. 2022;18: e1010517.
15. Krefl D, Brandulas Cammarata A, Bergmann S. PascalX: a Python library for GWAS gene and pathway enrichment tests. Bioinformatics. 2023;39. doi:10.1093/bioinformatics/btad296
16. Andrews B, Chang J-B, Collinson L, Li D, Lundberg E, Mahamid J, et al. Imaging cell biology. Nature Cell Biology. 2022;24: 1180–1185.
17. Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28: 31–38.
18. DeGrave AJ, Cai ZR, Janizek J, Daneshjou R, Lee S-I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nature Biomedical Engineering. 2023; 1–13.
19. Rotem O, Schwartz T, Maor R, Tauber Y, Shapiro MT, Meseguer M, et al. Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization. Nature Communications. 2024;15: 1–19.
20. Rotem O, Zaritsky A. Visual interpretability of bioimaging deep learning models. Nat Methods. 2024;21: 1394–1397.
21. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. doi:10.1109/cvpr.2016.319
22. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. 2016 [cited 17 Dec 2024]. doi:10.1007/s11263-019-01228-7
23. Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. Precup D, Teh YW, editors. arXiv [cs.CV]. 06–11 Aug 2017. pp. 3145–3153. Available: https://proceedings.mlr.press/v70/shrikumar17a.html
24. Lang O, Gandelsman Y, Yarom M, Wald Y, Elidan G, Hassidim A, et al. Explaining in Style: Training a GAN to explain a classifier in StyleSpace. arXiv [cs.CV]. 2021. Available: http://arxiv.org/abs/2104.13369
25. Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models. [cited 18 Dec 2024]. Available: https://arxiv.org/html/2406.11201v1
26. Li X, Chan S, Zhu X, Pei Y, Ma Z, Liu X, et al. Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks. 2023. Available: http://arxiv.org/abs/2305.05862
27. Uppaal R, Hu J, Li Y. Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection. 2023. Available: http://arxiv.org/abs/2305.13282
28. Xie Z, Zhang T, Kim S, Lu J, Zhang W, Lin C-H, et al. iGWAS: Image-based genome-wide association of self-supervised deep phenotyping of retina fundus images. PLOS Genetics. 2024;20: e1011273.
29. Fu Y, Ma L, Wan S, Ge S, Yang Z. A novel clinical artificial intelligence model for disease detection via retinal imaging. Innovation (Camb). 2024;5: 100575.
30. He K, Chen X, Xie S, Li Y, Dollar P, Girshick R. Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2022. doi:10.1109/cvpr52688.2022.01553
31. Mavaie P, Holder L, Skinner MK. Hybrid deep learning approach to improve classification of low-volume high-dimensional data. BMC Bioinformatics. 2023;24: 1–20.
32. Lin W, Hasenstab K, Moura Cunha G, Schwartzman A. Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment. Scientific Reports. 2020;10: 1–11.
33. Bento N, Rebelo J, Barandas M, Carreiro AV, Campagner A, Cabitza F, et al. Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition. Sensors (Basel, Switzerland). 2022;22: 7324.
34. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562: 203–209.
35. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2020. Available: http://arxiv.org/abs/2010.11929
36. Band G, Marchini J. BGEN: a binary file format for imputed genotype and haplotype data. bioRxiv. 2018. p. 308296. doi:10.1101/308296
37. Bulik-Sullivan B, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47: 291–295.
38. Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. 2016. doi:10.1371/journal.pcbi.1004714
39. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102: 15545–15550.
40. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1: 417–425.
41. Castanza AS, Recla JM, Eby D, Thorvaldsdóttir H, Bult CJ, Mesirov JP. Extending support for mouse data in the Molecular Signatures Database (MSigDB). Nature Methods. 2023;20: 1619–1620.
42. UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526: 82–90.

Additional Declarations
There is NO Competing Interest.

Supplementary Files
2manuscriptsupplement.docx Supplemental material
© Research Square 2026 | ISSN 2693-5015 (online)

Authors
David Presby (corresponding author), Swiss Institute of Bioinformatics, and University of Lausanne
Michael Beyeler, Dept. of Computational Biology, University of Lausanne (ORCID: https://orcid.org/0000-0001-6199-4879)
Olga Trofimova, Dept. of Computational Biology, University of Lausanne
Dennis Bontempi, Dept. of Computational Biology, University of Lausanne
Leah Bottger, University of Lausanne
Sacha Bors, University of Lausanne
Ilaria Iuliani, Laboratoire de biologie et pharmacologie appliquée, UMR 8113, ENS Paris-Saclay, CNRS
Ian Quintas, University of Lausanne
Sofia Ortin Vela, University of Lausanne (ORCID: https://orcid.org/0000-0002-2268-9772)
Sven Bergmann, University of Lausanne (ORCID: https://orcid.org/0000-0002-6785-9034)

Figure 1. Graphical overview of the study. (a) We selected a subset of genotyped subjects from the UK Biobank who had colour fundus images (CFIs) taken, as well as available health outcome information and risk factors. (b) Top panel: Using a previously developed analysis pipeline to characterise the retinal vasculature [10], we extracted tangible image features (TIFs), such as vessel tortuosity or the number of bifurcations, from each CFI. We also used the retinal foundation model RETFound [2] as a DL approach to characterise CFIs both (middle) in terms of “deep TIFs” by fine-tuning RETFound to predict the 17 measured TIFs and (bottom) by extracting a set of 1 024 latent variables (LVs) using RETFound’s pre-trained weights. (c) For downstream comparative analyses, we employed direct correlations, genetic, and disease associations.

Figure 2. Correlations analysis. (a) Bi-clustered cross-correlation matrix between LVs (x-axis) and TIFs (y-axis). (b) Variance of TIFs explained by the best simple linear regression (SLR) models using the single best baseline LV (purple), multiple linear regression (MLR) models using all baseline LVs (green) and fine-tuning RETFound models (orange). Out-of-sample R² and their standard errors across the five folds are displayed. A: arterial, V: venous, eq: equivalent, std: standard deviation.

Figure 3. Genetics associations of mTIFs, dTIFs, and LVs. (a) Manhattan plots, (b) QQ-plots and (c) Venn-diagrams for SNP-wise associations. (d) Manhattan plots (with the top 10 most significant genes per feature set plotted), (e) QQ-plots and (f) Venn diagrams for gene-wise associations. (g) Density (i.e. normalised count) of mTIFs, LVs and dTIFs within heritability (h²) range. (h) Density for number of significantly associated genes with mTIFs, LVs and dTIFs. (i) Percent of significant genes (see Fig. 3f) associated with a certain number of traits.

Figure 4. Heritabilities and gene associations per TIF. (a) SNP heritability estimates obtained via LDSR and the difference between deep and measured equivalents. (b) Number of associated genes of mTIFs versus dTIFs, including the count of genes exclusive to each (blue: exclusive to mTIFs, red: exclusive to dTIFs). Abbreviations: A = Arterial; V = Venous; std = Standard Deviation; mTIFs = measured Tangible Image Features; dTIFs = Deep Tangible Image Features.

Figure 5. Disease associations with mTIFs and dTIFs. Each cell presents the standardised effects (“beta values”) of a linear model that explains a risk factor, ocular or general disease incident with a single mTIF (lower left portion of a cell) or dTIF (upper right portion of a cell), while accounting for several covariates. Associations significant only for the dTIFs or mTIFs are indicated with ‘D’ or ‘M’, respectively, or a star (*) if significant for both. The bar graphs on the top and right indicate the total number of unique counts for each type of these associations across all diseases and TIFs, respectively. Abbreviations: A = Arterial; V = Venous; DBP = Diastolic Blood Pressure; SBP = Systolic Blood Pressure; Age BP = Age of High Blood Pressure Diagnosis; PR = Pulse Rate; PWASI = Pulse Wave Arterial Stiffness Index; HDL = High-Density Lipoprotein; LDL = Low-Density Lipoprotein; HbA1c = Glycated haemoglobin; N cig/day = self-reported number of cigarettes per day; N pack/yr = self-reported number of packs of cigarettes per year; BMI = Body Mass Index; DVT = Deep Vein Thrombosis.

Figure 6. Predictive power of CFI features in assessing risk factors, diseases and events. (a) Explained variance achieved by linear models built to predict risk factors. (b) Area under the receiver operating characteristic curve (AUC) achieved by logistic regression models built to predict ocular diseases. (c) AUC achieved by logistic regression models built to predict diseases or events. Feature sets include: (i) a baseline model including only covariates (see methods for list), (ii) all 17 mTIFs, (iii) all 17 dTIFs, (iv) 1024 RETFound LVs, (v) all mTIF and 1024 RETFound LVs, (vi) all dTIFs and 1024 RETFound LVs, (vii) all mTIFs and dTIFs, (viii) all features combined. Abbreviations: Covar = Covariates; dTIF = Deep Tangible Image Feature; mTIF = Measured Tangible Image Feature; LVs = Latent Variables; DBP = Diastolic Blood Pressure; SBP = Systolic Blood Pressure; BMI = Body Mass Index; AUC = Area Under the Curve. Error bars reflect 95% confidence intervals.
presymptomatic risk prognoses. The use of computer-aided analysis, particularly through deep learning (DL), has made these processes not only more accurate but also scalable to large datasets. Recently, foundation models [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] have emerged as a powerful DL approach generating initial models pre-trained on extensive image datasets using self-supervised learning. Such models can then be applied across a wide range of use cases as they provide a robust and efficient latent space that captures image variability in a low-dimensional form. The basic idea is to use the \u0026ldquo;encoder\u0026rdquo; of the model that transforms an image into latent variables (LVs), which are then used as features for disease identification or prediction using a relatively simple model architecture. In some cases, the training procedure also involves \u0026ldquo;fine-tuning\u0026rdquo; the weights of the encoder, effectively co-adapting the latent space to provide the most pertinent features for the end-point prediction.\u003c/p\u003e \u003cp\u003eWhile there are ample examples showing the power of using foundation models [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], they also have several disadvantages: First, the LVs are typically difficult to interpret. Although there exist techniques to elucidate which regions of an image most strongly contribute to variation in a given LV, it is generally not obvious what exact properties of the image in these regions are captured. Second, the number of LVs is usually relatively large, typically between 2\u003csup\u003e7\u003c/sup\u003e and 2\u003csup\u003e10\u003c/sup\u003e, and their potential correlation might further complicate interpretability. 
Finally, there is no guarantee that LVs that efficiently capture the information needed to reconstruct medical images also include potentially very subtle signals that are most relevant for certain diseases.\u003c/p\u003e \u003cp\u003eIn contrast to DL approaches, clinicians and researchers have been examining images for decades to determine tangible image features (TIFs), such as the size or shape of previously identified anatomical objects, and for many of those there is well-established medical evidence showing their relevance for certain diseases. Recently, automated methods have been increasingly utilized for TIF extraction from medical images, facilitating large-scale analysis and enhancing diagnostic capabilities [\u003cspan additionalcitationids=\"CR5 CR6 CR7 CR8 CR9\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Among various imaging modalities, colour fundus images (CFIs) have emerged as a particularly valuable tool due to their noninvasive nature and ability to predict a wide range of diseases. CFIs provide a cost-effective, in vivo assessment of the superficial inner retinal layer, making them instrumental in monitoring ocular diseases such as diabetic retinopathy, macular degeneration, and glaucoma. Furthermore, they enable early risk prediction for systemic conditions, including cardiovascular and cerebrovascular disease, chronic kidney disease, and diabetes [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Of note, we recently published a study of 17 different morphological vascular phenotypes extracted from over 130k fundus images of close to 72k UK Biobank subjects. 
We found that a substantial portion of the variability in these measured TIFs (mTIFs)\u0026mdash;features of longstanding interest to ophthalmologists, including median vessel diameter, diameter variability, main temporal angles, vascular density, central retinal equivalents, the number of bifurcations, and tortuosity\u0026mdash;can be attributed to genetic factors, with SNP-based heritability estimates ranging from 5\u0026ndash;25%. Moreover, we observed a large number of genetic association signals with genes and pathways that appear plausible to be involved in modulating the mTIFs, and many significant correlations with both ocular and systemic diseases or their risk factors [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. However, mTIFs are inherently limited by human-driven feature selection, which dictates what is considered relevant in disease. Additionally, mTIFs may also suffer from poor generalizability in automated settings, as mathematical \u0026ldquo;one-size-fits-all\u0026rdquo; heuristics are often employed for feature extraction.\u003c/p\u003e \u003cp\u003eHere we investigate how mTIFs and DL-based features compare against and complement each other in the context of genetics and disease by leveraging \u003cem\u003eRETFound\u003c/em\u003e [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], a recently developed transformer-based [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] foundation model for retinal images. To better understand the potential limitations of the information embedded in RETFound\u0026rsquo;s latent space representation, we examine how well it can predict retinal vascular mTIFs. This exploration is driven by the fact that, unlike medical endpoints, mTIFs are entirely determined by the image itself. 
We utilize genomic analysis to assess the underlying genetic architectures as they can inform upon what physiological aspect an LV represents within an image. To determine if TIFs outperform or complement foundation models in prediction tasks, we build models that include TIFs and LVs and assess their ability to predict disease. Lastly, as classical retinal features are established using heuristic approaches that may fail to generalize, we use \u003cem\u003eRETFound\u0026rsquo;s\u003c/em\u003e prediction of mTIFs to derive \u0026ldquo;deep\u0026rdquo; TIFs (dTIFs), and compare their genetic architectures as well as disease predictive capacities (see Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e for a graphical overview of our study). In summary, using \u003cem\u003eRETFound\u003c/em\u003e and CFIs as an example, our study sheds light on (i) whether foundation models can detect known image features, (ii) whether their latent variables can gain interpretability using genomic analyses, and (iii) whether mTIFs and deep features complement in regards to their genetic signals and disease prediction.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003ePhenotypic data generation in terms of vascular TIFs and LVs\u003c/h2\u003e \u003cp\u003eUsing the methodology described in our previous work [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], we measured 17 TIFs from CFIs of 41 527 genotyped participants from the UK Biobank. 
Each CFI was also characterised in terms of 1 024 LVs extracted with RETFound prior to any fine-tuning, using the weights of the network provided by the authors [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] (see Methods for details).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eModelling vascular TIFs with\u003c/b\u003e \u003cb\u003eRETFound\u003c/b\u003e\u003c/p\u003e \u003cp\u003eBy design, \u003cem\u003eRETFound\u0026rsquo;\u003c/em\u003es LVs capture much of the information contained in CFIs, including but not limited to the retinal vasculature. Thus, in principle, these LVs should also be informative of our vascular TIFs. To evaluate to what extent any individual TIF can be represented by a single LV, we computed all pairwise correlations (shown as a bi-clustered cross-correlation matrix in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea). For each TIF, we then picked the LV with the largest squared correlation, which corresponds to the explained variance of a Simple Linear Regression (SLR) model (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{SLR}^{2}\\)\u003c/span\u003e\u003c/span\u003e; top bars in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). Our analysis indicates that \u003cem\u003eindividual\u003c/em\u003e LVs can represent a sizable portion of the variability of relatively simple vascular measures, like the vascular densities (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\max R_{SLR}^{2}=0.36\\)\u003c/span\u003e\u003c/span\u003e). 
However, this is not the case for vascular TIFs whose measurement is more intricate: we observed poor correlations between individual LVs and the arterial central retinal equivalent (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\max R_{SLR}^{2}=0.09\\)\u003c/span\u003e\u003c/span\u003e), tortuosity (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\max R_{SLR}^{2}=0.01\\)\u003c/span\u003e\u003c/span\u003e), and tortuosity artery/vein (A/V) ratios (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\max R_{SLR}^{2}\u0026lt;0.01\\)\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eNext, we investigated to what extent combining multiple LVs to predict TIFs can increase the explained variance. As a first step, we trained Multiple Linear Regression (MLR) models with all LVs as features using five-fold cross-validation. The corresponding out-of-sample estimates for the explained variance \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{MLR}^{2}\\)\u003c/span\u003e\u003c/span\u003e for each TIF (middle bars in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb) are consistently higher than those from the best SLR model. The pattern was similar to that for SLR: simple TIFs (such as vascular densities) had more than 90% of their variance explained, whereas more intricate TIFs, which require explicit identification of vessels and their type, had much less. Notably, LVs failed to predict A/V ratios for tortuosity (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{MLR}^{2}=0.09\\)\u003c/span\u003e\u003c/span\u003e) and central retinal equivalents (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{MLR}^{2}=0.06\\)\u003c/span\u003e\u003c/span\u003e). 
In a second step we directly trained \u003cem\u003eRETFound\u003c/em\u003e to predict TIFs, resulting in 17 fine-tuned \u003cem\u003eRETFound\u003c/em\u003e models providing TIF estimates as their output, which we call \"deep TIFs\" (dTIFs). For most of these dTIFs, the out-of-sample estimates \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{deep}^{2}\\)\u003c/span\u003e\u003c/span\u003e for the explained variances (bottom bars in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb) are larger than those of the corresponding MLR models. Specifically, the fine-tuned \u003cem\u003eRETFound\u003c/em\u003e models outperformed the MLR models for 11 out of 17 TIFs, with a mean improvement in \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R^{2}\\)\u003c/span\u003e\u003c/span\u003e of 0.05 (95% CI = [0.01, 0.08], t(16) = 2.45, p\u0026thinsp;=\u0026thinsp;0.03). We observed the most substantial increases for the temporal angles (arterial: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{deep}^{2}=0.34,\\:\\varDelta R^{2}=0.14\\)\u003c/span\u003e\u003c/span\u003e; venous: \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{deep}^{2}=0.61,\\:\\varDelta R^{2}=0.22\\)\u003c/span\u003e\u003c/span\u003e) and arterial tortuosity (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(R_{deep}^{2}=0.43,\\:\\varDelta R^{2}=0.16\\)\u003c/span\u003e\u003c/span\u003e), while the improvements for the other TIFs are marginal or even absent.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eComparing the genetic association signals of TIFs and LVs\u003c/h3\u003e\n\u003cp\u003eWe previously showed that many of our TIFs have a sizable genetic component, with heritability estimates ranging from 5% for median vessel diameter to 
\u0026gt;\u0026thinsp;25% for arterial tortuosity [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Our genome-wide association study (GWAS) further revealed many plausible genes and pathways for the vast majority of TIFs. These genetic association signals indicate that our TIFs are likely to capture some processes related to vascular maintenance that are partially modulated by genetic variants. As we do not understand what information is captured by LVs\u0026mdash;other than that they can only partially represent TIFs\u0026mdash;we sought to elucidate their genetic architectures, as these genetic signals can potentially shed light on the image-based features represented by the LVs. To this end, we performed GWAS, heritability estimation, and gene analyses for all 1 024 LVs of the RETFound model (before any fine-tuning) and our 17 dTIFs.\u003c/p\u003e \u003cp\u003eAt the SNP level (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea\u0026ndash;c), the number of significantly associated variants was higher for mTIFs (5 949 SNPs at a Bonferroni‐adjusted threshold of 5\u0026times;10⁻⁸/17) than for LVs (4 731 SNPs at 5\u0026times;10⁻⁸/1 024). Since LVs are not independent, we also performed a GWAS on their 17 leading principal components (PCs), explaining 66% of their variance (Suppl. Figure\u0026nbsp;1), which resulted in n\u0026thinsp;=\u0026thinsp;2 130 SNPs with p\u0026thinsp;\u0026lt;\u0026thinsp;5\u0026times;10⁻⁸/17 (Suppl. Figure\u0026nbsp;2). Notably, a subset of LVs showed the strongest associations, with an effect size of β\u0026thinsp;=\u0026thinsp;0.45 and a nominal significance of p\u0026thinsp;\u0026lt;\u0026thinsp;10⁻\u0026sup3;⁰⁰. 
A key driver of this signal was rs12913832, which has previously been shown to modulate up to 50% of blue eye colour, with the blue-eye allele decreasing expression levels of \u003cem\u003eOCA2\u003c/em\u003e [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. In contrast, 50.7% of loci were shared between mTIFs and dTIFs, with the numbers of exclusive hits being 1 708 for mTIFs, 1 967 for dTIFs, and 3 928 for LVs. Gene‐level aggregation using the \u003cem\u003ePascalX\u003c/em\u003e tool [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] confirmed that the genetic architecture of dTIFs more closely resembles that of mTIFs than LVs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ed), and dTIFs generally exhibit stronger genetic signals than LVs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ee). Consistently, the overlap of significantly associated genes was 69.0% between mTIFs and dTIFs, but only 19.2% between mTIFs and LVs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ef). The most significant gene‐level association was observed for the \u003cem\u003eHERC2\u003c/em\u003e/\u003cem\u003eOCA2\u003c/em\u003e locus (p\u0026thinsp;=\u0026thinsp;7.7\u0026times;10⁻\u0026sup2;⁹⁷ in LV 810), which also influences some TIFs, albeit less strongly (p\u0026thinsp;=\u0026thinsp;1.7\u0026times;10⁻\u0026sup2;\u0026sup1; in venous vascular density). 
Finally, heritability estimates derived from \u003cem\u003eLinkage Disequilibrium (LD) Score Regression (LDSR)\u003c/em\u003e indicated that most LVs have a modest genetic contribution (h\u0026sup2; \u0026lt; 0.1 for 77.6% of LVs), with only one LV exhibiting a heritability h\u0026sup2; \u0026gt; 0.21 (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eg). A similar pattern emerged at the gene level, with LVs showing a less polygenic architecture compared with mTIFs and dTIFs (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eh). Moreover, while genes linked to TIFs were generally specific to only one or two of the 17 phenotypes, those associated with LVs tended to influence multiple traits (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ei).\u003c/p\u003e \u003cp\u003eTIF-level analyses (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), comparing dTIFs and mTIFs in terms of heritabilities and genetic associations, revealed that heritability estimates for dTIFs were generally comparable to those of the corresponding mTIFs, and in some cases, were slightly higher (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003ea). A notable exception was the arterial temporal angle, where the \u0026ldquo;deep\u0026rdquo; version showed a substantially higher heritability (h\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;0.22, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:{h}^{2}=0.13\\)\u003c/span\u003e\u003c/span\u003e). We made a consistent observation at the gene level (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eb), where dTIFs and mTIFs showed similar numbers of associated genes and large overlaps. 
With the exception of the ratio of arterial and venous vascular density and venous tortuosity, dTIFs had more exclusively associated genes than their corresponding mTIFs. Interestingly, \u0026ldquo;deep\u0026rdquo; arterial tortuosity gained 25 gene discoveries (+28%), and did so with slightly decreased heritability (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\varDelta\\:{h}^{2}=-0.02\\)\u003c/span\u003e\u003c/span\u003e), whereas its venous counterpart lost 11 discoveries (-32%).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eAssociating disease risk with measured and deep TIFs\u003c/h3\u003e\n\u003cp\u003eNext, we investigated to what extent risk factors, ocular and general diseases, and clinical events can be explained by mTIFs or dTIFs. We used linear models that incorporated covariates alongside a single TIF as explanatory variables. We assessed the strength of associations using standardised effect sizes and statistical significance and quantified the total number of unique significant associations across diseases and TIFs to summarise the findings (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). Additionally, we applied Fisher\u0026rsquo;s exact test to assess whether the ratio of significant associations differed between mTIFs and dTIFs. Although the difference was not statistically significant, the venous temporal angle showed the largest difference in the number of significantly associated diseases, with five diseases being significantly associated uniquely with dTIFs (\u003cem\u003eOR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;2.74, p\u0026thinsp;=\u0026thinsp;0.21). 
Similarly, although the difference was not significant, diabetes-related eye disease (\u0026ldquo;diabetes-eye\u0026rdquo;) showed the largest difference between TIF types, with five more significantly associated dTIFs than mTIFs (\u003cem\u003eOR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;3.43, p\u0026thinsp;=\u0026thinsp;0.17). When grouping the diseases into categories, we observed that only ocular diseases had significantly more associations with dTIFs than with mTIFs (\u003cem\u003eOR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.75, p\u0026thinsp;=\u0026thinsp;0.02). No such pattern was observed across risk factors (\u003cem\u003eOR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.06, p\u0026thinsp;=\u0026thinsp;0.84) or general diseases and events (\u003cem\u003eOR\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.92, p\u0026thinsp;=\u0026thinsp;0.81).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFinally, we aimed to determine whether models that combine multiple image feature sets increase predictive power over models restricted to just covariates or one feature set at a time. We used regularised multilinear models with eight sets of explanatory features: (i) a baseline model including only covariates (Covar) that were included in each subsequent model; (ii) all 17 mTIFs; (iii) all 17 dTIFs; (iv) the 1 024 RETFound LVs; (v) all mTIFs and dTIFs; (vi) all mTIFs and the 1 024 RETFound LVs; (vii) all dTIFs and the 1 024 RETFound LVs; and (viii) all features. We found that: 1) models that incorporated TIFs with LVs performed best when predicting blood pressure and BMI (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea and c); 2) when the outcome was an ocular disease, either there was no additional gain in predictive power over the baseline model, or the LVs tended to outperform the TIFs (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eb and Suppl. 
Figure\u0026nbsp;3); and 3) both the TIFs and the LVs added to predictive power over the base model for most risk factors and diseases (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003ea-c and Suppl. Figure\u0026nbsp;3). Associated statistics for comparing feature predictive capacities can be found in Suppl. Table\u0026nbsp;1.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eBreakthroughs in computer-aided analysis are revolutionizing the use of medical images for diagnosis and risk assessment. These advances range from supporting physicians in segmenting anatomical structures and extracting precise measurements, to making fully automated disease risk predictions using end-to-end machine learning pipelines. DL algorithms typically hold internal “latent space” representations of the images for this task, but their nonlinear entanglement makes them appear as “black boxes” lacking straightforward explanations. This is problematic in the biomedical domain, where trust by the clinician is essential [\u003cspan additionalcitationids=\"CR17 CR18\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e–\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. Furthermore, interpretability is key for deciphering the underlying biological or physiological mechanisms, which is pivotal for advancing treatment [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. 
Visual interpretability methods, such as attention maps [\u003cspan additionalcitationids=\"CR22\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e–\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] and counterfactual explanations [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], have been developed to address this but often fall short of clarifying exactly how latent representations drive predictions (see [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] for recent advances in tackling this issue).\u003c/p\u003e \u003cp\u003eTo shed some light on what latent variables (LVs) might represent, here we investigated a recently proposed foundation model for retinal images and applied it to a large collection of colour fundus images (CFIs) which we had previously characterised in terms of “tangible image features” (TIFs) of high clinical interpretability [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Specifically, we asked to what extent LVs can predict TIFs, individually or jointly. Moreover, the fact that these images stem from genotyped subjects with extensive clinical phenotyping allowed us to directly compare the associations of LVs with genotypes and disease states or risk factors with those of TIFs.\u003c/p\u003e \u003cp\u003eOur analysis revealed that, prior to fine-tuning, individual LVs of \u003cem\u003eRETFound\u003c/em\u003e do not strongly correlate with any of our TIFs. Vascular densities and number of bifurcations seem to have the closest LV representation, explaining close to 40% of their variance, while it is less than 20% for the TIFs whose measurement is more intricate. Linear combinations of multiple LVs have substantially higher predictive power for TIFs, in particular for the vascular densities, explaining close to 90% of their variance. 
Strikingly, fine-tuning \u003cem\u003eRETFound\u003c/em\u003e to predict TIFs yielded marginal or no improvements compared to linear models of the LVs (cf. [\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e–\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e] for similar negative results on fine-tuning in LLMs), with the notable exception of the temporal angles between the main vascular branches and, to a lesser degree, vessel tortuosity. Thus, even the advanced transformer DL architecture of \u003cem\u003eRETFound\u003c/em\u003e and its foundation model parameters trained on more than 900 000 CFIs fail to provide a close approximation of well-established classical ophthalmological parameters of the retinal vasculature. This is remarkable because, in contrast to clinical endpoints, TIFs are fully determined by the images, i.e., their extraction is implemented in terms of deterministic algorithmic procedures based on conventional image processing.\u003c/p\u003e \u003cp\u003eOur analysis of genetic associations elucidates to what extent DL features are modulated genetically and how their association signals overlap with those of the TIFs. We argue that genetic associations support physiological relevance because they point to a causal chain from a DNA difference (i.e., a genetic variant at the level of the DNA) via a molecular variation (such as differential gene expression) to (typically minute) phenotypic variability. Heritability can be seen as a global summary statistic of such genetic effects, and our results indicate that dTIFs are at least as heritable as their measured counterparts, supporting similar if not higher physiological pertinence. 
Gene-scoring analyses revealed that among \u003cem\u003eRETFound’\u003c/em\u003es LVs the strongest association is observed with the \u003cem\u003eHERC2\u003c/em\u003e/\u003cem\u003eOCA2\u003c/em\u003e locus, a well‐known, strong modulator of pigmentation that has previously been found as a top hit in a GWAS of LVs from self‐supervised deep phenotyping of CFIs [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. This indicates that some of the baseline LVs predominantly capture information related to the pigmentation of the retina.\u003c/p\u003e \u003cp\u003eIn contrast, dTIFs obtained by fine-tuning \u003cem\u003eRETFound\u003c/em\u003e demonstrate a genetic architecture that more closely mirrors that of mTIFs. Notably, fine‐tuning yielded 87 additional genetic discoveries exclusive to dTIFs, nearly doubling the number of unique associations observed for mTIFs. Furthermore, traits such as arterial temporal angles exhibited significantly higher heritability and a greater number of gene discoveries compared to their measured counterparts, suggesting that fine‐tuning enhances the capacity of some TIFs to capture subtle yet physiologically pertinent vascular signals. We thus hypothesize that directing \u003cem\u003eRETFound\u003c/em\u003e’s focus toward TIFs augments to some extent their ability to reflect the underlying vascular biology.\u003c/p\u003e \u003cp\u003eA complementary means of evaluating the pertinence of vascular features is in terms of their capacity to facilitate disease prediction. In this study, we contrasted mTIFs and dTIFs by comparing their standardised effects in logistic and linear models with diseases or risk factors as response variables. We found that the number of significant diseases and risk factors was largely similar between mTIFs and dTIFs, but dTIFs showed more significant associations with ocular diseases compared to mTIFs. 
This outcome may be attributed to \u003cem\u003eRETFound'\u003c/em\u003es initial training on a diverse set of retinal images [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. We hypothesise that its fine-tuned weights, while focused on predicting vascular morphology, may retain some of \u003cem\u003eRETFound'\u003c/em\u003es broader representation of retinal features. This may allow the model to capture subtle patterns or relationships between vascular morphology and other retinal structures that are relevant to ocular pathologies but not explicitly measured in traditional vascular assessments. The enhanced performance in ocular disease prediction using our fine-tuned models suggests that AI-derived vascular morphology metrics may provide additional informative features for ocular condition detection compared to conventional morphological measurements alone.\u003c/p\u003e \u003cp\u003eUsing models that integrate multiple feature types (i.e., baseline, mTIFs, dTIFs, and PCs of LVs), we provide evidence that their integration enhances the predictive power for disease. Notably, although LVs represent the entire fundus image, including the vasculature, models that included both LVs from \u003cem\u003eRETFound\u003c/em\u003e and TIFs (both measured and deep) performed best when predicting blood pressure or BMI, suggesting that TIFs provide complementary information not fully captured by \u003cem\u003eRETFound\u003c/em\u003e’s LVs. One potential reason the \u003cem\u003eRETFound\u003c/em\u003e model may not capture the information provided by TIFs is its training process: it was pre-trained using masked autoencoding [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], which may not assign high importance to vascular structures when reconstructing the image. 
As a result, some critical vasculature information may be excluded from its base weights. Indeed, the original authors of \u003cem\u003eRETFound\u003c/em\u003e qualitatively report that the model fails to reconstruct much of the vasculature present in CFIs. Our results further support this, as we find that fine-tuning \u003cem\u003eRETFound\u003c/em\u003e to predict vascular TIFs only captures a relatively small portion of their variance, suggesting that the model is unable to relearn detailed vascular and vessel type specific information. Similar findings regarding hybrid models that combine manually- and DL-derived features have been observed in other domains, where manually engineered features capture domain-specific knowledge that DL models may overlook when sample sizes are small [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] or when training and test set distributions differ [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. These results emphasize the importance of leveraging both foundation- and expert-derived features to optimize predictive performance.\u003c/p\u003e \u003cp\u003eIn summary, our findings provide key insights into the interplay between DL-derived features and classical retinal image features in the context of genetics and disease prediction. While \u003cem\u003eRETFound\u003c/em\u003e can predict certain TIFs, such as vessel densities, with reasonable accuracy, it provides poor estimates of more intricate features, like measures related to artery-to-vein ratios. Despite these limitations, dTIFs exhibit heritability comparable to or greater than their measured counterparts, with a notably large increase in heritability found with temporal angles of the arteries, suggesting that dTIFs may enhance physiologically relevant signal extraction. 
Furthermore, the genetic and disease associations of mTIFs and dTIFs show both overlap and complementarity, reinforcing the idea that DL-derived features do not simply replicate classical measurements but may provide additional, distinct insights. Most notably, integrating LVs with TIFs leads to superior predictive performance compared to models using either feature set alone, highlighting the potential of hybrid approaches in disease modeling. Overall, this work highlights the synergy between DL and classical feature extraction for advancing our understanding of retinal biology and disease mechanisms, potentially leading to improved diagnostic and prognostic tools in ophthalmology.\u003c/p\u003e \n\n \n\n\n\n "},{"header":"Methods","content":"\u003ch3\u003eData and quality control\u003c/h3\u003e\u003cp\u003eThe UK Biobank is a population-based cohort of ~ 488k subjects with rich, longitudinal phenotypic data, including medical history, and a median 10-year follow-up [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. Standard retinal 45° CFIs were captured using a Topcon 3D-OCT 1000 Mark II, resulting in a total of 173 814 CFIs from 84 813 individuals. For this study we used 41 527 of these images from the right eye, for which we were able to measure all of our 17 TIFs using our previously described analysis pipeline [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. 96% of these images passed the quality control threshold we had used in our previous work [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] (see Suppl. Figure\u0026nbsp;4). This effectively removed CFIs for which our algorithm failed to identify the optic disk or enough major blood vessels to measure the temporal angles. Since this happens more often for CFIs from older and diseased subjects, the cohort of the present study appears to be more “healthy” with respect to its baseline risk factors (see Suppl. 
Table\u0026nbsp;2 for comparison to our previous study). Since some of the LVs produced from \u003cem\u003eRETFound\u003c/em\u003e encoded information specific to eye laterality (left vs. right), we opted to analyze a single eye rather than averaging across both eyes. Genotyping was performed on Axiom arrays for a total of 805 426 markers, from which ~96\u0026nbsp;million genotypes were imputed. We used the subset of 15 599 830 SNPs that had been assigned an rsID.\u003c/p\u003e\u003ch2\u003eMeasured phenotypes\u003c/h2\u003e\u003cp\u003eThe 17 mTIFs comprise tortuosity, temporal angles, vessel diameters and diameter variability, vascular density, and the number of bifurcations. With the exception of the number of bifurcations, all of these measurements were performed independently for arteries and veins. To explore asymmetries between arteries and veins, the arterio-venous ratios of diameters, tortuosities, and vascular densities were also included. Complete algorithmic descriptions of how these measurements were extracted are given in our recent publication [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].\u003c/p\u003e\u003ch3\u003eDeep learning model\u003c/h3\u003e\u003cp\u003eWe utilised \u003cem\u003eRETFound\u003c/em\u003e [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], a novel foundation model that leverages Google’s Vision Transformer Large architecture with a patch size of 16 (ViT-L/16) [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. Briefly, \u003cem\u003eRETFound\u003c/em\u003e uses a masked autoencoder (MAE) approach [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] to optimise the model towards reconstructing retinal images. 
For building \u003cem\u003eRETFound\u003c/em\u003e, 75% of the image patches were masked and the model was trained to reconstruct the missing patches, thereby learning effective representations of the images provided as training data. We used the encoder portion of this model to extract LVs and make predictions.\u003c/p\u003e\u003ch3\u003eExtracting Latent Variables\u003c/h3\u003e\u003cp\u003eTo generate LVs, we first resized each image to 224×224 pixels and normalised it using the means and standard deviations of ImageNet’s RGB channels. After preprocessing, we propagated each image through the encoder portion of the pre-trained \u003cem\u003eRETFound\u003c/em\u003e model to extract the averages of the 1 024-dimensional patch embeddings, yielding 1 024 LVs for each image. We further derived the leading PCs of these LVs for both genetic analyses and disease associations.\u003c/p\u003e\u003ch2\u003ePredicting tangible image features\u003c/h2\u003e\u003cp\u003eFor predicting TIFs, we used the encoder portion of \u003cem\u003eRETFound\u003c/em\u003e and added a multi-layer perceptron as the head of the model. We utilised the base hyperparameters established in the original \u003cem\u003eRETFound\u003c/em\u003e manuscript [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] and trained the model over 50 epochs, with a “warm-up” in the first 10 epochs (with the learning rate \u003cem\u003er\u003c/em\u003e monotonically increasing from 0 to 5 × 10\u003csup\u003e−4\u003c/sup\u003e), followed by a cosine annealing schedule (where \u003cem\u003er\u003c/em\u003e decreases from 5 × 10\u003csup\u003e−4\u003c/sup\u003e to 1 × 10\u003csup\u003e−6\u003c/sup\u003e). We set the batch size to 16, the dropout rate to 0.2, and the image size to 224×224 pixels. After each epoch, we evaluated the model on a validation dataset and then kept the model that had the lowest mean absolute error across all epochs. 
Individual models were built for each of the 17 TIFs using five-fold cross-validation.\u003c/p\u003e\u003ch2\u003eCovariate corrections\u003c/h2\u003e\u003cp\u003eGenetic analysis and disease prediction models were corrected for a set of covariates known to confound phenotypic variability in general, or in the eye specifically. They comprise sex, age, age-squared, sex-by-age, sex-by-age-squared, spherical power, spherical power-squared, cylindrical power, cylindrical power-squared, imaging instance, assessment centre, genotype measurement batch, and the first 20 genomic PCs (see Suppl. Figure\u0026nbsp;5).\u003c/p\u003e\u003ch2\u003eGenomic analyses\u003c/h2\u003e\u003cp\u003eFor GWAS, raw phenotypes were transformed using rank-based inverse normal transformation. GWAS were conducted using \u003cem\u003eBGENIE\u003c/em\u003e [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. SNP-based heritabilities and genetic correlations between traits were calculated using \u003cem\u003eLDSR\u003c/em\u003e [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], which fits a linear regression of the chi-squared association statistics of SNPs on their LD scores. Gene and pathway scores were computed using \u003cem\u003ePascalX\u003c/em\u003e [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Both protein-coding genes and lincRNAs were scored using the approximate “saddle” method, incorporating all SNPs within a 50 kb window surrounding each gene. 
All pathways from MSigDB v7.2 [\u003cspan additionalcitationids=\"CR40\" citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e–\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] were scored using \u003cem\u003ePascalX’\u003c/em\u003es ranking mode, merging and rescoring co-occurring genes separated by less than 100 kb. \u003cem\u003ePascalX\u003c/em\u003e uses LD structure to compute gene scores accurately; in these analyses, the LD structure was derived from the UK10K (hg19) reference panel [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. For annotating genes in the Manhattan plots, the top ten genes for each feature set (i.e., mTIFs, dTIFs, or LVs) were selected within a 1 000 SNP window.\u003c/p\u003e\u003ch2\u003eDisease associations\u003c/h2\u003e\u003cp\u003eThree distinct groups of disease traits were included in this work: first, a set of general systemic risk factors; second, a set of ocular diseases; and third, a set of general diseases and events (see Suppl. Figure\u0026nbsp;6). All disease data were collected from the UK Biobank (see Ref. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] for how we mapped the data field identifiers to each disease included in this study).\u003c/p\u003e\u003cp\u003eTo extract beta coefficients for the continuous risk factors, linear regression was used to estimate standardised effects. For both ocular and general diseases, time-to-event data were binarised, and logistic regression was applied.\u003c/p\u003e\u003cp\u003eWhen examining the predictive capacity of different feature sets, linear regression was used for continuous risk factors and logistic regression was used for disease prediction. To assist with model generalisation and reduce overfitting, five-fold cross-validation was employed.
Both linear and logistic regression models were regularised using elastic net penalties, with optimal values for alpha and the L1 ratio determined by grid search within each fold. We used the coefficient of determination (R\u003csup\u003e2\u003c/sup\u003e) as the performance metric for continuous risk factors and the area under the receiver operating characteristic curve (AUC) as the performance metric for diseases. Because AUC can be affected by class imbalance when used for prediction tasks, and most of the diseases investigated had many more controls than cases, we downsampled the controls to the same sample size as the cases. Downsampling was performed by matching each case with the control that had the most similar covariate profile using optimal bipartite matching (Hungarian algorithm), ensuring that no control was used more than once. To prevent poor feature-to-sample-size ratios, diseases and events were required to have at least 1 000 samples (cases and controls combined) to be analysed.\u003c/p\u003e\u003cp\u003eFor all regression analyses, traits were corrected for the covariates listed above. To determine the significance of regression coefficients and prediction power, p-values were computed, and the Benjamini-Hochberg procedure was applied at a false discovery rate of 0.05.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch3\u003eData availability\u003c/h3\u003e\n\u003cp\u003eGWAS summary statistics, gene and pathway scores, and all data underlying figures will be made available on Zenodo upon publication. GWAS summary statistics will further be made available on the GWAS Catalog. Raw UK Biobank data are protected and not open access; however, they can be obtained upon project creation and acceptance via their platform (https://www.ukbiobank.ac.uk/).
mTIFs [10] are already available via this channel; dTIFs and baseline LVs will be added to the platform upon publication.\u003c/p\u003e\n\u003ch3\u003eCode availability\u003c/h3\u003e\n\u003cp\u003eCode will be made available on Zenodo upon publication.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAzad B, Azad R, Eskandari S, Bozorgpour A, Kazerouni A, Rekik I, et al. Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision. 2023. Available: http://arxiv.org/abs/2310.18689\u003c/li\u003e\n\u003cli\u003eZhou Y, Chia MA, Wagner SK, Ayhan MS, Williamson DJ, Struyven RR, et al. A foundation model for generalizable disease detection from retinal images. Nature. 2023;622: 156\u0026ndash;163.\u003c/li\u003e\n\u003cli\u003eParres D, Paredes R. Fine-Tuning Vision Encoder\u0026ndash;Decoder Transformers for Handwriting Text Recognition on Historical Documents. Document Analysis and Recognition - ICDAR 2023. 2023; 253\u0026ndash;268.\u003c/li\u003e\n\u003cli\u003eFhima J, Van Eijgen J, Billen Moulin-Roms\u0026eacute;e M-I, Brackenier H, Kulenovic H, Debeuf V, et al. LUNet: deep learning for the segmentation of arterioles and venules in high resolution fundus images. Physiol Meas. 2024;45. doi:10.1088/1361-6579/ad3d28\u003c/li\u003e\n\u003cli\u003eQuiros JV, Liefers B, van Garderen K, Vermeulen J, Center ER, Consortium S, et al. VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images. 2024. Available: http://arxiv.org/abs/2409.16016\u003c/li\u003e\n\u003cli\u003eTomasoni M, Beyeler MJ, Vela SO, Mounier N, Porcu E, Corre T, et al. Genome-wide Association Studies of Retinal Vessel Tortuosity Identify Numerous Novel Loci Revealing Genes and Pathways Associated With Ocular and Cardiometabolic Diseases. Ophthalmology Science. 2023;3: 100288.\u003c/li\u003e\n\u003cli\u003eZhou Y, Wagner SK, Chia MA, Zhao A, Woodward-Court P, Xu M, et al.
AutoMorph: Automated Retinal Vascular Morphology Quantification Via a Deep Learning Pipeline. Trans Vis Sci Tech. 2022;11: 12\u0026ndash;12.\u003c/li\u003e\n\u003cli\u003eVan Eijgen J, Fhima J, Billen Moulin-Roms\u0026eacute;e M-I, Behar JA, Christinaki E, Stalmans I. Leuven-Haifa High-Resolution Fundus Image Dataset for Retinal Blood Vessel Segmentation and Glaucoma Diagnosis. Sci Data. 2024;11: 257.\u003c/li\u003e\n\u003cli\u003eOrtin Vela S, Beyeler MJ, Trofimova O, Tomasoni M. Phenotypic and Genetic Characteristics of Retinal Vascular Parameters and their Association with Diseases. medRxiv. 2023. Available: https://www.medrxiv.org/content/10.1101/2023.07.07.23292368.abstract\u003c/li\u003e\n\u003cli\u003eOrt\u0026iacute;n Vela S, Beyeler MJ, Trofimova O, Iuliani I, Vargas Quiros JD, de Vries VA, et al. Phenotypic and genetic characteristics of retinal vascular parameters and their association with diseases. Nat Commun. 2024;15: 9593.\u003c/li\u003e\n\u003cli\u003eHo H, Cheung CY, Sabanayagam C, Yip W, Ikram MK, Ong PG, et al. Retinopathy Signs Improved Prediction and Reclassification of Cardiovascular Disease Risk in Diabetes: A prospective cohort study. Sci Rep. 2017;7: 41492.\u003c/li\u003e\n\u003cli\u003eAdvances in medical image analysis with vision Transformers: A comprehensive review. Medical Image Analysis. 2024;91: 103000.\u003c/li\u003e\n\u003cli\u003eEiberg H, Troelsen J, Nielsen M, Mikkelsen A, Mengel-From J, Kjaer KW, et al. Blue eye color in humans may be caused by a perfectly associated founder mutation in a regulatory element located within the HERC2 gene inhibiting OCA2 expression. Human Genetics. 2008;123: 177\u0026ndash;187.\u003c/li\u003e\n\u003cli\u003eKrefl D, Bergmann S. Cross-GWAS coherence test at the gene and pathway level. PLoS Comput Biol. 2022;18: e1010517.\u003c/li\u003e\n\u003cli\u003eKrefl D, Brandulas Cammarata A, Bergmann S. PascalX: a Python library for GWAS gene and pathway enrichment tests. Bioinformatics. 2023;39. 
doi:10.1093/bioinformatics/btad296\u003c/li\u003e\n\u003cli\u003eAndrews B, Chang J-B, Collinson L, Li D, Lundberg E, Mahamid J, et al. Imaging cell biology. Nature Cell Biology. 2022;24: 1180\u0026ndash;1185.\u003c/li\u003e\n\u003cli\u003eRajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28: 31\u0026ndash;38.\u003c/li\u003e\n\u003cli\u003eDeGrave AJ, Cai ZR, Janizek J, Daneshjou R, Lee S-I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nature Biomedical Engineering. 2023; 1\u0026ndash;13.\u003c/li\u003e\n\u003cli\u003eRotem O, Schwartz T, Maor R, Tauber Y, Shapiro MT, Meseguer M, et al. Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization. Nature Communications. 2024;15: 1\u0026ndash;19.\u003c/li\u003e\n\u003cli\u003eRotem O, Zaritsky A. Visual interpretability of bioimaging deep learning models. Nat Methods. 2024;21: 1394\u0026ndash;1397.\u003c/li\u003e\n\u003cli\u003eZhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. doi:10.1109/cvpr.2016.319\u003c/li\u003e\n\u003cli\u003eSelvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. 2016 [cited 17 Dec 2024]. doi:10.1007/s11263-019-01228-7\u003c/li\u003e\n\u003cli\u003eShrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. Precup D, Teh YW, editors. arXiv [cs.CV]. 06--11 Aug 2017. pp. 3145\u0026ndash;3153. Available: https://proceedings.mlr.press/v70/shrikumar17a.html\u003c/li\u003e\n\u003cli\u003eLang O, Gandelsman Y, Yarom M, Wald Y, Elidan G, Hassidim A, et al. Explaining in Style: Training a GAN to explain a classifier in StyleSpace. 
arXiv [cs.CV]. 2021. Available: http://arxiv.org/abs/2104.13369\u003c/li\u003e\n\u003cli\u003eFine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models. [cited 18 Dec 2024]. Available: https://arxiv.org/html/2406.11201v1\u003c/li\u003e\n\u003cli\u003eLi X, Chan S, Zhu X, Pei Y, Ma Z, Liu X, et al. Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks. 2023. Available: http://arxiv.org/abs/2305.05862\u003c/li\u003e\n\u003cli\u003eUppaal R, Hu J, Li Y. Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection. 2023. Available: http://arxiv.org/abs/2305.13282\u003c/li\u003e\n\u003cli\u003eXie Z, Zhang T, Kim S, Lu J, Zhang W, Lin C-H, et al. iGWAS: Image-based genome-wide association of self-supervised deep phenotyping of retina fundus images. PLOS Genetics. 2024;20: e1011273.\u003c/li\u003e\n\u003cli\u003eFu Y, Ma L, Wan S, Ge S, Yang Z. A novel clinical artificial intelligence model for disease detection via retinal imaging. Innovation (Camb). 2024;5: 100575.\u003c/li\u003e\n\u003cli\u003eHe K, Chen X, Xie S, Li Y, Dollar P, Girshick R. Masked Autoencoders Are Scalable Vision Learners. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2022. doi:10.1109/cvpr52688.2022.01553\u003c/li\u003e\n\u003cli\u003eMavaie P, Holder L, Skinner MK. Hybrid deep learning approach to improve classification of low-volume high-dimensional data. BMC Bioinformatics. 2023;24: 1\u0026ndash;20.\u003c/li\u003e\n\u003cli\u003eLin W, Hasenstab K, Moura Cunha G, Schwartzman A. Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment. Scientific Reports. 2020;10: 1\u0026ndash;11.\u003c/li\u003e\n\u003cli\u003eBento N, Rebelo J, Barandas M, Carreiro AV, Campagner A, Cabitza F, et al. 
Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition. Sensors (Basel, Switzerland). 2022;22: 7324.\u003c/li\u003e\n\u003cli\u003eBycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562: 203\u0026ndash;209.\u003c/li\u003e\n\u003cli\u003eDosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2020. Available: http://arxiv.org/abs/2010.11929\u003c/li\u003e\n\u003cli\u003eBand G, Marchini J. BGEN: a binary file format for imputed genotype and haplotype data. bioRxiv. 2018. p. 308296. doi:10.1101/308296\u003c/li\u003e\n\u003cli\u003eBulik-Sullivan B, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47: 291\u0026ndash;295.\u003c/li\u003e\n\u003cli\u003eLamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. 2016. doi:10.1371/journal.pcbi.1004714\u003c/li\u003e\n\u003cli\u003eSubramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102: 15545\u0026ndash;15550.\u003c/li\u003e\n\u003cli\u003eLiberzon A, Birger C, Thorvaldsd\u0026oacute;ttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1: 417\u0026ndash;425.\u003c/li\u003e\n\u003cli\u003eCastanza AS, Recla JM, Eby D, Thorvaldsd\u0026oacute;ttir H, Bult CJ, Mesirov JP. Extending support for mouse data in the Molecular Signatures Database (MSigDB). Nature Methods. 
2023;20: 1619\u0026ndash;1620.\u003c/li\u003e\n\u003cli\u003eUK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526: 82\u0026ndash;90.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"nature-portfolio","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Nature Portfolio","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"ejp","reportingPortfolio":"","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-6130721/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6130721/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAdvances in AI, including deep learning (DL), are transforming medical image analysis by enabling automated disease risk predictions. However, DL's outputs and latent space representations often lack interpretability, impeding clinical trust and biological insight. In this study, we evaluated \u003cem\u003eRETFound\u003c/em\u003e, a foundation model for retinal images, by comparing its predictive performance and genetic associations to those obtained using clinically interpretable traditional image features (TIFs). Our findings revealed that \u003cem\u003eRETFound\u0026rsquo;\u003c/em\u003es individual latent space variables poorly represent most TIFs but typically achieve higher accuracy when combined linearly. Fine-tuning \u003cem\u003eRETFound\u003c/em\u003e to predict TIFs provided better, but far from perfect surrogates, highlighting \u003cem\u003eRETFound\u0026rsquo;s\u003c/em\u003e limitations to fully characterise the retinal vasculature. We also find that \u003cem\u003eRETFound\u0026rsquo;\u003c/em\u003es latent space variables have many genetic associations, though there was minimal overlap between the significant genes identified from measured or predicted TIFs. Notably, predicted TIFs demonstrated greater heritability and excelled in ocular disease prediction as compared to their measured TIF counterparts. 
Comparing the predictive capacity of \u003cem\u003eRETFound\u003c/em\u003e compared to TIFs, \u003cem\u003eRETFound\u003c/em\u003e\u0026rsquo;s features carry more predictive value for diabetes and ocular diseases but the best models for predicting blood pressure and body mass index are those that combine tangible and deep features. Overall, these findings indicate that manually derived image features can complement foundation models, enhancing their interpretability and predictive capability. This study highlights the synergistic potential of integrating deep learning with classical feature extraction, advancing our understanding of retinal biology and disease mechanisms, and paving the way toward improved diagnostic and prognostic tools in ophthalmology.\u003c/p\u003e","manuscriptTitle":"Comparing tangible retinal image characteristics with deep learning features reveals their complementarity for gene association and disease prediction","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-07 09:53:44","doi":"10.21203/rs.3.rs-6130721/v1","editorialEvents":[],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"nature-genetics","isNatureJournal":true,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"ng","sideBox":"Learn more about [Nature Genetics](http://www.nature.com/ng/)","snPcode":"","submissionUrl":"","title":"Nature Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"Nature Research","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"e58a5ff6-a271-4e5e-9c40-07ff7f593bbf","owner":[],"postedDate":"April 7th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":46595592,"name":"Biological sciences/Genetics/Genetic association study/Genome-wide association studies"},{"id":46595593,"name":"Health sciences/Medical research/Genetics research"}],"tags":[],"updatedAt":"2026-03-02T23:15:15+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-07 09:53:44","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6130721","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6130721","identity":"rs-6130721","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}