{"paper_id":"a15418a6-e241-463e-9ade-e2e3c33dd452","body_text":"Rare variants and survival of patients with idiopathic \npulmonary fibrosis \n \nAitana Alonso-Gonzalez1, David Jáspez2, José M. Lorenzo-Salazar2, Shwu-Fan Ma3, \nEmma Strickland3, Josyf Mychaleckyj4, John S. Kim3, Yong Huang3, Ayodeji \nAdegunsoye5, Justin M. Oldham6, Iain Steward7, Philip L. Molyneaux7,8, Toby M. \nMaher7,9, Louise V. Wain10,11, Richard J. Allen10,11, R. Gisli Jenkins7, Jonathan A. \nKropski12,13,14, Brian Yaspan15, Timothy S. Blackwell16, David Zhang17, Christine Kim \nGarcia17,18, Fernando J. Martinez19, Imre Noth3, and Carlos Flores1,2,20,21 \n \n1Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Instituto de Investigación \nSanitaria de Canarias, Santa Cruz de Tenerife, Spain; 2Genomics Division, Instituto Tecnológico y \nde Energías Renovables, Santa Cruz de Tenerife, Spain; 3Division of Pulmonary and Critical Care \nMedicine, University of Virginia, Charlottesville, VA, USA; 4Center for Public Health Genomics, \nUniversity of Virginia, Charlottesville, VA, USA; 5Section of Pulmonary and Critical Care \nMedicine, University of Chicago, Chicago, IL, USA; 6Division of Pulmonary and Critical Care \nMedicine, University of Michigan, Ann Arbor, MI, USA; 7National Heart and Lung Institute, \nImperial College London, London, UK; 8Royal Brompton and Harefield Hospitals, Guy’s and St. Thomas’ NHS Foundation Trust, London, UK; 9Division of Pulmonary and Critical Care Medicine, \nUniversity of Southern California, Los Angeles, CA, USA; 10Department of Population Health \nSciences, University of Leicester, Leicester, UK; 11National Institute for Health Research, Leicester \nRespiratory Biomedical Research Centre, Glenfield Hospital, Leicester, UK; 12Department of Cell \nand Developmental Biology, Vanderbilt University, Nashville, TN, USA; 13Department of Veterans Affairs Medical \nCenter, Nashville, TN, USA; 14Division of Pulmonary and Critical Care Medicine, Vanderbilt University, \nNashville, TN, USA; 15Genentech Inc., San Francisco, CA, USA; 16Department of Internal Medicine, \nUniversity of Michigan, Ann Arbor, MI, USA; 17Department of Medicine, Columbia University \nIrving Medical Center, New York, NY, USA; 18Columbia Precision Medicine Initiative, Columbia \nUniversity Irving Medical Center, New York, NY, USA; 19Weill Cornell Medical Center, New York, \nNY, USA; 20Facultad de Ciencias de la Salud, Universidad Fernando Pessoa Canarias, Las Palmas \nde Gran Canaria, Spain; 21CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud \nCarlos III, Madrid, Spain. \n \nThe copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted October 15, 2024. doi: https://doi.org/10.1101/2024.10.12.24315151 (medRxiv preprint). \nNOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.\n\nABSTRACT \nBackground \nThe clinical course of idiopathic pulmonary fibrosis (IPF) is highly variable and unpredictable, \nwith multiple genetic variants influencing IPF outcomes. 
Notably, rare pathogenic variants in \ntelomere-related genes are associated with poorer clinical outcomes in these patients. Here we \nassessed whether rare qualifying variants (QVs) in monogenic adult-onset pulmonary fibrosis \n(PF) genes are associated with IPF survival. Using polygenic risk scores (PRS), we also evaluated \nthe influence of common IPF risk variants in individuals carrying these QVs. \nMethods \nWe identified QVs in telomere and non-telomere genes linked to monogenic forms of PF using \nwhole-genome sequences (WGS) from 888 individuals in the Pulmonary Fibrosis Foundation Patient Registry \n(PFFPR). We also derived a PRS for IPF (PRS-IPF) from 19 previously published \ncommon sentinel IPF variants. Using regression models, we then examined the mutual \nrelationships of QVs and PRS-IPF and their association with survival. Validation of results was \nsought in WGS from an independent IPF study (PROFILE, n=472), and results from the two \ncohorts were meta-analyzed. \nResults \nCarriers of QVs in monogenic adult-onset PF genes, representing nearly 1 out of 6 IPF \npatients, were enriched among patients with low PRS-IPF (Odds Ratio [OR]: 1.79; 95% Confidence Interval \n[CI]: 1.15-2.81; p=0.010) and had shorter survival (Hazard Ratio [HR]: 1.53; 95% CI: 1.12-2.10; \np=7.3x10-3). Notably, carriers of pathogenic variants in telomere genes showed the strongest \nassociation with survival (HR: 1.76; 95% CI: 1.13-2.76; p=0.013). The meta-analysis of the \nresults showed a consistent direction of effect across both cohorts. \nConclusions \nWe found opposite effects of QVs and PRS-IPF on IPF survival, suggesting that a \ndistinct molecular subtype of IPF might be defined by QVs in monogenic adult-onset PF genes. \nAssessing QV carrier status and modelling PRS-IPF may further improve the \nprediction of disease progression among IPF patients. \n\nINTRODUCTION \nIdiopathic pulmonary fibrosis (IPF) is a rare and progressive disease characterized by lung \nscarring and poor prognosis, with a median survival of 3-5 years after diagnosis (1). The clinical \ncourse of the disease varies greatly among patients and is difficult to predict. While some \npatients maintain relatively stable lung function for many years, 10-15% experience a \nrapid decline (2) and may succumb to their disease before initiation of effective therapy or lung \ntransplantation. Identifying those in need of immediate therapy is crucial to improve clinical \nmanagement. \nGenetic studies have revealed that both rare and common genetic variants contribute to IPF \nsusceptibility (3–8). However, the incorporation of genetic data in IPF diagnosis remains \nlimited, as current guidelines rely primarily on radiological or histological criteria to identify \ninterstitial pneumonia patterns. In contrast, genetic testing is increasingly recognized as valuable for predicting \ndisease prognosis (9). Genome-wide association studies (GWAS) have revealed multiple \ncommon loci involved in IPF progression and survival. For instance, the risk allele (rs35705950-T) of the mucin 5B, oligomeric \nmucus/gel-forming gene (MUC5B), which is the strongest known common \ngenetic risk factor for IPF, is also linked to slower disease progression (10), although this \nassociation might be subject to index event bias (11). A novel genetic risk locus involving the \nprotein kinase N2 (PKN2) antisense RNA gene has also been associated with decline in forced \nvital capacity (FVC), a common measure for monitoring disease progression (12). 
In \naddition, the first GWAS of IPF survival identified a variant in PCSK6 associated with differential \npatient survival (13). \nRare qualifying variants (QVs) in telomere-related genes have been associated with poor clinical \noutcomes among IPF patients (7) and among patients with other interstitial lung diseases (ILD), \nsuch as hypersensitivity pneumonitis (14). Specifically, QVs are associated with progressive PF, \nrapid decline in lung function, and reduced survival (15–17). However, previous studies have \nfocused mostly on the effects of a few genes (TERT, RTEL1, and PARN), even though many other \ntelomere genes are known to be associated with monogenic PF. \nAltogether, this evidence suggests that multiple genetic factors are involved in distinct \nmechanisms of IPF pathogenesis and rates of progression. Moreover, patients carrying the \nMUC5B risk allele are less likely to carry rare, likely deleterious variants in adult-onset PF genes \nthan non-carriers of the MUC5B risk allele, suggesting that the expected additive effects \nof common and rare variants may not hold in this case (7,18). Nevertheless, the effect on survival of QVs \nacross all known monogenic adult-onset PF genes (including telomere and non-telomere \ngenes), and the modifying role in QV carriers of polygenic risk scores (PRS) built from common IPF risk variants \n(PRS-IPF), remain to be elucidated. \nUsing whole-genome sequencing (WGS) from IPF patients, we aimed to determine the \nprevalence of QVs in monogenic adult-onset PF genes and to examine the mutual relationships of \nQVs and PRS-IPF and their association with IPF survival. \nMETHODS \nStudy design and sample description \nWe assessed the association of QVs and PRS-IPF with the primary outcome in patients with IPF. \nIn the discovery stage, we utilized data from the Pulmonary Fibrosis Foundation Patient \nRegistry (PFFPR) (19). 
In the second stage, we used data from the Prospective Observation \nof Fibrosis in the Lung Clinical Endpoints (PROFILE) cohort for validation (20,21), to assess the \nrobustness of our findings. In the PFFPR, the primary outcome was the time from initial \ndiagnosis to either death or lung transplantation. In the PROFILE cohort, the primary outcome \nwas the time from diagnosis to death. For both cohorts, right censoring was applied at 60 \nmonths. \nIn the discovery stage, we included 888 patients clinically diagnosed with IPF from the PFFPR, \nwith baseline and longitudinal demographic and clinical information recorded in the United \nStates since March 2016. In the validation stage, 472 clinically diagnosed IPF patients from the \nPROFILE cohort, recruited in the UK from 2010 to 2017, were included and followed for three years \nto track disease progression (Figure 1). For further details, see Supplementary methods and \nSupplementary Table 1. \n \nFigure 1. Patient cohorts included from the PFFPR and PROFILE studies. \n \nBoth studies were conducted according to the Code of Ethics of the World Medical Association \n(Declaration of Helsinki), and written informed consent was obtained from all participants. The \nResearch Ethics Committees at each participating centre approved the study. \nSequencing and bioinformatics analysis methods \nIn the PFFPR, library preparation and sequencing were performed by Psomagen (Rockville, \nMD). Genomic DNA libraries were prepared using the TruSeq DNA PCR-Free kit (Illumina Inc.) \nand sequenced on an Illumina NovaSeq 6000 instrument (Illumina Inc.) 
with 150 bp paired-end \nreads at an average depth of 30X. At least 80% of the genome was covered by ≥20 reads, and \n≥90% was covered by ≥10 reads. WGS data were processed with the Illumina DRAGEN Bio-IT \nPlatform Germline Pipeline v3.10.4 (Illumina Inc.), using the Illumina DRAGEN Multigenome \nGraph hg38 as the reference genome. Only variants with a “PASS” filter were included in \nsubsequent analyses. \nFor the PROFILE cohort, WGS was performed at Human Longevity Inc. using the Illumina \nNovaSeq 6000 system with 150 bp paired-end reads. Coverage of at least 10X was achieved in \nover 98% of the Consensus Coding Sequence Release 22 (CCDS), with an average read depth of \n42X across the CCDS, as described previously (4). Sequences were processed using the Illumina \nDRAGEN Bio-IT Platform Germline Pipeline v3.0.7, with GRCh38 as the reference genome. \nIn both cohorts, quality control (QC) included identifying QC outliers, detecting kinship \nbetween patients, checking for cross-contamination of samples, and identifying sex \ndiscordance, using metrics from different tools. Figure 1 summarizes the number of individuals \nexcluded and the reasons for exclusion. For further details, see Supplementary methods. \n\nIdentification of QVs in monogenic adult-onset PF genes \nWe restricted the identification of QVs to a curated list of 13 PF genes, categorized as either \ntelomere-related (TERC, TERT, TINF2, DKC1, RTEL1, PARN, NAF1, and ZCCHC8) or non-telomere-related \n(SFTPC, SFTPA1, SFTPA2, SPDL1, and KIF15) (Supplementary Table 2). 
With the \nexception of SPDL1 and KIF15, this list includes genes with a known dominant inheritance \npattern (presuming that QVs in these genes would have higher penetrance) and genes \ncommonly found in familial IPF cohorts, although they also occur in sporadic cases (7). \nKIF15 and SPDL1 were added to the list because recent large-scale sequencing studies \nidentified them as PF-related genes (4,22,23). Both genes are critical for mitosis, suggesting a \nnovel, non-telomeric mechanism underlying IPF. Rare deleterious variants in KIF15 and three \ntelomere genes (TERT, PARN, and RTEL1) have been previously associated with IPF risk, early \nonset, and progression to early-age lung transplantation or death (23). In SPDL1, a rare \nmissense variant was confirmed as a new IPF risk allele, although carriers did not exhibit \ndistinct clinical features (4). For simplicity, we refer to this gene set as monogenic adult-onset \nPF genes. \nVariants in these genes were excluded if they had a read depth (DP) <10, a mapping quality (MQ) <50, \nor a fraction of missing genotypes (FMISS) >0.05 in the cohort. The remaining variants \nwere annotated using the Variant Effect Predictor tool v109.3 (24). Variants with a global allele \nfrequency (AF) >0.0005 in gnomAD v2.1 were excluded. For our analyses, we \nretained protein-truncating variants (PTVs; including frameshift, stop-gained, start-loss, and splicing \nvariants) and missense variants with a CADD score >15. \nFor the non-coding RNA gene TERC, because functional effects are difficult to predict in non-coding genes, variants were retained if their global population AF was \n<0.0005 and they were annotated by ClinVar as pathogenic (P), likely pathogenic (LP), or of \nuncertain significance (VUS). \nAll variants meeting these criteria constituted the total set of QVs. \nAdditionally, QVs were manually classified according to ACMG guidelines (25) as P, LP, or VUS. 
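The variant-level filters described above can be summarized as a single predicate. The Python sketch below is illustrative only and is not the pipeline used in this study. It assumes that each variant has already been annotated (for example with VEP and CADD) into a flat record. The field names and consequence labels (such as stop_gained or missense) are hypothetical, and treating a variant absent from gnomAD as AF=0 is an additional assumption. It encodes the thresholds stated in the text (DP <10, MQ <50, FMISS >0.05, gnomAD AF >0.0005, CADD >15) together with the separate ClinVar-based rule for TERC.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; real VEP terms would need to be mapped onto these.
PTV_CONSEQUENCES = {'frameshift', 'stop_gained', 'start_lost', 'splicing'}
TERC_CLINVAR_CLASSES = {'P', 'LP', 'VUS'}
MAX_AF = 0.0005


@dataclass
class Variant:
    gene: str
    dp: float                   # read depth
    mq: float                   # mapping quality
    fmiss: float                # fraction of samples with a missing genotype
    gnomad_af: Optional[float]  # global gnomAD v2.1 AF; None if absent
    consequence: str
    cadd: Optional[float] = None
    clinvar: Optional[str] = None  # 'P', 'LP', 'VUS', or None


def passes_site_qc(v: Variant) -> bool:
    # Exclude sites with DP < 10, MQ < 50, or FMISS > 0.05.
    return v.dp >= 10 and v.mq >= 50 and v.fmiss <= 0.05


def is_qualifying(v: Variant) -> bool:
    if not passes_site_qc(v):
        return False
    # Assumption: a variant absent from gnomAD is treated as AF = 0.
    af = 0.0 if v.gnomad_af is None else v.gnomad_af
    if v.gene == 'TERC':
        # Non-coding RNA gene: must be rare and annotated by ClinVar as P, LP, or VUS.
        return af < MAX_AF and v.clinvar in TERC_CLINVAR_CLASSES
    if af > MAX_AF:
        return False
    if v.consequence in PTV_CONSEQUENCES:
        return True
    # Missense variants qualify only when CADD > 15.
    return v.consequence == 'missense' and v.cadd is not None and v.cadd > 15
```

For example, a stop-gained TERT variant absent from gnomAD with DP=30 and MQ=60 passes, but the same call at DP=5 does not.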
\nVariants classified as P or LP comprised the set of pathogenic variants, and QVs present in ClinVar \nwere cross-referenced and annotated with their ClinVar classification (VUS, P, or LP). \nFor sensitivity analysis, we defined six additional QV categories based on specific \nthresholds for population AF and different predicted variant effects. We also assessed a category \nof rare synonymous variants, expected to capture neutral variation, as a null model in \nthe association analyses. These criteria are summarized in Supplementary Table 3. \nPrincipal components of genetic heterogeneity in the cohorts \nPrincipal components (PCs) were calculated after excluding single nucleotide polymorphisms \n(SNPs) with a minor allele frequency (MAF) <0.01 from the WGS data, using BCFtools \n(https://samtools.github.io/bcftools/bcftools.html). Genotyping QC was then performed using \nPLINK v1.9. First, SNPs with a genotyping call rate (CR) <95% or those deviating significantly \nfrom Hardy-Weinberg equilibrium (HWE) (p<1.0x10-6) were removed. After linkage \ndisequilibrium pruning (indep-pairwise 100 5 0.01), the main PCs of genetic variation were \ncalculated based on 110,951 independent SNPs in the PFFPR and 143,214 independent SNPs in \nPROFILE (see Supplementary Figure 1). \n\nEstimation of the PRS of IPF in the cohorts \nThe PRS-IPF for each patient was derived from 19 previously published genome-wide \nsignificant IPF variants (Supplementary Table 4) using PRSice-2 (26). 
Briefly, each individual's PRS was \ncalculated by multiplying the number of risk alleles carried at each variant by the effect size \nof that variant reported in the GWAS (26) and summing across all variants included in the \nscore: \n$$PRS = \\sum_{i=1}^{n} \\beta_i G_i$$ \nwhere βi is the effect size of variant i (derived from the OR in the case of binary traits), Gi is the number of risk \nalleles carried at variant i, and n is the number of conditionally independent signals identified \nelsewhere. Raw polygenic scores were then standardized as z-scores: \n$$PRS_z = \\frac{PRS - \\mathrm{mean}(PRS)}{\\mathrm{SD}(PRS)}$$ \nAs part of the sensitivity analysis, we used the same methodology to derive a PRS for \ntelomere length (PRS-TL) based on 20 common variants previously found to be \nassociated with leukocyte telomere length (TL) (27) (Supplementary Table 5). In this case, since \nTL is a quantitative trait, βi corresponds to the reported beta coefficient of each variant. \nStatistical analysis \nDescriptive statistics are presented as mean (standard deviation, SD) or median (interquartile \nrange, IQR) for continuous data and as valid percentages for categorical data. Categorical variables were compared using a Chi-squared test or Fisher’s exact \ntest, as appropriate. \nTo examine the relationship between the presence of QVs and the PRS, we first used \nStudent’s t-test and the Kolmogorov-Smirnov (KS) test to compare the mean PRS values and \ntheir distributions between QV carriers and non-carriers. Additionally, we assessed this \nrelationship using logistic regression models adjusted for sex, age at diagnosis, and the two \nmain PCs of genetic heterogeneity, which accounted for a substantial proportion of genetic \nvariance (Supplementary Figure 1C). 
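Below is a minimal standard-library Python sketch of the score construction, z-standardization and tertile grouping described above. It is not the PRSice-2 implementation. It assumes that the per-variant weights are log-transformed ORs, which is the usual convention for binary traits and our assumption rather than a detail stated here. The variant identifiers (snpA, snpB) and their weights are hypothetical. Scoring a missing genotype as 0 is a simplification.

```python
import math
import statistics
from typing import Dict, List, Sequence

# Hypothetical weights: log-transformed ORs for two made-up variants.
EXAMPLE_WEIGHTS = {'snpA': math.log(1.5), 'snpB': math.log(1.2)}


def raw_prs(genotypes: Dict[str, int], weights: Dict[str, float]) -> float:
    # Sum of beta_i * G_i, with G_i the risk-allele count (0, 1, or 2);
    # simplification: a variant missing from the genotype dict contributes 0.
    return sum(beta * genotypes.get(snp, 0) for snp, beta in weights.items())


def standardize(scores: Sequence[float]) -> List[float]:
    # z-score: (PRS - mean(PRS)) / SD(PRS) across the cohort (sample SD).
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]


def tertiles(z: Sequence[float]) -> List[str]:
    # T1 = lowest third, T3 = highest third, using empirical cut points.
    c1, c2 = statistics.quantiles(z, n=3)
    return ['T1' if x <= c1 else 'T2' if x <= c2 else 'T3' for x in z]
```

In practice the weights would come from the 19 sentinel variants (Supplementary Table 4). The two-variant dictionary is only for illustration: a carrier of two snpA risk alleles and one snpB risk allele scores ln(1.5² × 1.2) = ln(2.7).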
\nTo examine the association between QVs, PRS, and survival, we used Cox proportional hazards \nregression models adjusted for sex, age at diagnosis, the two main PCs of genetic \nheterogeneity, smoking history, DLCO % predicted, FVC % predicted, and, where necessary, MUC5B risk allele \ncarrier status. The proportional hazards assumption for each covariate was assessed \nby plotting scaled Schoenfeld residuals against transformed time, which revealed no evidence of non-proportional hazards. \nThe survival R package (v3.5-7) was used to calculate p-values, hazard ratios (HR), and 95% \nconfidence intervals (CI). To visualize survival differences, we generated Kaplan-Meier \nsurvival plots and tested the differences using log-rank tests. \nResults from the PFFPR and PROFILE studies were meta-analysed under a fixed-effects model, using the meta R package (28), to \nassess the directional concordance of associations. Statistical \nanalyses were performed with R v4.3.1, with p-values <0.05 considered significant. \n\nRESULTS \n \nPrevalence of QVs in the PFFPR \nWe identified 131 QVs in monogenic adult-onset PF genes, carried by 144 patients from the PFFPR, \nresulting in a diagnostic yield of 16.2% (95% CI: 13.8-18.6%) \n(Supplementary Figure 2, Table 1). Most carriers (96.5%) had a single QV, while five \npatients (3.5% of QV carriers) had two or more QVs, with the combinations \nNAF1/TERT, KIF15/RTEL1, TERT/SPDL1, TINF2/TERT/RTEL1, and RTEL1/RTEL1. 
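The reported diagnostic yield and its interval can be reproduced from the counts given in the text (144 carriers among 888 patients). We assume the interval is a normal-approximation (Wald) interval for a proportion. The text does not say which interval was used, but this one matches the reported bounds.

```python
import math


def wald_ci(carriers: int, total: int, z: float = 1.96):
    # Normal-approximation (Wald) 95% interval for a proportion.
    p = carriers / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se


p, lo, hi = wald_ci(144, 888)
print(f'{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f})')  # 16.2% (95% CI 13.8-18.6)
```

With small counts or proportions near 0 or 1, a Wilson or exact interval would be the more conservative choice.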
Consistent with \nprevious studies, the prevalence of QVs was higher in patients with a family history of disease \n(27.3%) than in those with sporadic disease (13.5%) (p=3.08x10-5). \n \nMost QVs were in telomere genes (75.6%), while nearly a quarter were found in non-telomere \ngenes (22.9%). The telomere-related genes with the most QVs were \nRTEL1 (25.2%), TERT (23.7%), and PARN (12.2%). These genes also had the highest proportions \nof P/LP variants (31.0%, 31.0%, and 20.7%, respectively) (Table 1). In total, 42.7% of QVs were \npreviously annotated in ClinVar as VUS, LP, or P. \n \nConsistent with previous findings (7,18), carriers of the MUC5B risk allele (rs35705950 TT or GT \ngenotype) had a lower prevalence of QVs (13.5%) than those carrying the non-risk GG \ngenotype (22.6%) (p=1.03x10-3). \n \nAssociation of QVs and PRS-IPF in the PFFPR \nGiven the potential non-additive effects of QVs and the common MUC5B variant, we \ninvestigated whether individuals with lower polygenic risk were more likely to carry QVs \nthan those with higher polygenic risk. We first compared the mean and distribution of \nPRS-IPF between QV carriers and non-carriers and found significant differences (Student’s t-test, p=1.30x10-3; KS test, p=3.74x10-4) (Figure 2A). When patients were stratified into PRS-IPF \ntertiles, the prevalence of QVs was higher in the lowest tertile (low PRS-IPF) than in \nthe highest tertile (high PRS-IPF), and patients in the low PRS-IPF tertile had an increased risk of carrying a \nQV (OR=1.79, 95% CI=1.15-2.81, p=0.010) \n(Figure 2B). The association persisted when the cohort was divided into two PRS-IPF categories, \nlow and high (OR=1.74, 95% CI=1.20-2.53, p=3.57x10-3) (Supplementary Figure 3). \n \nFigure 2. Association between the prevalence of qualifying variants (QVs) and PRS-IPF in the PFFPR. 
A) \nDistribution of PRS-IPF in QV carriers (1) and non-carriers (0). Vertical dotted lines represent the mean \nvalue of each distribution. B) Risk of carrying a QV for patients with low polygenic risk (T1) and high \npolygenic risk (T3) compared with those in the middle tertile. The odds ratios (OR) and 95% confidence \nintervals (CI) were estimated using logistic regression adjusted for sex, age at diagnosis, and the two \nmain principal components. \n\nExcluding the MUC5B locus from the PRS-IPF calculations yielded non-significant differences in \nthe mean and distribution of PRS-IPF between QV carriers and non-carriers (Supplementary \nFigure 4A). However, QVs remained more common in patients in the lowest PRS tertile than in those \nin the highest (OR=1.60, 95% CI=1.01-2.54, p=0.05) (Supplementary Figure 4B). \n \nTo explore whether these observations were independent of genetically predicted TL, we then \nassessed the association between the prevalence of QVs and PRS-TL, focusing only on QVs \nin telomere genes. No significant associations were found between QVs and PRS-TL, whether PRS-TL \nwas analyzed by tertiles or as low vs. high groups (Supplementary Figures 5-7). \n \nAssociation of QVs with survival in the PFFPR \nGiven that carriers of the MUC5B risk allele have better survival and are less \nlikely to carry QVs, we hypothesized that QV carriers might have poorer survival. Indeed, QV \ncarriers had reduced survival (HR=1.53, 95% CI=1.12-2.10, p=7.33x10-3; log-rank test, p=0.022). 
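Reported hazard ratios can be cross-checked against their confidence intervals on the log scale. There, a Wald 95% CI is symmetric around ln(HR) with half-width 1.96 × SE. The sketch below assumes Wald-type intervals; the text does not state how each interval was produced. Applied to the QV carrier estimate (HR=1.53, 95% CI=1.12-2.10), it gives a two-sided p-value of the same order as the reported p=7.33x10-3. Small discrepancies are expected because the HR and the bounds are rounded.

```python
import math
from statistics import NormalDist


def log_hr_and_se(hr: float, lower: float, upper: float, z: float = 1.96):
    # A Wald 95% CI is symmetric on the log scale: ln(upper) - ln(lower) = 2 * z * SE.
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p(hr: float, lower: float, upper: float) -> float:
    # Two-sided p-value from the Wald statistic ln(HR) / SE.
    beta, se = log_hr_and_se(hr, lower, upper)
    return 2 * (1 - NormalDist().cdf(abs(beta / se)))


p_qv = wald_p(1.53, 1.12, 2.10)  # QV carriers vs non-carriers, PFFPR
```

The same log-scale SE is the quantity a fixed-effects meta-analysis uses to weight each cohort.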
The association was consistent when the analysis was limited to QVs \nclassified as pathogenic or likely pathogenic (HR=1.71, 95% CI=1.11-2.65, p=0.016; log-rank test, p=0.043). However, no significant association was found for ClinVar variants alone \n(HR=1.35, 95% CI=0.87-2.09, p=0.18) (Figure 3A, Supplementary Figure 8). As an internal \ncontrol, we found no association between survival and rare synonymous variants in the same \nmonogenic adult-onset PF genes (HR=1.38, 95% CI=0.80-2.38, p=0.24) (Figure 3A). \n \nFigure 3. Effects of qualifying variants (QVs), the MUC5B risk allele, PRS-IPF, and family history on survival. A) \nPFFPR. B) PROFILE. All analyses were performed using Cox regression models adjusted for sex, age at \ndiagnosis, the two main principal components, smoking history, forced vital capacity (FVC) % predicted, \ndiffusing capacity for carbon monoxide (DLCO) % predicted, and the MUC5B risk allele whenever \nnecessary. The x-axis shows hazard ratios (HR); the grey line corresponds to HR=1.0. Circles \ncorrespond to adjusted HRs and horizontal lines to 95% confidence intervals (CI). \n \nFurther analyses showed that QVs in telomere-related genes had the largest effect on survival \n(HR=1.76, 95% CI=1.13-2.76, p=0.013; log-rank test, p=0.029), and QVs in PARN were \nparticularly associated with worse survival (HR=2.28, 95% CI=1.11-4.68, p=0.03; log-rank test, \np=0.035) (Figure 3A, Supplementary Figure 8). \n \nWe performed additional sensitivity analyses. First, excluding PARN QV carriers attenuated the \neffect size, but the results remained significant (HR=1.46, 95% CI=1.04-2.03, p=0.03), indicating \nthat other genes also contribute to the association with worse survival (Supplementary Figure \n9). Second, because the probability of carrying QVs is higher among cases with a family history of PF, \nand a family history of PF predicts reduced survival (29), we assessed the risk \nattributable to family history of PF. We found no association between family history of PF and \nsurvival (HR=1.09, 95% CI=0.80-1.49, p=0.59), suggesting that family history is not a major \nfactor in this cohort’s survival outcomes (Figure 3A). Third, the criteria for defining QVs \ninvolve choices of thresholds and predictors. Using alternative and stricter definitions of QVs in \nthe analyses (Supplementary Table 3), we found that the direction of effect was consistent \nacross all QV definitions (Supplementary Figure 10). Additionally, applying \nmore stringent in silico criteria yielded larger hazard ratios for reduced survival (ultrarare PTVs: \nHR=2.44, 95% CI=1.14-5.24, p=0.02). However, these criteria also reduced the number of \nidentified carriers, thereby decreasing the power to detect significant differences. \n \nAssociations of PRS-IPF with survival in the PFFPR \nSince both rare and common genetic variants are associated with IPF survival, and QVs \nwere associated with worse survival in this cohort, we then examined whether the polygenic component of IPF was \nalso associated with survival. As PRS-IPF values were mainly influenced by the MUC5B effect, \nwe did not adjust for the MUC5B risk genotype in these analyses, although we did adjust for the other \ncovariates. We found that the lowest PRS-IPF tertile was associated with the worst \nsurvival (log-rank test, p=1.8x10-4; HR=1.61, 95% CI=1.25-2.07, p=1.9x10-4) (Figure 3A, Figure \n4). In contrast, PRS-TL was not associated with survival, whether analyzed by tertiles or by high \nvs. 
low-risk groups (Supplementary Figure 11, Supplementary Figure 12). \n \nFigure 4. Association between PRS-IPF and survival in the PFFPR. Patients with low polygenic risk of IPF (T1) \nhave worse survival than patients with higher polygenic risk of IPF (T2 and T3). \n \nA sensitivity analysis excluding the MUC5B locus from the PRS-IPF calculation yielded non-significant results (Supplementary Figure 13). To explore whether the association between PRS-IPF \nand survival was solely driven by the known association of MUC5B, we stratified patients by QV \ncarrier status and assessed the effect of PRS-IPF in each group. Lower PRS-IPF was associated with worse survival in both groups, although among carriers the association only approached significance (HR=1.76, 95% CI=0.99-3.15, p=0.055), whereas it was significant among non-carriers (HR=1.54, 95% CI=1.16-2.04, p=2.5x10-3) (Supplementary Figure 14A, Supplementary \nFigure 14B). Similar results were obtained when we assessed the effect of MUC5B \nrs35705950 genotypes on survival in both groups (Supplementary Figure 14C, \nSupplementary Figure 14D). Our findings suggest that the strong observed association \nbetween PRS-IPF and survival is mainly driven by MUC5B. We suspect that QVs may \nexert an independent but opposite effect on IPF survival, although we cannot rule out that the observed \ndifferences were influenced by disparities in group sizes. \nValidation of results in PROFILE and meta-analysis \nUsing the same QV classification as in the PFFPR, the diagnostic yield of QVs in \nPROFILE was 15.67% (95% CI=12.4-19.0%). 
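The PFFPR and PROFILE estimates in this section were combined with a fixed-effects model, which the Methods state was fitted with the meta R package. Such a model pools per-cohort log hazard ratios by inverse-variance weighting. The Python sketch below illustrates that calculation only and is not the meta package's code. Its two input estimates are hypothetical, not the study's results. Each cohort contributes its log HR and standard error, and the SE can be recovered from a reported 95% CI as (ln upper − ln lower)/3.92.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(estimates: List[Tuple[float, float]], z: float = 1.96):
    # estimates: one (log HR, SE) pair per cohort.
    weights = [1.0 / se ** 2 for _, se in estimates]  # inverse-variance weights
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Back-transform the pooled log HR and its Wald CI to the HR scale.
    return (math.exp(pooled),
            math.exp(pooled - z * pooled_se),
            math.exp(pooled + z * pooled_se))


# Hypothetical inputs (not the study's estimates): two cohorts with equal SE.
example = [(math.log(1.5), 0.2), (math.log(2.0), 0.2)]
hr, lo, hi = fixed_effect_meta(example)
```

When the standard errors are equal, the pooled estimate is the geometric mean of the cohort HRs (here √3 ≈ 1.73). Unequal standard errors pull it toward the more precise cohort.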
However, PROFILE IPF patients showed no \nstatistical differences in the mean and the distribution of PRS-IPF between QV carriers and non-\ncarriers (Student’s t-test, p=0.17; KS-test, p=0.24). Similarly, although not statistically \nsignificant, an enrichment of QVs in the patients with lower polygenic component of IPF was \nobserved compared to those with higher polygenic component (OR=1.29, 95% CI=0.78-2.15, \np=0.31) (Supplementary Figure 15).  \n \nDespite these results, the association of QVs and IPF survival were validated in PROFILE \npatients (Figure 3B). As for PFFPR, the survival analyses of PROFILE patients focusing on the \nLP/P variants, both of all genes or only of telomeric genes, also showed the largest effect sizes \n(HR=1.98, 95% CI=1.28-3.05, p=2.1x10-3; log-rank test, p=0.023). However, we did not replicate \nthe association of QVs in PARN with survival. Instead, the association with worse survival in \nPROFILE was observed for TERT QV carriers (HR=3.55, 95% CI=1.85-6.82, p=1.4x10-4; log-rank \ntest, p=0.03) (Figure 3B, Supplementary Figure 16).  \n \nThe meta-analysis of results from PFFPR and PROFILE cohorts showed a consistent direction of \neffect across all categories and supported a robust association between QVs (including “all \nvariants”, “pathogenic”, “telomeric”, and “pathogenic telomeric”) and PRS-IPF tertiles with \nsurvival (Figure 5). In the meta-analysis, RTEL1 QVs were also nominally associated with IPF \nsurvival despite not reaching nominal significance in any of the two cohorts by separate. No \nassociation with IPF survival was found for QVs in SPDL1 and KIF15. \n . CC-BY 4.0 International licenseIt is made available under a \nperpetuity. \n is the author/funder, who has granted medRxiv a license to display the preprint in(which was not certified by peer review)preprint \nThe copyright holder for thisthis version posted October 15, 2024. 
Figure 5. Meta-analysed results from adjusted PFFPR and PROFILE (N=1360) Cox regression models.\n\nDISCUSSION\nOur study shows that IPF patients carrying QVs in monogenic adult-onset PF genes are at increased risk of shorter survival than non-carriers. Additionally, QV carriers tend to have a lower polygenic risk component for IPF, as measured by PRS, suggesting non-additive effects between rare and common genetic variants. This points to the existence of a distinct genetic subtype of IPF patients, defined by the interplay between these rare and common variants. These key findings were consistent across two independent cohorts encompassing a total of 1,360 IPF patients.\n\nWe describe for the first time that QVs in monogenic adult-onset PF genes are present in 1 in 6 to 1 in 7 IPF patients, a prevalence consistent with a recent WGS study despite the use of different criteria for defining QVs (7). In comparison, our study used less stringent criteria for classifying QV carriers, relying on one pathogenicity predictor instead of three, and included the SPDL1 gene, recently identified as an IPF susceptibility gene (4). Previous studies primarily focused on telomere-related genes such as TERT, PARN, TERC, and RTEL1, resulting in a lower diagnostic yield (8.5-11.5% carriers of QVs) (18,30). Some earlier studies were enriched for familial IPF patients (31), who tend to carry more QVs than sporadic cases (21.4% carriers of QVs). 
Unlike these studies, we expanded our analysis to include additional well-defined telomere genes, such as DKC1, NAF1, ZCCHC8, and TINF2.\n\nMost QVs were found in telomere-related genes (75.6% in PFFPR, 68.9% in PROFILE). The association between variants in these genes and worse IPF outcomes is well established. Carriers of QVs in TERT, PARN, TERC, or RTEL1 tend to have earlier disease onset, more rapid lung function decline, and poorer survival than non-carriers (7,15,18), findings that are mirrored in individuals with short TL (7,17). However, the exact correlation between rare telomere-related variants and TL remains unclear, and the effect sizes of known common variants on TL are too small to fully explain this relationship (7).\n\nA distinctive feature of our study is that it assesses the aggregate effect of QVs across both telomere and non-telomere genes on IPF survival. Although the effect sizes were smaller than in models that included only telomere-related variants, the most robust associations were found when considering all QVs across telomere and non-telomere genes. As expected, we identified few QVs in surfactant metabolism genes, and we observed a comparably high burden of QVs in KIF15 and SPDL1, two recently reported IPF genes in European cohorts. A common SPDL1 missense variant (rs116483731) has been linked to IPF in the PROFILE cohort, particularly in the Finnish population (4), although without significant clinical differences between carriers and non-carriers. In contrast, KIF15 rs138043992 carriers from the FinnGen FinnIPF study showed early disease onset and progression (23). In the PFFPR cohort, these two risk variants were excluded from the analysis because they exceeded the global genome AF cut-off applied. We found that carriers of KIF15 and SPDL1 QVs did not have worse survival. However, the same direction of effect was observed among KIF15 QV carriers in both the PFFPR and PROFILE cohorts. 
Future studies with larger sample sizes may be required to better define the clinical course of IPF patients with KIF15 QVs.\n\nIdentifying prognostic markers in PF is crucial to improve clinical management. As antifibrotic drugs such as nintedanib and pirfenidone can only slow disease progression (32–34), early identification of patients with a poorer prognosis could guide decisions on more aggressive treatments, such as lung transplantation. This information could also enhance the efficiency of clinical trials (35). Previous studies support an association between a family history of PF and patient survival, highlighting the importance of systematically ascertaining family histories of ILD patients to better inform prognosis (29). However, familial PF could encompass a broad group of ILD patients whose disease progression depends on different factors. For instance, family history may capture risk linked to rare telomere-related variants (36,37), common genetic variants (38), and short age-adjusted TL (39), all of which are interrelated with a family history of PF. Our findings revealed non-additive, opposite effects of QVs and PRS-IPF on survival, while no apparent association was found with PRS-TL. Moreover, we did not find an association between family history and IPF survival, although this result should be interpreted with caution, given that self-reported family history is prone to recall bias.\n\nIn contrast, our results show that QVs are robust predictors of IPF patient prognosis. 
Despite this, genetic testing of IPF patients is not yet widespread practice, and current guidelines only recommend it for familial forms of PF (9,40,41). Nonetheless, we and others have shown a significant burden of QVs in patients without a confirmed family history. There is therefore an urgent need to define additional criteria to improve the diagnostic yield of genetic testing in these patients. For some complex diseases, PRS have been suggested as a promising strategy for prioritizing patients who should undergo genetic sequencing (42). For example, in prostate cancer, the highly penetrant variant HOXB13 G84E is most common in cases with the lowest PRS (43). The success of this approach depends on the strength of the PRS and the genetic architecture of the disease (42). IPF meets several criteria that make it suitable for this approach: 1) genetic heterogeneity is modest, and the number of susceptibility genes appears to be markedly smaller than in other complex diseases (26); 2) its etiology is driven both by rare, highly penetrant variants that explain monogenic-like presentations and by common variants with small effect sizes that contribute to a polygenic disorder (8); and 3) PRS-IPF accurately identifies individuals at high risk of interstitial lung abnormalities (ILA) and IPF (44). In agreement, we found that PRS-IPF values are inversely associated with the likelihood that a patient carries QVs. This supports the idea that PRS could serve as a valuable tool for prioritizing patients who should undergo deeper sequence-based analysis.\n\nWe acknowledge some limitations. First, other genes not considered in this study are also involved in IPF risk. To test our hypothesis, the analyses were restricted to a very limited number of genes showing dominant inheritance and presumably high penetrance. 
This resulted in the exclusion of well-defined monogenic PF genes, such as ABCA3, which shows recessive inheritance. In addition, co-segregation data, TL measurements, and functional evidence were not available to accurately classify most QVs as P or LP. Therefore, some variants categorized as QVs may be VUS or benign. To address this limitation, we provided alternative QV definitions based on different in silico predictors and AF cut-offs, which consistently showed the same direction of effect in the survival analysis. For PRS analyses, we relied on simple models based on sentinel variants from existing GWAS. For IPF, this implies that the PRS-IPF results are mainly driven by the effect of MUC5B. However, a recent study showed that a common genetic variant score complements the MUC5B variant in accurately identifying individuals at high risk of ILA and IPF (44). Therefore, using a whole-genome PRS instead of the sentinel variant-based PRS model might yield more robust associations across cohorts. Moreover, most IPF GWAS conducted to date have mainly involved participants of European genetic ancestry. As a result, the PRS-IPF derived in this study might be much less accurate for participants of non-European ancestry in the cohorts. Finally, although the main findings were consistent across the PFFPR and PROFILE cohorts, the two studies differed in the definition of the primary endpoint and in the antifibrotic treatment received, which could have influenced the results. 
Additionally, patient recruitment in PROFILE started before the approval of pirfenidone and nintedanib, which might explain the shorter median survival of PROFILE patients compared with PFFPR patients. These differences might explain some of the inconsistencies, such as the association of PARN or TERT QV carriers with survival being found in only one of the two cohorts.\n\nCONCLUSIONS\nWe found a robust association of telomere and non-telomere gene QVs with IPF patient survival. We also show that the effects of these QVs on survival were non-additive with those of common IPF risk variants. This study highlights the potential clinical significance of identifying QVs in telomere and non-telomere genes linked to monogenic forms of PF.\n\nACKNOWLEDGEMENTS\nWe thank all patients who participated in the PFF Patient Registry. We also thank the investigators and other staff at participating PFF Care Centers for providing clinical data and blood samples, the PFF, which established and has maintained the Patient Registry since 2016, and, lastly, the many generous donors.\n\nDATA AVAILABILITY\nData supporting the findings are available in the manuscript or in the supplementary files. Access to the raw whole-genome sequence dataset is restricted to qualified researchers under an agreement with the PFFPR and PROFILE steering committees to protect the privacy of the participants. For further information and to apply for access to the PFFPR data prior to public deposit in BioLINCC, please contact the chair of the ancillary committee (Dr. Noth) or any other member of the steering committee (https://www.pulmonaryfibrosis.org/pff-registry/pff-patient-registry). For further information and to apply for access to the PROFILE data, please contact admin.mtwc@imperial.ac.uk. Data access requests must be reviewed before release.\n\nAUTHOR CONTRIBUTIONS\nConceptualization and supervision: A.A.G., I.N., and C.F. 
Patient recruitment, collection of biospecimens, and clinical data: J.M.O., P.L.M., J.A.K., A.A., F.J.M., T.M.M., T.S.B., R.G.J., I.N., I.S., B.Y., S.F.M., E.S., J.M., J.S.K., Y.H., L.V.W., R.J.A., D.Z., and C.G. Formal analysis: A.A.G., D.J., J.M.L.S., and D.Z. Supervision: I.N. and C.F. Writing-original draft: A.A.G. and C.F. Funding acquisition: C.F., L.V.W., and I.N. Visualization: A.A.G. Writing-review and editing: All the authors.\n\nFUNDING\nInstituto de Salud Carlos III (PI20/00876, PI23/00980), co-financed by the European Regional Development Funds (ERDF), “A way of making Europe” from the EU; ITER agreements (OA17/008 and OA23/043); Cabildo Insular de Tenerife (CGIEU0000219140 and A0000014697); NIH/NHLBI grants UG3HL145266 and R01HL171918. LVW reports funding from the Medical Research Council (MR/V00235X/1).\n\nREFERENCES\n1. Lederer DJ, Martinez FJ. Idiopathic Pulmonary Fibrosis. N Engl J Med. 2018 May 10;378(19):1811–23.\n2. Pires FS, Damas C, Mota P, Melo N, Costa D, Jesus JM, et al. Slow versus rapid progressors in idiopathic pulmonary fibrosis. European Respiratory Journal [Internet]. 2011 Sep 1 [cited 2024 Mar 14];38(Suppl 55). Available from: https://erj.ersjournals.com/content/38/Suppl_55/p656\n3. Allen RJ, Porte J, Braybrooke R, Flores C, Fingerlin TE, Oldham JM, et al. Genetic variants associated with susceptibility to idiopathic pulmonary fibrosis in people of European ancestry: a genome-wide association study. The Lancet Respiratory medicine. 2017 Nov;5(11):869–80.\n4. Dhindsa RS, Mattsson J, Nag A, Wang Q, Wain LV, Allen R, et al. 
Identification of a missense variant in SPDL1 associated with idiopathic pulmonary fibrosis. Commun Biol. 2021 Mar 23;4(1):1–8.\n5. Fingerlin TE, Murphy E, Zhang W, Peljto AL, Brown KK, Steele MP, et al. Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis. Nature Genetics. 2013;45(6):613–20.\n6. Partanen JJ, Häppölä P, Zhou W, Lehisto AA, Ainola M, Sutinen E, et al. Leveraging global multi-ancestry meta-analysis in the study of idiopathic pulmonary fibrosis genetics. Cell Genomics. 2022;2(10):100181.\n7. Zhang D, Newton CA, Wang B, Povysil G, Noth I, Martinez FJ, et al. Utility of whole genome sequencing in assessing risk and clinically relevant outcomes for pulmonary fibrosis. European Respiratory Journal [Internet]. 2022 Dec 1 [cited 2023 Oct 12];60(6). Available from: https://erj.ersjournals.com/content/60/6/2200577\n8. Peljto AL, Blumhagen RZ, Walts AD, Cardwell J, Powers J, Corte TJ, et al. Idiopathic Pulmonary Fibrosis Is Associated with Common Genetic Variants and Limited Rare Variants. Am J Respir Crit Care Med. 2023 May 1;207(9):1194–202.\n9. Raghu G, Remy-Jardin M, Richeldi L, Thomson CC, Inoue Y, Johkoh T, et al. Idiopathic Pulmonary Fibrosis (an Update) and Progressive Pulmonary Fibrosis in Adults: An Official ATS/ERS/JRS/ALAT Clinical Practice Guideline. Am J Respir Crit Care Med. 2022 May 1;205(9):e18–47.\n10. Peljto AL, Zhang Y, Fingerlin TE, Ma SF, Garcia JGN, Richards TJ, et al. Association between the MUC5B promoter polymorphism and survival in patients with idiopathic pulmonary fibrosis. JAMA. 2013 Jun;309(21):2232–9.\n
11. Cai S, Allen RJ, Wain LV, Dudbridge F. Reassessing the association of MUC5B with survival in idiopathic pulmonary fibrosis. Ann Hum Genet. 2023 Sep;87(5):248–53.\n12. Allen RJ, Oldham JM, Jenkins DA, Leavy OC, Guillen-Guio B, Melbourne CA, et al. Longitudinal lung function and gas transfer in individuals with idiopathic pulmonary fibrosis: a genome-wide association study. The Lancet Respiratory Medicine [Internet]. 2022 Nov; Available from: https://doi.org/10.1016/S2213-2600(22)00251-X\n13. Oldham JM, Allen RJ, Lorenzo-Salazar JM, Molyneaux PL, Ma SF, Joseph C, et al. PCSK6 and Survival in Idiopathic Pulmonary Fibrosis. Am J Respir Crit Care Med. 2023 Jun 1;207(11):1515–24.\n14. Ley B, Torgerson DG, Oldham JM, Adegunsoye A, Liu S, Li J, et al. Rare Protein-Altering Telomere-related Gene Variants in Patients with Chronic Hypersensitivity Pneumonitis. Am J Respir Crit Care Med. 2019 Nov 1;200(9):1154–63.\n15. Newton CA, Batra K, Torrealba J, Kozlitina J, Glazer CS, Aravena C, et al. Telomere-related lung fibrosis is diagnostically heterogeneous but uniformly progressive. European Respiratory Journal. 2016;48(6):1710–20.\n16. Borie R, Kannengiesser C, Hirschi S, Pavec JL, Mal H, Bergot E, et al. Severe hematologic complications after lung transplantation in patients with telomerase complex mutations. The Journal of heart and lung transplantation: the official publication of the International Society for Heart Transplantation. 2015 Apr;34(4):538–46.\n17. Stuart BD, Lee JS, Kozlitina J, Noth I, Devine MS, Glazer CS, et al. Effect of telomere length on survival in patients with idiopathic pulmonary fibrosis: An observational cohort study with independent validation. The Lancet Respiratory Medicine. 2014;2(7):557–65.\n18. Dressen A, Abbas AR, Cabanski C, Reeder J, Ramalingam TR, Neighbors M, et al. 
Analysis of protein-altering variants in telomerase genes and their association with MUC5B common variant status in patients with idiopathic pulmonary fibrosis: a candidate gene sequencing study. Lancet Respir Med. 2018 Aug;6(8):603–14.\n19. Wang BR, Edwards R, Freiheit EA, Ma Y, Burg C, de Andrade J, et al. The Pulmonary Fibrosis Foundation Patient Registry. Rationale, Design, and Methods. Ann Am Thorac Soc. 2020 Dec;17(12):1620–8.\n20. Maher TM. PROFILEing idiopathic pulmonary fibrosis: rethinking biomarker discovery. European Respiratory Review. 2013 Jun 1;22(128):148–52.\n21. Maher TM, Oballa E, Simpson JK, Porte J, Habgood A, Fahy WA, et al. An epithelial biomarker signature for idiopathic pulmonary fibrosis: an analysis from the multicentre PROFILE cohort study. The Lancet Respiratory Medicine. 2017 Dec 1;5(12):946–55.\n22. Zhang D, Povysil G, Kobeissy PH, Li Q, Wang B, Amelotte M, et al. Rare and Common Variants in KIF15 Contribute to Genetic Risk of Idiopathic Pulmonary Fibrosis. Am J Respir Crit Care Med. 2022 Jul 1;206(1):56–69.\n23. Hollmén M, Laaka A, Partanen JJ, Koskela J, Sutinen E, Kaarteenaho R, et al. KIF15 missense variant is associated with the early onset of idiopathic pulmonary fibrosis. Respir Res. 2023 Sep 30;24(1):240.\n24. Harrison PW, Amode MR, Austine-Orimoloye O, Azov AG, Barba M, Barnes I, et al. Ensembl 2024. Nucleic Acids Research. 2024 Jan 5;52(D1):D891–9.\n25. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. 
Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015 May;17(5):405–24.\n26. Allen RJ, Stockwell A, Oldham JM, Guillen-Guio B, Schwartz DA, Maher TM, et al. Genome-wide association study across five cohorts identifies five novel loci associated with idiopathic pulmonary fibrosis. 2022;1–5.\n27. Li C, Stoma S, Lotta LA, Warner S, Albrecht E, Allione A, et al. Genome-wide Association Analysis in Humans Links Nucleotide Metabolism to Leukocyte Telomere Length. Am J Hum Genet. 2020 Mar 5;106(3):389–404.\n28. Schwarzer G, Carpenter J, Rücker G. Meta-Analysis with R. 2015.\n29. Cutting CC, Bowman WS, Dao N, Pugashetti JV, Garcia CK, Oldham JM, et al. Family History of Pulmonary Fibrosis Predicts Worse Survival in Patients With Interstitial Lung Disease. CHEST. 2021 May 1;159(5):1913–21.\n30. Petrovski S, Todd JL, Durheim MT, Wang Q, Chien JW, Kelly FL, et al. An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis. American journal of respiratory and critical care medicine. 2017 Jul;196(1):82–93.\n31. Liu Q, Zhou Y, Cogan JD, Mitchell DB, Sheng Q, Zhao S, et al. The Genetic Landscape of Familial Pulmonary Fibrosis. Am J Respir Crit Care Med. 2023 May 15;207(10):1345–57.\n32. Justet A, Klay D, Porcher R, Cottin V, Ahmad K, Molina MM, et al. Safety and efficacy of pirfenidone and nintedanib in patients with idiopathic pulmonary fibrosis and carrying a telomere-related gene mutation. European Respiratory Journal. 2021;57(2):4–7.\n33. Richeldi L, du Bois RM, Raghu G, Azuma A, Brown KK, Costabel U, et al. Efficacy and safety of nintedanib in idiopathic pulmonary fibrosis. N Engl J Med. 2014 May 29;370(22):2071–82.\n34. King TE, Bradford WZ, Castro-Bernardini S, Fagan EA, Glaspole I, Glassberg MK, et al. 
A phase 3 trial of pirfenidone in patients with idiopathic pulmonary fibrosis. N Engl J Med. 2014 May 29;370(22):2083–92.\n35. Bowman WS, Newton CA, Linderholm AL, Neely ML, Pugashetti JV, Kaul B, et al. Proteomic biomarkers of progressive fibrosing interstitial lung disease: a multicentre cohort analysis. The Lancet Respiratory Medicine. 2022 Jun 1;10(6):593–602.\n36. Armanios MY, Chen JJL, Cogan JD, Alder JK, Ingersoll RG, Markin C, et al. Telomerase Mutations in Families with Idiopathic Pulmonary Fibrosis. New England Journal of Medicine. 2007;356(13):1317–26.\n37. Stuart BD, Choi J, Zaidi S, Xing C, Holohan B, Chen R, et al. Exome sequencing links mutations in PARN and RTEL1 with familial pulmonary fibrosis and telomere shortening. Nature Genetics. 2015;47(5):512–7.\n38. Leavy OC, Ma SF, Molyneaux PL, Maher TM, Oldham JM, Flores C, et al. Proportion of Idiopathic Pulmonary Fibrosis Risk Explained by Known Common Genetic Loci in European Populations. American Journal of Respiratory and Critical Care Medicine. 2020 Nov;203(6):775–8.\n39. Pulmonary fibrosis in non-mutation carriers of families with short telomere syndrome gene mutations - PubMed [Internet]. [cited 2024 Jun 19]. Available from: https://pubmed.ncbi.nlm.nih.gov/34580961/\n40. Borie R, Kannengiesser C, Antoniou K, Bonella F, Crestani B, Fabre A, et al. European Respiratory Society statement on familial pulmonary fibrosis. European Respiratory Journal [Internet]. 2023 Mar 1 [cited 2023 Oct 12];61(3). Available from: https://erj.ersjournals.com/content/61/3/2201383\n41. 
Zhang D, Eckhardt CM, McGroder C, Benesh S, Porcelli J, Depender C, et al. Clinical Impact of Telomere Length Testing for Interstitial Lung Disease. CHEST [Internet]. 2024 Jun 28 [cited 2024 Jul 2];0(0). Available from: https://journal.chestnet.org/article/S0012-3692(24)00808-0/abstract\n42. Lu T, Zhou S, Wu H, Forgetta V, Greenwood CMT, Richards JB. Individuals with common diseases but with a low polygenic risk score could be prioritized for rare variant screening. Genet Med. 2021 Mar;23(3):508–15.\n43. Darst BF, Sheng X, Eeles RA, Kote-Jarai Z, Conti DV, Haiman CA. Combined Effect of a Polygenic Risk Score and Rare Genetic Variants on Prostate Cancer Risk. Eur Urol. 2021 Aug;80(2):134–8.\n44. Moll M, Peljto AL, Kim JS, Xu H, Debban CL, Chen X, et al. A Polygenic Risk Score for Idiopathic Pulmonary Fibrosis and Interstitial Lung Abnormalities. Am J Respir Crit Care Med. 2023 Oct;208(7):791–801.\n\nTABLES\n\nTable 1. Count of qualifying variants in monogenic adult-onset PF genes identified in the PFFPR patients.*\n\nGene    Number of variants  P/LP  VUS  Gene category\nKIF15   15                  3     12   Non-telomere\nSPDL1   8                   2     7    Non-telomere\nSFTPC   5                   0     5    Non-telomere\nSFTPA2  1                   0     1    Non-telomere\nSFTPA1  1                   0     1    Non-telomere\nZCCHC8  5                   1     4    Telomere\nTINF2   3                   3     0    Telomere\nPARN    16                  12    4    Telomere\nRTEL1   33                  18    15   Telomere\nDKC1    1                   0     1    Telomere\nTERC    4                   0     4    Telomere\nNAF1    6                   1     5    Telomere\nTERT    31                  18    13   Telomere\nTotal   131                 58    73\n\n*Filtered by frequency (AF<0.0005) and CADD score (>15). 
P/LP, pathogenic or likely pathogenic; VUS, variant of uncertain significance.\n\nSupplementary material\nRare variants and survival of patients with idiopathic pulmonary fibrosis\nAitana Alonso-Gonzalez1, David Jáspez2, José M. Lorenzo-Salazar2, Shwu-Fan Ma3, Emma Strickland3, Josyf Mychaleckyj4, John S. Kim3, Yong Huang3, Ayodeji Adegunsoye5, Justin M. Oldham6, Philip L. Molyneaux7,8, Toby Maher7,8,9, Louise V Wain10,11, Richard Allen10, Martin D. Tobin10,11, Jonathan Kropski12, Brian Yaspan13, Timothy S. Blackwell12, David Zhang14, Christine Kim Garcia14,15, Fernando J. Martinez16, Imre Noth3, and Carlos Flores1,2,17,18\n\nSupplementary methods\nDescription of study cohorts\nSupplementary bioinformatics methods\nSupplementary results\n
Prevalence of qualifying variants (QV) in PROFILE\nSupplementary tables\nSupplementary Table 1. Baseline characteristics and outcomes of IPF patients from stage one and stage two cohorts.\nSupplementary Table 2. Regions of interest (ROIs) for qualifying variant annotations in hg38.\nSupplementary Table 3. Alternative definitions for qualifying variants and the rare synonymous variants used for sensitivity analyses.\nSupplementary Table 4. Common IPF risk variants and effects considered for PRS-IPF estimation.\nSupplementary Table 5. Common telomere length variants and effects considered for PRS-TL estimation.\nSupplementary Figures\nSupplementary Figure 1. Principal component analysis.\nSupplementary Figure 2. Distribution of qualifying variants (QV) in monogenic adult-onset pulmonary fibrosis (PF) genes in the PFFPR and PROFILE cohorts.\nSupplementary Figure 3. Association between prevalence of qualifying variants (QV) and PRS-IPF in the PFFPR.\nSupplementary Figure 4. 
Association between prevalence of qualifying variants (QV) and PRS-IPF (after excluding the MUC5B locus) in the PFFPR.\nSupplementary Figure 5. Association between the prevalence of qualifying variants (QV) and PRS-TL in the PFFPR.\nSupplementary Figure 6. Association between prevalence of qualifying variants (QV) in telomere and non-telomere genes and PRS-TL in the PFFPR.\nSupplementary Figure 7. Association between prevalence of qualifying variants (QV) in telomere genes and PRS-TL in the PFFPR.\nSupplementary Figure 8. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of genes) and the MUC5B risk allele in the PFFPR. p-values for the log-rank test are shown.\nSupplementary Figure 9. Effect of qualifying variants (QV) on survival in the PFFPR (excluding carriers of QV within PARN).\nSupplementary Figure 10. Alternative qualifying variant (QV) classifications and effects on survival in the PFFPR.\nSupplementary Figure 11. Association between PRS-TL tertiles and survival in the PFFPR.\nSupplementary Figure 12. Association between high and low PRS-TL and survival in the PFFPR.\n
Supplementary Figure 13. Association of PRS-IPF (after excluding the MUC5B locus) and survival in the PFFPR. Kaplan-Meier analysis showing p-values for the log-rank test.\nSupplementary Figure 14. Associations between PRS-IPF and MUC5B rs35705950 genotypes with survival among carriers and non-carriers of qualifying variants (QV) in the PFFPR.\nSupplementary Figure 15. Association between prevalence of qualifying variants (QV) and PRS-IPF in PROFILE.\nSupplementary Figure 16. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of PF genes) and the MUC5B risk allele in PROFILE. p-values for the log-rank test are shown.\nSupplementary references\n\n
; https://doi.org/10.1101/2024.10.12.24315151doi: medRxiv preprint \n\n4 \n \nSupplementary methods \nDescription of study cohorts \nThe Pulmonary Fibrosis Foundation Patient Registry (PFFPR) is a large multicentre based \nregistry that collects baseline and longitudinal demographic and clinical information about \nwell-characterized patients with interstitial lung diseases (ILD) in the United States since March \n2016 to allow retrospective and prospective research1. In addition, the PFFPR major objective is \nto apply blood-based omics technologies (whole-genome sequencing [WGS], proteomic \nanalysis, and transcriptional profiling) on blood samples from patients to study molecular \nmarkers of the onset or progression of diseases. Patients aged 18 years old who has ILD \ndiagnosed and had not undergone lung transplantation were recruited from approximately 42 \nUSA sites selected primarily from the familial pulmonary fibrosis (FPF) Care Center Network. \nThey were followed for the progression of the disease through the lifetime of the PFFPR or the \npatient until the patient receives lung transplant. More details of the PFFPR including inclusion \nand exclusion criteria as well as collected clinical variables are described elsewhere1. The PFFPR \ncohort includes 1317 individuals with ILD for whom WGS data are available. For this study, we \nincluded the 917 PFFPR patients with a definitive IPF diagnosis. Family history was available for \nall of them although no genetic causes were previously assessed. After the quality control \nprocedures, 888 of those patients remained in the study (Figure 1).  \nThe PROFILE is a UK large, prospective, multicentre, longitudinal study conducted on patients \nwith fibrotic ILD2,3. The cohort includes 541 patients with IPF or idiopathic non-specific \ninterstitial pneumonia aged 18-85 recruited from tertiary specialist ILD and from local \nsecondary care hospitals. 
Blood samples were collected for genomic analysis, and patients were followed for disease progression over 3 years. After quality control, the second stage of the study included 472 patients with a confirmed diagnosis of IPF (Figure 1).\n\nBaseline characteristics of the PFFPR and PROFILE cohorts are listed in Supplementary Table 1.\nSupplementary bioinformatics methods\nIn both cohorts, several quality control (QC) analyses were performed: (i) detection of QC outliers, (ii) kinship between patients, (iii) sample cross-contamination, and (iv) sex discordance. We used a combination of DRAGEN metrics and assessments with PLINK v1.90b6.24⁴, SCE-VCF v0.1.2 (https://github.com/HTGenomeAnalysisUnit/SCE-VCF), Somalier v0.2.19⁵, and KING v2.3.2⁶.\nDetection of QC outliers: Using PLINK, we identified samples with abnormal heterozygosity rates or genotype call rates, which may indicate sample contamination and/or low DNA concentration. Samples with a heterozygosity rate more than 3 standard deviations from the mean and/or a genotype call rate <0.95 were considered outliers.\nKinship between patients: We identified duplicates or monozygotic twins, and first-degree relatives, with three different tools. Two samples were considered duplicates or monozygotic twins if the PLINK PI_HAT value was >0.9, the Somalier relatedness value was >0.9, and the KING kinship coefficient was >0.354. Two samples were considered first-degree relatives if the PLINK PI_HAT value was 0.4-0.6, the Somalier relatedness value was 0.4-0.6, and the KING kinship coefficient was 0.177-0.354. We found complete consensus among these tools in the cohort. No second-degree relatives were detected.\nSample cross-contamination: We used the “estimated_sample_contamination” value from the DRAGEN metrics to exclude samples with ≥2% estimated contamination. We also used the SCE-VCF tool, which estimates contamination from VCF files using the CHARR method⁷, applying the recommended thresholds for flagging a sample as contaminated (CHARR > 0.03 and INCONSISTENT_AB_HET_RATE > 0.15). These tools were in complete agreement on the identification of potentially contaminated samples in the PFFPR. For PROFILE, we relied only on SCE-VCF to infer sample cross-contamination.\nSex discordance: Biological sex was inferred from genetic data with Somalier following its recommendations, by comparing the scaled mean depth on the X and Y chromosomes at 365 and 17 genomic positions, respectively. Patients whose genetically inferred sex did not match their recorded sex were excluded from the study. In the PFFPR, one female was identified as possibly X0 aneuploid owing to the low number of heterozygous sites on the X chromosome and was excluded from the analysis.\nSupplementary results\nPrevalence of qualifying variants (QV) in PROFILE\nThe genes with the highest burden of QVs were RTEL1 (20.5%), PARN (17.8%), and TERT (15.1%) (Supplementary Figure 2B, 2D). The prevalence of QVs among carriers of the MUC5B risk allele (rs35705950-T) was lower (14.97%) than among patients with the protective GG genotype (16.85%), although the difference was not statistically significant (p=0.60). 
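The supplement does not state which test produced p=0.60 for the MUC5B comparison above. As a hedged illustration only, the sketch below assumes a two-sided Fisher exact test on a 2x2 table of QV carriers by MUC5B genotype. The counts (44 of 294 risk-allele carriers and 30 of 178 GG patients carrying a QV) are back-calculated from the reported prevalences and the 294 PROFILE risk-allele carriers in Supplementary Table 1, so they are an assumption rather than reported data. The helper name fisher_exact_two_sided is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    # Hypothetical helper. 2x2 table [[a, b], [c, d]]: rows are genotype
    # groups, columns are QV carriers and non-carriers.
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x carriers in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided p: sum over every table that is no more probable than the
    # observed one (a small relative tolerance guards against float ties).
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Counts back-calculated from the reported percentages (assumption).
risk_qv, risk_n = 44, 294  # MUC5B rs35705950-T carriers
gg_qv, gg_n = 30, 178      # GG genotype (472 - 294 patients)

risk_prev = round(100 * risk_qv / risk_n, 2)
gg_prev = round(100 * gg_qv / gg_n, 2)
p_value = fisher_exact_two_sided(risk_qv, risk_n - risk_qv, gg_qv, gg_n - gg_qv)
print(risk_prev, gg_prev, round(p_value, 2))
```

Under these assumptions the comparison is not significant at the 0.05 level, in line with the text; a different test (for example, a chi-squared test or a covariate-adjusted logistic model) would give a somewhat different p-value.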
We observed the same direction of effect as in the PFFPR when assessing the association between the lowest PRS-IPF tertile and reduced survival (HR=1.49, 95% CI=1.14-1.95, p=3.1×10⁻³) (Figure 3B).\n\nSupplementary tables\n\nSupplementary Table 1. Baseline characteristics and outcomes of IPF patients from the stage one and stage two cohorts.\nCharacteristics | PFFPR (n=888)* | PROFILE (n=472)$\nAge, yr, mean (SD) | 71.02 (7.8) | 70.65 (7.9)\nMale, n (%) | 676 (76.1%) | 366 (77.5%)\nEthnicity, n | | \nUnknown | 17 | -\nAsian | 23 | -\nBlack | 10 | -\nWhite | 838 | -\nEver smoker, n (%) | 571 (64.3%) | 326 (69.1%)\nFamily history, n (%) | 176 (19.8%) | -\nFVC % predicted, mean (SD) | 67.74 (16.79) | 78.97 (19.01)\nDLCO % predicted, mean (SD) | 29.3 (4.84) | 44.97 (14.98)\nDead, n (%) | 337 (37.9%) | 346 (73.3%)\nTransplant, n (%) | 139 (15.6%) | -\nMean survival in years (IQR) | 4.86 (3.31-6.93) | 3.03 (1.7-5.71)\nMUC5B genotype with risk allele, n (%) | 622 (70%) | 294 (62.3%)\nAbbreviations: PFFPR, Pulmonary Fibrosis Foundation Patient Registry; SD, standard deviation; FVC, forced vital capacity; DLCO, diffusing capacity of the lungs for carbon monoxide; IQR, interquartile range. *Missing data: FVC % predicted (n=41) and DLCO % predicted (n=68). $Missing data: FVC % predicted (n=12) and DLCO % predicted (n=50).\n\nSupplementary Table 2. Regions of interest (ROIs) for qualifying variant annotation in hg38.\nChromosome | Gene | Start | End\n5 | TERT | 1,253,047 | 1,295,168\n3 | TERC | 169,764,420 | 169,765,160\n14 | TINF2 | 24,238,186 | 24,242,763\nX | DKC1 | 154,762,642 | 154,777,789\n20 | RTEL1 | 63,657,710 | 63,696,353\n16 | PARN | 14,435,600 | 14,632,828\n4 | NAF1 | 163,109,973 | 163,166,990\n12 | ZCCHC8 | 122,471,500 | 122,501,032\n8 | SFTPC | 22,156,813 | 22,164,579\n10 | SFTPA2 | 79,555,752 | 79,560,507\n10 | SFTPA1 | 79,610,839 | 79,615,555\n5 | SPDL1 | 169,583,536 | 169,604,878\n3 | KIF15 | 44,761,621 | 44,873,476\n\nSupplementary Table 3. Alternative qualifying variant definitions and the rare synonymous variant set used for sensitivity analyses. 
\nFilter | Ultra-rare PTV | Ultra-rare Ensemble# (PTV + Missense + Indel) | Rare PTV only | Rare Ensemble# (PTV + Missense + Indel) | Semi-rare PTV only | Semi-rare Ensemble# (PTV + Missense + Indel) | Rare synonymous^\nMissense AF* | - | 0 | - | 0.0005 | - | 0.01 | -\nPTV AF* | 0 | 0 | 0.001 | 0.001 | 0.01 | 0.01 | -\nConsensus in silico prediction for missense& | | | | | | | \nPolyPhen-2 HumDiv | - | Probably damaging | - | Probably damaging | - | Probably damaging | -\nREVEL | - | >0.5 | - | >0.5 | - | >0.5 | -\nPrimateAI | - | >0.8 | - | >0.8 | - | >0.8 | -\nVariants (n) | 13 | 30 | 28 | 67 | 29 | 78 | 38\n*Below threshold for any population in gnomAD v2.1 exomes (AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS) or gnomAD v3.2 genomes (AFR, AMR, ASJ, EAS, FIN, MID, NFE, OTH, SAS) or in the 1000 Genomes Project Phase 3 genomes (AFR, AMR, EAS, EUR, SAS).\n&Consensus of three predictors (PolyPhen-2, REVEL, PrimateAI) for missense variants: a variant passes only if >2 out of 3, or 2 out of 2, or 1 out of 1 filters pass. Some predictors may have missing values.\n^Allele frequency cutoff of 0.0005 in any population in gnomAD v2.1 exomes (AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS) or gnomAD v3.2 genomes (AFR, AMR, ASJ, EAS, FIN, MID, NFE, OTH, SAS) or in the 1000 Genomes Project Phase 3 genomes (AFR, AMR, EAS, EUR, SAS); synonymous variants only.\n#Ensemble models include non-coding TERC variants selected if they pass the missense AF threshold and are involved in intramolecular base-pairing or have previously been described in pulmonary fibrosis, dyskeratosis congenita, or Hoyeraal-Hreidarsson syndrome.\nAF, allele frequency; PTV, protein-truncating variant.\n\nSupplementary Table 4. 
Common IPF risk variants and effects considered for PRS-IPF estimation.\nLocus | SNP ID | Chr. | Position (hg38) | Effect allele | Non-effect allele | OR | P\nKIF15 | rs141979279 | 3 | 44,816,639 | C | T | 1.50 | 1.21×10⁻¹⁰\nTERC | rs10936601 | 3 | 169,810,661 | C | T | 0.79 | 2.10×10⁻¹⁵\nFAM13A | rs2013701 | 4 | 88,963,935 | G | T | 1.25 | 4.60×10⁻¹⁶\nTERT | rs7725218 | 5 | 1,282,299 | G | A | 1.41 | 4.90×10⁻³²\nDSP | rs2076295 | 6 | 7,562,999 | G | T | 1.49 | 1.50×10⁻⁴⁸\nMAD1L1 | rs12699415 | 7 | 1,869,843 | A | G | 1.27 | 7.85×10⁻¹⁸\nZKSCAN1 | rs2897075 | 7 | 100,032,719 | T | C | 1.30 | 1.77×10⁻²¹\nDEPTOR | rs28513081 | 8 | 119,921,886 | A | G | 1.20 | 1.22×10⁻⁹\n10q25.1 | rs79684490 | 10 | 109,470,103 | A | G | 1.40 | 3.52×10⁻⁸\nMUC5B | rs35705950 | 11 | 1,219,991 | T | G | 5.06 | 9.09×10⁻⁴¹⁸\nATP11A | rs12585036 | 13 | 112,881,427 | C | T | 1.29 | 5.99×10⁻¹⁴\nIVD | rs59424629 | 15 | 40,428,343 | T | G | 1.27 | 4.98×10⁻¹⁹\nKNL1 | rs12912339 | 15 | 40,639,510 | A | G | 1.30 | 7.41×10⁻¹³\nAKAP13 | rs62023891 | 15 | 85,553,985 | A | G | 1.18 | 1.32×10⁻⁸\nNPRL3 | rs74614704 | 16 | 112,241 | A | G | 1.49 | 2.57×10⁻¹²\n17q21.31 | rs3785884 | 17 | 45,980,229 | G | A | 1.40 | 2.53×10⁻²⁰\nDPP9 | rs35574495 | 19 | 4,686,976 | G | T | 0.80 | 1.08×10⁻⁹\nSTMN3 | rs112087793 | 20 | 63,652,817 | C | T | 1.34 | 1.09×10⁻⁸\nRTEL1 | rs41308092 | 20 | 63,693,038 | A | G | 1.75 | 3.13×10⁻⁹\nSNP, single nucleotide polymorphism; Chr., chromosome; OR, odds ratio; P, p-value in the original study.\n\nSupplementary Table 5. Common telomere length variants and effects considered for PRS-TL estimation. 
\nLocus | SNP ID | CHR | Position (hg38) | Effect allele | Non-effect allele | BETA | P\nPARP1 | rs3219104 | 1 | 226374920 | C | A | 0.042 | 9.60×10⁻¹¹\nTERC | rs10936600 | 3 | 169796797 | T | A | -0.086 | 7.18×10⁻⁵¹\nNAF1 | rs4691895 | 4 | 163127047 | C | G | 0.058 | 1.58×10⁻²¹\nTERT | rs7705526 | 5 | 1285859 | A | C | 0.082 | 5.34×10⁻⁴⁵\nTERT | rs2853677 | 5 | 1287079 | A | G | -0.064 | 3.35×10⁻³¹\nPOT1 | rs59294613 | 7 | 124914213 | A | C | -0.041 | 1.17×10⁻¹³\nSTN1 | rs9419958 | 10 | 103916188 | C | T | -0.064 | 5.05×10⁻¹⁹\nATM | rs228595 | 11 | 108234866 | A | G | -0.029 | 1.43×10⁻⁸\nDCAF4 | rs2302588 | 14 | 72938044 | C | G | 0.048 | 1.68×10⁻⁸\nMPHOSPH6 | rs7194734 | 16 | 82166375 | T | C | -0.037 | 6.94×10⁻¹⁰\nZNF208 | rs8105767 | 19 | 22032639 | G | A | 0.039 | 5.42×10⁻¹³\nRTEL1/STMN3 | rs75691080 | 20 | 63638397 | T | C | -0.067 | 5.99×10⁻¹⁴\nRTEL1 | rs34978822 | 20 | 63660246 | G | C | -0.140 | 7.26×10⁻¹⁰\nRTEL1/ZBTB46 | rs73624724 | 20 | 63805045 | C | T | 0.051 | 6.33×10⁻¹²\nSENP7 | rs551442 | 3 | 101346524 | T | C | -0.037 | 2.45×10⁻⁸\nMOB1B | rs13137667 | 4 | 70908630 | C | T | 0.077 | 2.43×10⁻⁸\nCARMIL1 | rs34991172 | 6 | 25480100 | G | T | -0.061 | 6.19×10⁻⁹\nPRRC2A | rs2736176 | 6 | 31619784 | C | G | 0.035 | 3.53×10⁻¹⁰\nTERF2 | rs3785074 | 16 | 69373083 | G | A | 0.035 | 4.64×10⁻¹⁰\nRFWD3 | rs62053580 | 16 | 74646176 | G | A | -0.039 | 4.08×10⁻⁸\nSNP, single nucleotide polymorphism; CHR, chromosome; BETA, per-allele effect on telomere length in the original study; P, p-value in the original study.\n\nSupplementary Figures\n\nSupplementary Figure 1. Principal component analysis. A) Plot of the first two (left) and the second and third (right) principal components of genetic variation of IPF patients in the PFFPR. 
B) Plot of the first two (left) and the second and third (right) principal components of genetic variation of IPF patients in PROFILE. C) Proportion of variance explained by each PC (PFFPR on the right, and PROFILE on the left).\n\nSupplementary Figure 2. Distribution of qualifying variants (QV) in monogenic adult-onset pulmonary fibrosis (PF) genes in the PFFPR and PROFILE cohorts. A) Total QVs in monogenic adult-onset PF genes in the PFFPR. B) Total QVs in monogenic adult-onset PF genes in the PROFILE cohort. C) Variants classified as pathogenic (P), likely pathogenic (LP), or of uncertain significance (VUS) per gene in the PFFPR. D) Variants classified as P, LP, or VUS per gene in the PROFILE cohort. T, telomere; N-T, non-telomere.\n\nSupplementary Figure 3. Association between prevalence of qualifying variants (QV) and PRS-IPF in the PFFPR. A) Distribution of carriers (1) and non-carriers (0) in low and high PRS-IPF. B) Risk of carrying a QV in patients with low polygenic risk compared with patients with high polygenic risk. The odds ratio (OR) and the 95% confidence interval (CI) were estimated using logistic regression adjusted for age at diagnosis, sex, and the first two principal components.\n\nSupplementary Figure 4. Association between prevalence of qualifying variants (QV) and PRS-IPF (after excluding the MUC5B locus) in the PFFPR. A) Distribution of PRS-IPF in carriers (1) and non-carriers (0). Vertical dotted lines represent the mean of each distribution. B) Risk of carrying a QV for patients with low polygenic risk (T1) and high polygenic risk (T3) compared with those in the middle tertile. The odds ratios (OR) and the 95% confidence intervals (CI) were estimated using logistic regression adjusted for age at diagnosis, sex, and the first two principal components.\n\nSupplementary Figure 5. Association between the prevalence of qualifying variants (QV) and PRS-TL in the PFFPR. Distribution of PRS-TL in carriers (1) and non-carriers (0). Vertical dotted lines represent the mean of each distribution. A) Carriers (1) and non-carriers (0) of QVs in telomere and non-telomere genes. B) Carriers (1) and non-carriers (0) of QVs in telomere genes. T-test, Student's t-test; KS, Kolmogorov-Smirnov test.\n\nSupplementary Figure 6. Association between prevalence of qualifying variants (QV) in telomere and non-telomere genes and PRS-TL in the PFFPR. A) Distribution of carriers (1) and non-carriers (0) in PRS-TL tertiles. B) Risk of carrying a QV for individuals with low polygenic risk (T1) and high polygenic risk (T3) compared with those in the middle tertile. C) Distribution of carriers (1) and non-carriers (0) in high and low PRS-TL. D) Risk of carrying a QV in patients with high polygenic risk compared with patients with low polygenic risk. The odds ratios (OR) and the 95% confidence intervals (CI) were estimated using logistic regression adjusted for age at diagnosis, sex, and the first two principal components.\n\nSupplementary Figure 7. Association between prevalence of qualifying variants (QV) in telomere genes and PRS-TL in the PFFPR. A) Distribution of carriers (1) and non-carriers (0) in PRS-TL tertiles. B) Risk of carrying a QV for individuals with low polygenic risk (T1) and high polygenic risk (T3) compared with those in the middle tertile. C) Distribution of carriers (1) and non-carriers (0) in high and low PRS-TL. D) Risk of carrying a QV in individuals with high polygenic risk compared with individuals with low polygenic risk. 
The odds ratios (OR) and the 95% confidence intervals (CI) were estimated using logistic regression adjusted for age at diagnosis, sex, and the first two principal components.\n\nSupplementary Figure 8. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of genes) and the MUC5B risk allele in the PFFPR. p-values for the log-rank test are shown.\n\nSupplementary Figure 9. Qualifying variants (QV) effect on survival in the PFFPR (excluding carriers of QV within PARN). All analyses were performed using Cox regression models adjusted for sex, age at diagnosis, the first two principal components, MUC5B risk allele, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide (DLCO) % predicted. The X-axis shows hazard ratios (HR); the grey line corresponds to HR=1.0. Boxes correspond to adjusted HRs and horizontal lines to 95% confidence intervals (CI).\n\nSupplementary Figure 10. Alternative qualifying variants (QV) classifications and effects on survival in the PFFPR. All analyses were performed using Cox regression models adjusted for sex, age at diagnosis, the first two principal components, MUC5B risk allele, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide (DLCO) % predicted. The X-axis shows hazard ratios (HR); the grey line corresponds to HR=1.0. Boxes correspond to adjusted HRs and horizontal lines to 95% confidence intervals (CI).\n\nSupplementary Figure 11. Association between PRS-TL tertiles and survival in the PFFPR. A) Kaplan-Meier survival analysis for PRS-TL tertiles (p-value for the log-rank test is shown). B) PRS-TL effect on survival. The analysis was performed using Cox regression models adjusted for sex, age at diagnosis, the first two principal components, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide (DLCO) % predicted. The X-axis shows hazard ratios (HR); the grey line corresponds to HR=1.0. Boxes correspond to adjusted HRs and horizontal lines to 95% confidence intervals (CI).\n\nSupplementary Figure 12. Association between high and low PRS-TL and survival in the PFFPR. A) Kaplan-Meier survival analysis for high/low-risk PRS-TL (p-value for the log-rank test is shown). B) PRS-TL effect on survival. The analysis was performed using Cox regression models adjusted for sex, age at diagnosis, the first two principal components, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide (DLCO) % predicted. The X-axis shows hazard ratios (HR); the grey line corresponds to HR=1.0. Boxes correspond to adjusted HRs and horizontal lines to 95% confidence intervals (CI).\n\nSupplementary Figure 13. Association of PRS-IPF (after excluding the MUC5B locus) and survival in the PFFPR. Kaplan-Meier analysis showing p-values for the log-rank test.\n\nSupplementary Figure 14. 
Associations between PRS-IPF and MUC5B rs35705950 genotypes with survival among carriers and non-carriers of qualifying variants (QV) in the PFFPR. A) Association between PRS-IPF and survival in carriers. B) Association between PRS-IPF and survival in non-carriers. C) Association between MUC5B rs35705950 genotypes and survival in carriers. D) Association between MUC5B rs35705950 genotypes and survival in non-carriers. Kaplan-Meier analyses with p-values for the log-rank test.\n\nSupplementary Figure 15. Association between prevalence of qualifying variants (QV) and PRS-IPF in PROFILE. A) Distribution of PRS-IPF in carriers (1) and non-carriers (0). Vertical dotted lines represent the mean of each distribution. B) Distribution of carriers (1) and non-carriers (0) in high and low PRS-IPF.\n\nSupplementary Figure 16. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of PF genes) and the MUC5B risk allele in PROFILE. p-values for the log-rank test are shown.\n\nSupplementary references\n\n1. Wang, B. R. et al. The Pulmonary Fibrosis Foundation Patient Registry. Rationale, Design, and Methods. Annals of the American Thoracic Society 17, 1620–1628 (2020).\n2. Maher, T. M. PROFILEing idiopathic pulmonary fibrosis: rethinking biomarker discovery. European Respiratory Review 22, 148–152 (2013).\n3. Maher, T. M. et al. An epithelial biomarker signature for idiopathic pulmonary fibrosis: an analysis from the multicentre PROFILE cohort study. The Lancet Respiratory Medicine 5, 946–955 (2017).\n4. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, s13742-015-0047-8 (2015).\n5. Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Medicine 12, 62 (2020).\n6. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).\n7. Lu, W. et al. CHARR efficiently estimates contamination from DNA sequencing data. The American Journal of Human Genetics 110, 2068–2076 (2023). ","source_license":"CC-BY-4.0","license_restricted":false}