Methods
Study design and sample description
We assessed the association of QVs and PRS-IPF with the primary outcome in patients with IPF.
In the discovery stage, we utilized data from the Pulmonary Fibrosis Foundation Patient
Registry (PFFPR) (19). In the second stage, we employed data from the Prospective Observation
of Fibrosis in the Lung Clinical Endpoints (PROFILE) cohort for validation (20,21), ensuring the
robustness of our findings. In the PFFPR, the primary outcome was the time from initial
diagnosis to either death or lung transplantation. In the PROFILE cohort, the primary outcome
was the time from diagnosis to death. For both cohorts, right censoring was applied at 60
months.
medRxiv preprint doi: https://doi.org/10.1101/2024.10.12.24315151; this version posted October 15, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
In the discovery stage, we included 888 patients clinically diagnosed with IPF from the PFFPR,
with baseline and longitudinal demographic and clinical information recorded in the United
States since March 2016. In the second stage, 472 clinically diagnosed IPF patients from the
PROFILE cohort, recruited in the UK from 2010 to 2017, were included and followed for three years
to track disease progression (Figure 1). For further details, see Supplementary methods, and
Supplementary Table 1.
Figure 1. Patient cohorts included from the PFFPR and PROFILE studies.
Both studies were conducted according to The Code of Ethics of the World Medical Association
(Declaration of Helsinki) and written informed consent was obtained from all participants. The
Research Ethics Committees at each participating centre approved the study.
Sequencing and bioinformatics analysis methods
In the PFFPR, library preparation and sequencing were performed by Psomagen (Rockville,
MD). Genomic DNA libraries were prepared using the TruSeq DNA PCR Free kit (Illumina Inc.)
and sequenced on an Illumina NovaSeq 6000 instrument (Illumina Inc.) with 150 bp paired-end
reads at an average depth of 30X. At least 80% of the genome was covered by ≥20 reads, and
≥90% was covered by ≥10 reads. WGS was processed using the Illumina DRAGEN Bio-IT
Platform Germline Pipeline v3.10.4 (Illumina Inc.) using the Illumina DRAGEN Multigenome
Graph hg38 as the reference genome. Only variants with a “PASS” filter were included in
subsequent analyses.
For the PROFILE cohort, WGS was performed at Human Longevity Inc. using the Illumina
NovaSeq 6000 system with 150 bp paired-end reads. Coverage of at least 10X was achieved in
over 98% of the Consensus Coding Sequence Release 22 (CCDS), with an average read depth of
42X across the CCDS as described previously (4). Sequences were processed using the Illumina
DRAGEN Bio-IT Platform Germline Pipeline v3.0.7, with the GRCh38 as the reference genome.
In both cohorts, quality control (QC) included identifying QC outliers, detecting kinship
between patients, checking for cross-contamination of samples, and identifying sex
discordance, using metrics from different tools. Figure 1 summarizes the number of individuals
excluded and the reasons for exclusion. For further details, see Supplementary methods.
Identification of QVs in monogenic adult-onset PF genes
We restricted the identification of QVs to a curated list of 13 PF genes, categorized as either
telomere related (TERC, TERT, TINF2, DKC1, RTEL1, PARN, NAF1, and ZCCHC8) or non-telomere
related (SFTPC, SFTPA1, SFTPA2, SPDL1, and KIF15) (Supplementary Table 2). With the
exception of SPDL1 and KIF15, this list includes genes with a known dominant inheritance
pattern (presuming that QVs in these genes would have higher penetrance) and genes
commonly found in familial IPF cohorts, although they also occur in sporadic cases (7).
KIF15 and SPDL1 were added to the list because recent large-scale sequencing studies
identified them as PF-related genes (4,22,23). Both genes are critical for mitosis, pointing to a
novel, non-telomeric mechanism underlying IPF. Rare deleterious variants in KIF15 and three
telomere genes (TERT, PARN and RTEL1) have been previously associated with IPF risk, early
onset, and progression to early-age lung transplantation or death (23). In SPDL1, a rare
missense variant was confirmed as a new IPF risk allele, although carriers did not exhibit
distinct clinical features (4). For simplicity, we refer to this gene set as monogenic adult-onset
PF genes.
Variants in these genes were filtered based on read depth (DP) <10, mapping quality (MQ) 0.05 in the cohort. The remaining variants
were annotated using the Variant Effect Predictor tool v109.3 (24). Variants with a global allele
frequency (AF) >0.0005 in gnomAD v2.1 were excluded from the study. For our analyses, we
retained protein-truncating variants (including frameshift, stop-gained, start-loss, and splicing
variants) and missense variants with a CADD >15.
For the non-coding RNA gene TERC, due to the difficulty in predicting functional effects in non-
coding genes, variants were considered for the analysis if their global population AF was
<0.0005 and they were annotated by ClinVar as pathogenic (P), likely pathogenic (LP), or of
uncertain significance (VUS).
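The selection rules above can be sketched as a small predicate. This is a minimal illustration rather than the pipeline used in the study: the `Variant` record, its field names, and the simplified consequence labels are hypothetical, and the initial read-level filters (DP, MQ, and related thresholds) are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; all names below are hypothetical.
MAX_GNOMAD_AF = 0.0005
MIN_CADD_MISSENSE = 15.0
PTV_CONSEQUENCES = {"frameshift", "stop_gained", "start_lost", "splice"}
CLINVAR_ACCEPTED = {"P", "LP", "VUS"}


@dataclass
class Variant:
    gene: str
    consequence: str                # simplified consequence label
    gnomad_af: float                # global allele frequency in gnomAD
    cadd: Optional[float] = None    # CADD score, if available
    clinvar: Optional[str] = None   # "P", "LP", "VUS", or None


def is_qualifying(v: Variant) -> bool:
    """Return True if the variant passes the QV rules sketched in the text."""
    if v.gene == "TERC":
        # Non-coding RNA gene: rely on rarity plus a ClinVar annotation.
        return v.gnomad_af < MAX_GNOMAD_AF and v.clinvar in CLINVAR_ACCEPTED
    if v.gnomad_af > MAX_GNOMAD_AF:
        return False
    if v.consequence in PTV_CONSEQUENCES:
        return True
    # Missense variants need a CADD score above the threshold.
    return (
        v.consequence == "missense"
        and v.cadd is not None
        and v.cadd > MIN_CADD_MISSENSE
    )
```

For example, a rare TERT missense variant with CADD 22 would be retained, whereas the same variant with CADD 10 would not.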
All variants that met the aforementioned criteria were annotated as the total set of QVs.
Additionally, QVs were manually classified according to ACMG guidelines (25) as P, LP, or VUS.
Variants classified as P or LP comprised the set of pathogenic variants, while ClinVar variants
were cross-referenced and annotated as VUS, P, or LP.
For sensitivity analysis, we defined six additional categories for QVs based on specific
thresholds for population AF and different predicted variant effect. We also assessed a category
of rare synonymous variants, expected to capture neutral variation, to use as a null model in
the association analyses. These criteria are summarized in Supplementary Table 3.
Principal components of genetic heterogeneity in the cohorts
Principal components (PC) were calculated after excluding single nucleotide polymorphisms
(SNPs) with a minor allele frequency (MAF) <0.01 from WGS data, using BCFtools
(https://samtools.github.io/bcftools/bcftools.html). Genotyping QC was then performed using
PLINK v.1.9. First, SNPs with a genotyping call rate (CR) <95% or those deviating significantly
from Hardy-Weinberg equilibrium (HWE) (p<1.0×10⁻⁶) were removed. After linkage
disequilibrium pruning (--indep-pairwise 100 5 0.01), the main PCs of genetic variation were
calculated based on 110,951 independent SNPs in the PFFPR and 143,214 independent SNPs in
PROFILE (see Supplementary Figure 1).
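The per-SNP filters described above can be illustrated with a short sketch. It is not the PLINK implementation: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the HWE check uses a one-degree-of-freedom chi-squared approximation in place of the exact test that PLINK applies.

```python
import math
from typing import Optional, Sequence


def snp_passes_qc(
    genotypes: Sequence[Optional[int]],
    min_maf: float = 0.01,
    min_call_rate: float = 0.95,
    hwe_p: float = 1e-6,
) -> bool:
    """Apply MAF, call-rate, and approximate HWE filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False

    n = len(called)
    n_het = called.count(1)
    n_hom_alt = called.count(2)
    n_hom_ref = n - n_het - n_hom_alt

    q = (n_het + 2 * n_hom_alt) / (2 * n)   # alternate allele frequency
    if min(q, 1 - q) < min_maf:
        return False

    # Chi-squared goodness of fit against Hardy-Weinberg proportions.
    expected = (n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # survival function, 1 d.f.
    return p_value >= hwe_p
```

Linkage disequilibrium pruning and the PCA itself are left to dedicated tools and are not reproduced here.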
Estimation of the PRS of IPF in the cohorts
The PRS-IPF for each patient was derived using the 19 previously published genome-wide
significant IPF variants (Supplementary Table 4) using PRSice-2 (26). Briefly, PRS were
calculated as the number of risk alleles carried by each individual, multiplied by the effect size
of the variant as described in the GWAS study (26) summed across all variants included in the
score:
\[ \mathrm{PRS} = \sum_{i=1}^{n} \beta_i G_i \]
where βi is the effect size of variant i (the natural logarithm of the OR in the case of binary traits), Gi represents the number of risk
alleles carried at variant i, and n represents the number of conditionally independent signals identified
elsewhere. Raw polygenic scores were then standardized as z-scores using the following
formula:
\[ \mathrm{PRS}_z = \frac{\mathrm{PRS} - \mathrm{mean}(\mathrm{PRS})}{\mathrm{SD}(\mathrm{PRS})} \]
As part of the sensitivity analysis, we also used the same methodology to derive PRS for
telomere length (PRS-TL) based on the 20 common variants that were previously found
associated with leukocyte telomere length (TL) (27) (Supplementary Table 5). In this case, since
TL is a quantitative trait, βi is represented by beta coefficients in the PRS formula.
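The score construction and standardization can be sketched as follows. This is an illustration and not the PRSice-2 implementation: the function names are hypothetical, the weights for a binary trait are assumed to be log odds ratios, and the sample standard deviation (n-1 denominator) is an assumption because the text does not specify which estimator was used.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def raw_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts: sum over i of beta_i * G_i."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is needed per variant")
    return sum(w * g for w, g in zip(weights, dosages))


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to z-scores within a cohort."""
    m = mean(scores)
    s = stdev(scores)  # sample SD; an assumption, see lead-in
    return [(x - m) / s for x in scores]


# Toy example with two variants (odds ratios 1.5 and 2.0, hypothetical).
weights = [math.log(1.5), math.log(2.0)]
score = raw_prs([2, 1], weights)  # two risk alleles at variant 1, one at variant 2
```

Because the z-score depends on the cohort mean and SD, scores standardized within PFFPR and within PROFILE are not directly interchangeable.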
Statistical analysis
Descriptive statistics were provided as mean (standard deviation, SD) or median (interquartile
range, IQR) and valid percentage for continuous (quantitative) and categorical (binary) data,
respectively. Categorical variables were compared using a Chi-squared test or a Fisher’s exact
test as indicated.
To examine the relationship between the presence of QVs and the PRS, we first used the
Student’s t-test and the Kolmogorov-Smirnov (KS) test to compare the mean PRS values and
their distributions between QVs carriers and non-carriers. Additionally, we assessed this
relationship using logistic regression models, adjusting for sex, age at diagnosis, and the two
main PCs of genetic heterogeneity, which accounted for a significant proportion of genetic
variance (Supplementary Figure 1C).
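For orientation, the two test statistics used to compare PRS values between carriers and non-carriers can be computed as below. This is a minimal sketch with hypothetical function names; it returns the statistics only, since the corresponding p-values require the t and Kolmogorov distributions provided by a statistics library.

```python
import math
from bisect import bisect_right
from statistics import mean, variance
from typing import Sequence


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d
```

The t statistic is sensitive to a shift in means, whereas the KS statistic responds to any difference in the shape of the two distributions, which is why the two are reported side by side.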
To examine the association between QVs, PRS, and survival, we used Cox proportional hazards
regression models adjusted for sex, age at diagnosis, the two main PCs for genetic
heterogeneity, smoking history, DLCO % predicted, FVC % predicted, and MUC5B risk allele
carrier status whenever necessary. The proportional hazards assumption of each covariate was assessed
by plotting scaled Schoenfeld residuals against transformed time, revealing no evidence of non-
proportional hazards.
The survival R package (v.3.5-7) was used to calculate p-values, hazard ratios (HR), and 95%
confidence intervals (CI). For visualizing survival differences, we generated Kaplan-Meier
survival plots and tested the differences using log-rank tests.
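To make the handling of follow-up concrete, the sketch below applies right censoring at 60 months and then computes a Kaplan-Meier curve. It is written in Python for illustration only (the analyses themselves used the survival R package), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def censor_at(time: float, event: bool, horizon: float = 60.0) -> Tuple[float, bool]:
    """Truncate follow-up at the horizon; later events become censored."""
    if time > horizon:
        return horizon, False
    return time, event


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up in months; the last patient's event falls after 60 months.
raw = [(10, True), (20, False), (30, True), (80, True)]
censored = [censor_at(t, e) for t, e in raw]
curve = kaplan_meier([t for t, _ in censored], [e for _, e in censored])
```

In the toy data, the event at 80 months is converted to a censored observation at 60 months, so it no longer contributes a step to the curve.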
Results
Prevalence of QVs in the PFFPR
We identified 131 QVs in monogenic adult-onset PF genes in 144 patients from the PFFPR,
corresponding to a diagnostic yield of 16.2% (95% CI=13.8-18.6%)
(Supplementary Figure 2, Table 1). Most carriers (96.5%) had a single QV, while five
patients (3.5% of the QV carriers) had two or more QVs, including combinations such as
NAF1/TERT, KIF15/RTEL1, TERT/SPDL1, TINF2/TERT/RTEL1, and RTEL1/RTEL1. Consistent with
previous studies, the prevalence of QVs was higher in patients with a family history of disease
(27.3%) compared to those with sporadic disease (13.5%) (p=3.08×10⁻⁵).
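As a worked check of the diagnostic yield, 144 carriers among 888 patients gives 16.2%, and a normal-approximation (Wald) interval reproduces the reported bounds. The choice of a Wald interval is an assumption made here for illustration; the text does not state which interval method was used.

```python
import math
from typing import Tuple


def wald_ci(k: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion k/n with a normal-approximation confidence interval."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


p, lower, upper = wald_ci(144, 888)
print(f"{100 * p:.1f}% ({100 * lower:.1f}-{100 * upper:.1f})")
```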
Most QVs were in telomere genes (75.6%), while nearly a quarter were found in non-telomere
genes (22.9%). The highest numbers of QVs were identified in the telomere-related genes
RTEL1 (25.2%), TERT (23.7%), and PARN (12.2%). These genes also had the highest proportions
of P/LP variants (31.0%, 31.0%, and 20.7%, respectively) (Table 1). In total, 42.7% of QVs were
previously annotated in ClinVar as VUS, LP, or P.
Consistent with previous findings (7,18), carriers of the risk MUC5B allele (rs35705950 TT or TC
genotype) had a lower prevalence of QVs (13.5%) compared to those carrying the protective GG
genotype (22.6%) (p=1.03×10⁻³).
Association of QVs and PRS-IPF in the PFFPR
Given the potential non-additive effects of QVs and the MUC5B common variant, we
investigated whether individuals with lower polygenic risk were more likely to carry QVs
compared to those with higher polygenic risks. We first compared the mean and distribution of
PRS-IPF between QV carriers and non-carriers and found significant differences (Student’s
t-test, p=1.30×10⁻³; KS test, p=3.74×10⁻⁴) (Figure 2A). When patients were stratified into PRS-IPF
tertiles, the prevalence of QVs was higher in the lowest tertile (low PRS-IPF) than in
the highest tertile (high PRS-IPF), corresponding to higher odds of carrying a
QV among patients in the low PRS-IPF tertile (OR=1.79, 95% CI=1.15-2.81, p=0.010)
(Figure 2B). The association persisted when the cohort was divided into two PRS-IPF categories,
low and high (OR=1.74, 95% CI=1.20-2.53, p=3.57×10⁻³) (Supplementary Figure 3).
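The tertile comparison can be illustrated with a simplified, unadjusted calculation. The sketch below assigns tertiles by rank and computes an odds ratio with a Wald interval from a 2x2 table; it is not the adjusted logistic regression reported here, and the function names and counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tertile_labels(scores: Sequence[float]) -> List[str]:
    """Label each score T1 (lowest third), T2, or T3 by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        labels[idx] = "T" + str(min(3 * rank // n, 2) + 1)
    return labels


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted OR with Wald CI.

    a, b: carriers and non-carriers in the group of interest;
    c, d: carriers and non-carriers in the reference group.
    """
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se), math.exp(log_or + z * se)


labels = tertile_labels([0.3, -1.2, 2.0, 0.9, -0.1, 1.5])
or_est, or_low, or_high = odds_ratio(20, 80, 10, 90)  # hypothetical counts
```

An adjusted estimate would additionally condition on sex, age at diagnosis, and the two main PCs, so unadjusted values from a table like this would not be expected to match the reported ORs exactly.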
Figure 2. Association between prevalence of qualifying variants (QVs) and PRS-IPF in the PFFPR. A)
Distribution of PRS-IPF in QV carriers (1) and non-carriers (0). Vertical dotted lines represent the mean
value of the distribution. B) Risk of carrying a QV for patients with low polygenic risk (T1) and high
polygenic risk (T3) compared to those in the middle tertile. The odds ratios (OR) and the 95% confidence
intervals (CI) were estimated using logistic regression adjusted for sex, age at diagnosis, and the two
main principal components.
Excluding the MUC5B locus from the PRS-IPF calculations yielded non-significant differences in
the mean and distribution of PRS-IPF between QV carriers and non-carriers (Supplementary
Figure 4A). However, QVs remained more common in the lowest PRS tertile patients compared
to the highest (OR=1.60, 95% CI=1.01-2.54, p=0.05) (Supplementary Figure 4B).
To explore if these observations were independent of genetically predicted TL, we then
assessed the association between the prevalence of QVs and PRS-TL, focusing only on the QVs
in telomere genes. No significant associations were found between QVs and PRS-TL under the
two assumptions (Supplementary Figures 5-7).
Association of QVs with survival in the PFFPR
Given that carriers of the MUC5B risk allele are associated with better survival and are less
likely to carry QVs, we hypothesized that QV carriers might have poorer survival. Indeed, QV
carriage was associated with reduced survival (HR=1.53, 95% CI=1.12-2.10, p=7.33×10⁻³; log-rank
test, p=0.022). The result was consistent when the analysis was limited to QVs that were
classified as pathogenic or likely pathogenic variants (HR=1.71, 95% CI=1.11-2.65, p=0.016; log-rank
test, p=0.043). However, no significant association was found for ClinVar variants alone
(HR=1.35, 95% CI=0.87-2.09, p=0.18) (Figure 3A, Supplementary Figure 8). As an internal
control, we found no association between survival and rare synonymous variants in the same
monogenic adult-onset PF genes (HR=1.38, 95%CI=0.80-2.38, p=0.24) (Figure 3A).
Figure 3. Qualifying variants (QVs), MUC5B risk allele, PRS-IPF, and family history effect on survival. A)
PFFPR. B) PROFILE. All analyses were performed using Cox regression models adjusted for sex, age at
diagnosis, the two main principal components, smoking history, forced vital capacity (FVC) % predicted,
and diffusing capacity for carbon monoxide (DLCO) % predicted, and the MUC5B risk allele whenever
necessary. The X-axis shows Hazard-ratios (HR); the grey line corresponds to the HR=1.0. The circles
correspond to adjusted HR and horizontal lines correspond to 95% confidence intervals (CI).
Further analyses showed that QVs in telomere-related genes had the largest effect on survival
(HR=1.76, 95%CI=1.13-2.76, p=0.013; log-rank test, p=0.029), and QVs in PARN were
particularly associated with worse survival (HR=2.28, 95%CI=1.11-4.68, p=0.03; log-rank test,
p=0.035) (Figure 3A, Supplementary Figure 8).
We performed additional sensitivity analyses. First, excluding PARN QV carriers attenuated
the effect size, but the results remained significant (HR=1.46, 95% CI=1.04-2.03, p=0.03), indicating
that other genes also contribute to the association with worse survival (Supplementary Figure
9). Second, as the probability of carrying QVs is higher among cases with a family history of PF,
and family history of PF predicts reduced survival (29), we tried to account for the risk
attributed to family history of PF. We found no relationship between family history of PF and
survival (HR=1.09, 95%CI=0.80-1.49, p=0.59), suggesting that family history is not a major
factor in this cohort’s survival outcomes (Figure 3A). Third, the criteria for defining QVs are
subject to different choices and predictors. Using alternative and stricter definitions of QVs in
the analyses (Supplementary Table 3), we found that the effect had consistent directionality
across all QV definitions (Supplementary Figure 10). Additionally, we observed that applying
more stringent in silico predictors resulted in a higher risk of reduced survival (ultrarare PTV
HR=2.44, 95% CI=1.14-5.24, p=0.02). However, these criteria also led to a reduced number of
identified carriers, thereby decreasing the power to detect significant differences.
Associations of PRS-IPF with survival in the PFFPR
Since both rare and common IPF genetic variants are associated with IPF survival, and QVs
associate with worse survival, we then examined whether the polygenic component of IPF was
also associated with survival. As PRS-IPF values were mainly influenced by the MUC5B effect,
we did not adjust for the risk MUC5B genotype in these analyses, although we did for relevant
individual covariates. We found that the lowest PRS-IPF tertile was associated with the worst
survival (log-rank test, p=1.8×10⁻⁴; HR=1.61, 95% CI=1.25-2.07, p=1.9×10⁻⁴) (Figure 3A, Figure
4). In contrast, PRS-TL was not associated with survival, whether analyzed by tertiles or by high
vs. low-risk groups (Supplementary Figure 11, Supplementary Figure 12).
Figure 4. Association between PRS-IPF and survival in PFFPR. Patients with low polygenic risk of IPF (T1)
have worse survival in comparison with patients with high polygenic risk of IPF (T2 and T3).
A sensitivity analysis, excluding the MUC5B locus from the PRS-IPF calculation, yielded non-
significant results (Supplementary Figure 13). To explore whether the association of PRS-IPF
and survival was solely driven by the known association of MUC5B, we stratified patients by QV
carrier status and assessed its effect in each group. The analyses showed that lower
PRS-IPF was associated with worse survival in both groups, although the association
did not reach statistical significance among carriers (HR=1.76, 95% CI=0.99-3.15, p=0.055), in contrast to
non-carriers (HR=1.54, 95% CI=1.16-2.04, p=2.5×10⁻³) (Supplementary Figure 14A, Supplementary
Figure 14B). Similar results were obtained when we assessed the effect of the MUC5B
rs35705950 genotypes on survival among both groups (Supplementary Figure 14C,
Supplementary Figure 14D). Our findings suggest that the strong observed association
between PRS-IPF and survival is mainly driven by MUC5B. While it is suspected that QVs may
exert an independent but opposite effect on IPF survival, we cannot rule out that the observed
differences were influenced by disparities in the group sizes.
Validation of results in PROFILE and meta-analysis
Using the same classification of QVs as in PFFPR, the diagnostic yield of finding a QV in
PROFILE was 15.67% (95% CI=12.4-19.0%). However, PROFILE IPF patients showed no
statistical differences in the mean and the distribution of PRS-IPF between QV carriers and non-
carriers (Student’s t-test, p=0.17; KS-test, p=0.24). Similarly, although not statistically
significant, an enrichment of QVs in the patients with lower polygenic component of IPF was
observed compared to those with higher polygenic component (OR=1.29, 95% CI=0.78-2.15,
p=0.31) (Supplementary Figure 15).
Despite these results, the association of QVs with IPF survival was validated in PROFILE
patients (Figure 3B). As in PFFPR, the survival analyses of PROFILE patients restricted to
LP/P variants, whether in all genes or only in telomeric genes, showed the largest effect sizes
(HR=1.98, 95% CI=1.28-3.05, p=2.1×10⁻³; log-rank test, p=0.023). However, we did not replicate
the association of QVs in PARN with survival. Instead, the association with worse survival in
PROFILE was observed for TERT QV carriers (HR=3.55, 95% CI=1.85-6.82, p=1.4×10⁻⁴; log-rank
test, p=0.03) (Figure 3B, Supplementary Figure 16).
The meta-analysis of results from PFFPR and PROFILE cohorts showed a consistent direction of
effect across all categories and supported a robust association between QVs (including “all
variants”, “pathogenic”, “telomeric”, and “pathogenic telomeric”) and PRS-IPF tertiles with
survival (Figure 5). In the meta-analysis, RTEL1 QVs were also nominally associated with IPF
survival despite not reaching nominal significance in either cohort separately. No
association with IPF survival was found for QVs in SPDL1 and KIF15.
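A common way to combine cohort-level hazard ratios is inverse-variance weighting on the log scale, sketched below. This is an assumption for illustration: the text does not state whether a fixed-effect or random-effects model was used, and the function names are hypothetical. Standard errors are recovered from the reported 95% confidence limits.

```python
import math
from typing import Sequence, Tuple

Z95 = 1.96


def log_hr_and_se(hr: float, low: float, high: float) -> Tuple[float, float]:
    """Recover the log hazard ratio and its SE from a 95% CI."""
    return math.log(hr), (math.log(high) - math.log(low)) / (2 * Z95)


def fixed_effect(estimates: Sequence[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance pooled HR with 95% CI from (hr, low, high) tuples."""
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, low, high in estimates:
        beta, se = log_hr_and_se(hr, low, high)
        weight = 1 / se ** 2
        weighted_sum += weight * beta
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled),
        math.exp(pooled - Z95 * pooled_se),
        math.exp(pooled + Z95 * pooled_se),
    )


# Two hazard ratios reported above, used purely as illustrative inputs.
pooled_hr, pooled_low, pooled_high = fixed_effect([(1.71, 1.11, 2.65), (1.98, 1.28, 3.05)])
```

The pooled estimate from this sketch is illustrative and is not claimed to match the values displayed in Figure 5.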
Figure 5. Meta-analysed results from adjusted PFFPR and PROFILE (N=1360) Cox regression models.
References
1. Lederer DJ, Martinez FJ. Idiopathic Pulmonary Fibrosis. N Engl J Med. 2018 May
10;378(19):1811–23.
2. Pires FS, Damas C, Mota P, Melo N, Costa D, Jesus JM, et al. Slow versus rapid progressors in
idiopathic pulmonary fibrosis. European Respiratory Journal [Internet]. 2011 Sep 1 [cited 2024
Mar 14];38(Suppl 55). Available from: https://erj.ersjournals.com/content/38/Suppl_55/p656
3. Allen RJ, Porte J, Braybrooke R, Flores C, Fingerlin TE, Oldham JM, et al. Genetic variants
associated with susceptibility to idiopathic pulmonary fibrosis in people of European ancestry:
a genome-wide association study. The Lancet Respiratory medicine. 2017 Nov;5(11):869–80.
4. Dhindsa RS, Mattsson J, Nag A, Wang Q, Wain LV, Allen R, et al. Identification of a missense
variant in SPDL1 associated with idiopathic pulmonary fibrosis. Commun Biol. 2021 Mar
23;4(1):1–8.
5. Fingerlin TE, Murphy E, Zhang W, Peljto AL, Brown KK, Steele MP, et al. Genome-wide
association study identifies multiple susceptibility loci for pulmonary fibrosis. Nature Genetics.
2013;45(6):613–20.
6. Partanen JJ, Häppölä P, Zhou W, Lehisto AA, Ainola M, Sutinen E, et al. Leveraging global
multi-ancestry meta-analysis in the study of idiopathic pulmonary fibrosis genetics. Cell
Genomics. 2022;2(10):100181.
7. Zhang D, Newton CA, Wang B, Povysil G, Noth I, Martinez FJ, et al. Utility of whole genome
sequencing in assessing risk and clinically relevant outcomes for pulmonary fibrosis. European
Respiratory Journal [Internet]. 2022 Dec 1 [cited 2023 Oct 12];60(6). Available from:
https://erj.ersjournals.com/content/60/6/2200577
8. Peljto AL, Blumhagen RZ, Walts AD, Cardwell J, Powers J, Corte TJ, et al. Idiopathic
Pulmonary Fibrosis Is Associated with Common Genetic Variants and Limited Rare Variants. Am
J Respir Crit Care Med. 2023 May 1;207(9):1194–202.
9. Raghu G, Remy-Jardin M, Richeldi L, Thomson CC, Inoue Y, Johkoh T, et al. Idiopathic
Pulmonary Fibrosis (an Update) and Progressive Pulmonary Fibrosis in Adults: An Official
ATS/ERS/JRS/ALAT Clinical Practice Guideline. Am J Respir Crit Care Med. 2022 May
1;205(9):e18–47.
10. Peljto AL, Zhang Y, Fingerlin TE, Ma SF, Garcia JGN, Richards TJ, et al. Association between
the MUC5B promoter polymorphism and survival in patients with idiopathic pulmonary
fibrosis. JAMA. 2013 Jun;309(21):2232–9.
11. Cai S, Allen RJ, Wain LV, Dudbridge F. Reassessing the association of MUC5B with survival in
idiopathic pulmonary fibrosis. Ann Hum Genet. 2023 Sep;87(5):248–53.
12. Allen RJ, Oldham JM, Jenkins DA, Leavy OC, Guillen-Guio B, Melbourne CA, et al.
Longitudinal lung function and gas transfer in individuals with idiopathic pulmonary fibrosis: a
genome-wide association study. The Lancet Respiratory Medicine [Internet]. 2022 Nov;
Available from: https://doi.org/10.1016/S2213-2600(22)00251-X
13. Oldham JM, Allen RJ, Lorenzo-Salazar JM, Molyneaux PL, Ma SF, Joseph C, et al. PCSK6 and
Survival in Idiopathic Pulmonary Fibrosis. Am J Respir Crit Care Med. 2023 Jun 1;207(11):1515–
24.
14. Ley B, Torgerson DG, Oldham JM, Adegunsoye A, Liu S, Li J, et al. Rare Protein-Altering
Telomere-related Gene Variants in Patients with Chronic Hypersensitivity Pneumonitis. Am J
Respir Crit Care Med. 2019 Nov 1;200(9):1154–63.
15. Newton CA, Batra K, Torrealba J, Kozlitina J, Glazer CS, Aravena C, et al. Telomere-related
lung fibrosis is diagnostically heterogeneous but uniformly progressive. European Respiratory
Journal. 2016;48(6):1710–20.
16. Borie R, Kannengiesser C, Hirschi S, Pavec JL, Mal H, Bergot E, et al. Severe hematologic
complications after lung transplantation in patients with telomerase complex mutations. The
Journal of heart and lung transplantation : the official publication of the International Society
for Heart Transplantation. 2015 Apr;34(4):538–46.
17. Stuart BD, Lee JS, Kozlitina J, Noth I, Devine MS, Glazer CS, et al. Effect of telomere length
on survival in patients with idiopathic pulmonary fibrosis: An observational cohort study with
independent validation. The Lancet Respiratory Medicine. 2014;2(7):557–65.
18. Dressen A, Abbas AR, Cabanski C, Reeder J, Ramalingam TR, Neighbors M, et al. Analysis of
protein-altering variants in telomerase genes and their association with MUC5B common
variant status in patients with idiopathic pulmonary fibrosis: a candidate gene sequencing
study. Lancet Respir Med. 2018 Aug;6(8):603–14.
19. Wang BR, Edwards R, Freiheit EA, Ma Y, Burg C, de Andrade J, et al. The Pulmonary Fibrosis
Foundation Patient Registry. Rationale, Design, and Methods. Ann Am Thorac Soc. 2020
Dec;17(12):1620–8.
20. Maher TM. PROFILEing idiopathic pulmonary fibrosis: rethinking biomarker discovery.
European Respiratory Review. 2013 Jun 1;22(128):148–52.
21. Maher TM, Oballa E, Simpson JK, Porte J, Habgood A, Fahy WA, et al. An epithelial
biomarker signature for idiopathic pulmonary fibrosis: an analysis from the multicentre
PROFILE cohort study. The Lancet Respiratory Medicine. 2017 Dec 1;5(12):946–55.
22. Zhang D, Povysil G, Kobeissy PH, Li Q, Wang B, Amelotte M, et al. Rare and Common
Variants in KIF15 Contribute to Genetic Risk of Idiopathic Pulmonary Fibrosis. Am J Respir Crit
Care Med. 2022 Jul 1;206(1):56–69.
23. Hollmén M, Laaka A, Partanen JJ, Koskela J, Sutinen E, Kaarteenaho R, et al. KIF15 missense
variant is associated with the early onset of idiopathic pulmonary fibrosis. Respir Res. 2023 Sep
30;24(1):240.
24. Harrison PW, Amode MR, Austine-Orimoloye O, Azov AG, Barba M, Barnes I, et al. Ensembl
2024. Nucleic Acids Research. 2024 Jan 5;52(D1):D891–9.
25. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for
the interpretation of sequence variants: a joint consensus recommendation of the American
College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet
Med. 2015 May;17(5):405–24.
26. Allen RJ, Stockwell A, Oldham JM, Guillen-Guio B, Schwartz DA, Maher TM, et al. Genome-wide
association study across five cohorts identifies five novel loci associated with idiopathic
pulmonary fibrosis. 2022;1–5.
27. Li C, Stoma S, Lotta LA, Warner S, Albrecht E, Allione A, et al. Genome-wide Association
Analysis in Humans Links Nucleotide Metabolism to Leukocyte Telomere Length. Am J Hum
Genet. 2020 Mar 5;106(3):389–404.
28. Schwarzer G, Carpenter J, Rücker G. Meta-Analysis with R. 2015.
29. Cutting CC, Bowman WS, Dao N, Pugashetti JV, Garcia CK, Oldham JM, et al. Family History
of Pulmonary Fibrosis Predicts Worse Survival in Patients With Interstitial Lung Disease. CHEST.
2021 May 1;159(5):1913–21.
30. Petrovski S, Todd JL, Durheim MT, Wang Q, Chien JW, Kelly FL, et al. An Exome Sequencing
Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis. American journal of
respiratory and critical care medicine. 2017 Jul;196(1):82–93.
31. Liu Q, Zhou Y, Cogan JD, Mitchell DB, Sheng Q, Zhao S, et al. The Genetic Landscape of
Familial Pulmonary Fibrosis. Am J Respir Crit Care Med. 2023 May 15;207(10):1345–57.
32. Justet A, Klay D, Porcher R, Cottin V, Ahmad K, Molina MM, et al. Safety and efficacy of
pirfenidone and nintedanib in patients with idiopathic pulmonary fibrosis and carrying a
telomere-related gene mutation. European Respiratory Journal. 2021;57(2):4–7.
33. Richeldi L, du Bois RM, Raghu G, Azuma A, Brown KK, Costabel U, et al. Efficacy and safety
of nintedanib in idiopathic pulmonary fibrosis. N Engl J Med. 2014 May 29;370(22):2071–82.
34. King TE, Bradford WZ, Castro-Bernardini S, Fagan EA, Glaspole I, Glassberg MK, et al. A
phase 3 trial of pirfenidone in patients with idiopathic pulmonary fibrosis. N Engl J Med. 2014
May 29;370(22):2083–92.
35. Bowman WS, Newton CA, Linderholm AL, Neely ML, Pugashetti JV, Kaul B, et al. Proteomic
biomarkers of progressive fibrosing interstitial lung disease: a multicentre cohort analysis. The
Lancet Respiratory Medicine. 2022 Jun 1;10(6):593–602.
36. Armanios MY, Chen JJL, Cogan JD, Alder JK, Ingersoll RG, Markin C, et al. Telomerase
Mutations in Families with Idiopathic Pulmonary Fibrosis. New England Journal of Medicine.
2007;356(13):1317–26.
37. Stuart BD, Choi J, Zaidi S, Xing C, Holohan B, Chen R, et al. Exome sequencing links
mutations in PARN and RTEL1 with familial pulmonary fibrosis and telomere shortening. Nature
Genetics. 2015;47(5):512–7.
38. Leavy OC, Ma SF, Molyneaux PL, Maher TM, Oldham JM, Flores C, et al. Proportion of
Idiopathic Pulmonary Fibrosis Risk Explained by Known Common Genetic Loci in European
Populations. American Journal of Respiratory and Critical Care Medicine. 2020 Nov;203(6):775–8.
39. Pulmonary fibrosis in non-mutation carriers of families with short telomere syndrome gene
mutations - PubMed [Internet]. [cited 2024 Jun 19]. Available from:
https://pubmed.ncbi.nlm.nih.gov/34580961/
40. Borie R, Kannengiesser C, Antoniou K, Bonella F, Crestani B, Fabre A, et al. European
Respiratory Society statement on familial pulmonary fibrosis. European Respiratory Journal
[Internet]. 2023 Mar 1 [cited 2023 Oct 12];61(3). Available from:
https://erj.ersjournals.com/content/61/3/2201383
41. Zhang D, Eckhardt CM, McGroder C, Benesh S, Porcelli J, Depender C, et al. Clinical Impact
of Telomere Length Testing for Interstitial Lung Disease. CHEST [Internet]. 2024 Jun 28 [cited
2024 Jul 2];0(0). Available from: https://journal.chestnet.org/article/S0012-3692(24)00808-0/abstract
42. Lu T, Zhou S, Wu H, Forgetta V, Greenwood CMT, Richards JB. Individuals with common
diseases but with a low polygenic risk score could be prioritized for rare variant screening.
Genet Med. 2021 Mar;23(3):508–15.
43. Darst BF, Sheng X, Eeles RA, Kote-Jarai Z, Conti DV, Haiman CA. Combined Effect of a
Polygenic Risk Score and Rare Genetic Variants on Prostate Cancer Risk. Eur Urol. 2021
Aug;80(2):134–8.
44. Moll M, Peljto AL, Kim JS, Xu H, Debban CL, Chen X, et al. A Polygenic Risk Score for
Idiopathic Pulmonary Fibrosis and Interstitial Lung Abnormalities. Am J Respir Crit Care Med.
2023 Oct;208(7):791–801.
TABLES
Table 1. Count of qualifying variants in monogenic adult-onset PF genes identified in the PFFPR patients.*
Gene   Number of variants   P/LP   VUS   Gene category
KIF15 15 3 12 Non_Telomere
SPDL1 8 2 7 Non_Telomere
SFTPC 5 0 5 Non_Telomere
SFTPA2 1 0 1 Non_Telomere
SFTPA1 1 0 1 Non_Telomere
ZCCHC8 5 1 4 Telomere
TINF2 3 3 0 Telomere
PARN 16 12 4 Telomere
RTEL1 33 18 15 Telomere
DKC1 1 0 1 Telomere
TERC 4 0 4 Telomere
NAF1 6 1 5 Telomere
TERT 31 18 13 Telomere
Total variants 131 58 73
*Filtered by frequency (AF15). P/LP, Pathogenic or likely
pathogenic; VUS, variant of uncertain significance.
Supplementary material
Rare variants and survival of patients with idiopathic
pulmonary fibrosis
Aitana Alonso-Gonzalez1, David Jáspez2, José M. Lorenzo-Salazar2, Shwu-Fan Ma3,
Emma Strickland3, Josyf Mychaleckyj4, John S. Kim3, Yong Huang3, Ayodeji
Adegunsoye5, Justin M. Oldham6, Philip L. Molyneaux7,8, Toby Maher7,8,9, Louise V
Wain10,11, Richard Allen10, Martin D. Tobin10,11, Jonathan Kropski12, Brian Yaspan13,
Timothy S. Blackwell12, David Zhang14, Christine Kim Garcia14,15, Fernando J.
Martinez16, Imre Noth3, and Carlos Flores1,2,17,18
Supplementary methods
  Description of study cohorts
  Supplementary bioinformatics methods
Supplementary results
  Prevalence of qualifying variants (QV) in PROFILE
Supplementary tables
  Supplementary Table 1. Baseline characteristics and outcomes of IPF patients from stage one and stage two cohorts.
  Supplementary Table 2. Regions of interest (ROIs) for qualifying variants annotations in hg38.
  Supplementary Table 3. Alternative definitions for qualifying variants and the rare synonymous used for sensitivity analyses.
  Supplementary Table 4. Common IPF risk variants and effects considered for PRS-IPF estimation.
  Supplementary Table 5. Common telomere length variants and effects considered for PRS-TL estimation.
Supplementary Figures
  Supplementary Figure 1. Principal component analysis.
  Supplementary Figure 2. Distribution of qualifying variants (QV) in monogenic adult-onset pulmonary fibrosis (PF) genes in the PFFPR and PROFILE cohorts.
  Supplementary Figure 3. Association between prevalence of qualifying variants (QV) and PRS-IPF in the PFFPR.
  Supplementary Figure 4. Association between prevalence of qualifying variants (QV) and PRS-IPF (after excluding the MUC5B locus) in the PFFPR.
  Supplementary Figure 5. Association between the prevalence of qualifying variants (QV) and PRS-TL in the PFFPR.
  Supplementary Figure 6. Association between prevalence of qualifying variants (QV) in telomere and non-telomere genes and PRS-TL in the PFFPR.
  Supplementary Figure 7. Association between prevalence of qualifying variants (QV) in telomere genes and PRS-TL in the PFFPR.
  Supplementary Figure 8. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of genes) and the MUC5B risk allele in the PFFPR. p-values for the log-rank test are shown.
  Supplementary Figure 9. Qualifying variants (QV) effect on survival in the PFFPR (excluding carriers of QV within PARN).
  Supplementary Figure 10. Alternative qualifying variants (QV) classifications and effects on survival in the PFFPR.
  Supplementary Figure 11. Association between PRS-TL tertiles and survival in the PFFPR.
  Supplementary Figure 12. Association between high and low PRS-TL and survival in the PFFPR.
  Supplementary Figure 13. Association of PRS-IPF (after excluding the MUC5B locus) and survival in the PFFPR. Kaplan-Meier analysis showing p-values for the log-rank test.
  Supplementary Figure 14. Associations between PRS-IPF and MUC5B rs35705950 genotypes with survival among carriers and non-carriers of qualifying variants (QV) in the PFFPR.
  Supplementary Figure 15. Association between prevalence of qualifying variants (QV) and PRS-IPF in PROFILE.
  Supplementary Figure 16. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group PF genes) and the MUC5B risk allele in PROFILE. p-values for the log-rank test are shown.
Supplementary references
Supplementary methods
Description of study cohorts
The Pulmonary Fibrosis Foundation Patient Registry (PFFPR) is a large multicentre registry that
has collected baseline and longitudinal demographic and clinical information on
well-characterized patients with interstitial lung diseases (ILD) in the United States since March
2016, to allow retrospective and prospective research (1). In addition, a major objective of the
PFFPR is to apply blood-based omics technologies (whole-genome sequencing [WGS], proteomic
analysis, and transcriptional profiling) to patient blood samples to study molecular markers of
disease onset or progression. Patients aged 18 years or older who had been diagnosed with ILD
and had not undergone lung transplantation were recruited from approximately 42 USA sites
selected primarily from the Pulmonary Fibrosis Foundation (PFF) Care Center Network. They were
followed for disease progression for the lifetime of the PFFPR or of the patient, or until the
patient received a lung transplant. More details of the PFFPR, including inclusion and exclusion
criteria and the clinical variables collected, are described elsewhere (1). The PFFPR cohort
includes 1317 individuals with ILD for whom WGS data are available. For this study, we included
the 917 PFFPR patients with a definitive IPF diagnosis. Family history was available for all of
them, although no genetic causes had previously been assessed. After the quality control
procedures, 888 of those patients remained in the study (Figure 1).
PROFILE is a large, prospective, multicentre, longitudinal UK study of patients with fibrotic
ILD (2,3). The cohort includes 541 patients with IPF or idiopathic non-specific interstitial
pneumonia, aged 18-85 years, recruited from tertiary specialist ILD centres and from local
secondary care hospitals. Blood samples were collected for genomic analysis, and patients were
followed for disease progression for 3 years. After quality control steps, the second stage
of the study included 472 patients with a confirmed diagnosis of IPF (Figure 1).
Baseline characteristics of the PFFPR and the PROFILE cohorts are listed in Supplementary
Table 1.
Supplementary bioinformatics methods
In both cohorts, several quality control (QC) analyses were performed: (i) detection of QC
outliers, (ii) the kinship between patients, (iii) sample cross-contamination, and (iv) sex
discordance. We used a combination of DRAGEN metrics and assessments with PLINK
v1.90b6.24 (4), SCE-VCF v0.1.2 (https://github.com/HTGenomeAnalysisUnit/SCE-VCF), Somalier
v0.2.19 (5), and KING v2.3.2 (6).
Detection of QC outliers: Using PLINK, we identified samples with an abnormal heterozygosity
rate or genotype call rate, which may indicate sample contamination and/or a low DNA
concentration. Samples with a heterozygosity rate more than 3 standard deviations from the
mean and/or a genotype call rate <0.95 were considered outliers.
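The outlier rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `flag_qc_outliers` and the input layout (a dictionary of per-sample heterozygosity and call rates, as could be derived from PLINK output) are hypothetical, and the mean and standard deviation are taken over all samples in the cohort.

```python
from statistics import mean, stdev


def flag_qc_outliers(samples, n_sd=3.0, min_call_rate=0.95):
    """Return the IDs of samples failing the heterozygosity or call-rate filter.

    `samples` maps a sample ID to a (heterozygosity_rate, call_rate) tuple.
    This input layout is an assumption made for illustration.
    """
    het_values = [het for het, _ in samples.values()]
    mu = mean(het_values)
    sd = stdev(het_values)  # sample standard deviation across the cohort

    outliers = set()
    for sample_id, (het, call_rate) in samples.items():
        het_outlier = abs(het - mu) > n_sd * sd   # more than 3 SD from the mean
        low_call_rate = call_rate < min_call_rate  # genotype call rate < 0.95
        if het_outlier or low_call_rate:
            outliers.add(sample_id)
    return outliers
```

Note that with very few samples no value can lie more than 3 standard deviations from the mean, so a rule of this kind is only meaningful at cohort scale.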
Kinship between patients: We detected duplicates or monozygotic twins and first-degree
relatives with three different tools. Two samples were considered duplicates or monozygotic
twins if they had a PLINK PI_HAT value >0.9, a Somalier relatedness value >0.9, and a KING
kinship coefficient >0.354. Pairs were considered first-degree relatives if they had a PLINK
PI_HAT in the range of 0.4-0.6, a Somalier relatedness value in the range of 0.4-0.6, and a
KING kinship coefficient in the range of 0.177-0.354. The three tools were in complete
agreement in the cohort. Second-degree relatives were not detected.
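As an illustration of the three-tool consensus described above, the thresholds can be written as a small classifier. This is a sketch under assumptions: the function name `classify_pair`, the label strings, the inclusive range boundaries, and the "discordant" label for pairs on which the tools disagree are all hypothetical choices, since the text reports complete agreement and does not describe how disagreements would be handled.

```python
def classify_pair(pi_hat, somalier_relatedness, king_kinship):
    """Classify a pair of samples from PLINK, Somalier and KING estimates.

    A relationship label is returned only when all three tools agree;
    otherwise the pair is labelled "discordant" (an assumed convention).
    """

    def from_relatedness(value):
        # Thresholds shared by PLINK PI_HAT and Somalier relatedness.
        if value > 0.9:
            return "duplicate_or_mz_twin"
        if 0.4 <= value <= 0.6:
            return "first_degree"
        return "other"

    def from_king(value):
        # KING kinship coefficient thresholds.
        if value > 0.354:
            return "duplicate_or_mz_twin"
        if 0.177 <= value <= 0.354:
            return "first_degree"
        return "other"

    calls = {
        from_relatedness(pi_hat),
        from_relatedness(somalier_relatedness),
        from_king(king_kinship),
    }
    return calls.pop() if len(calls) == 1 else "discordant"
```

For example, `classify_pair(0.5, 0.5, 0.25)` falls in the first-degree range for all three tools, whereas a pair with first-degree PLINK and Somalier values but a KING coefficient of 0.05 would be reported as discordant.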
Sample cross-contamination: We used the “estimated_sample_contamination” parameter
from DRAGEN metrics to exclude samples with evidence of ≥2% contamination. We also used the
SCE-VCF tool, which estimates contamination from VCF files using the CHARR method (7),
applying the recommended thresholds to consider a sample contaminated (CHARR > 0.03 and
INCONSISTENT_AB_HET_RATE > 0.15). The two tools were in complete agreement in identifying
potential sample contamination in the PFFPR. For PROFILE, we relied only on SCE-VCF to infer
sample cross-contamination.
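The contamination thresholds above can be expressed as a simple check. This is a hedged sketch rather than the study's code: the function name `is_contaminated` is hypothetical, the DRAGEN estimate is assumed to be given as a fraction (0.02 for 2%), and flagging a sample when either tool exceeds its threshold is an assumption, since the text only reports that the two tools agreed.

```python
def is_contaminated(charr, inconsistent_ab_het_rate, dragen_contamination=None):
    """Flag a sample as potentially cross-contaminated.

    `charr` and `inconsistent_ab_het_rate` are the SCE-VCF metrics named in
    the text. `dragen_contamination` is the DRAGEN estimate expressed as a
    fraction; pass None when it is unavailable (as for PROFILE).
    """
    # SCE-VCF rule: both metrics must exceed their thresholds.
    sce_vcf_flag = charr > 0.03 and inconsistent_ab_het_rate > 0.15
    # DRAGEN rule: estimated contamination of 2% or more.
    dragen_flag = dragen_contamination is not None and dragen_contamination >= 0.02
    return sce_vcf_flag or dragen_flag
```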
Sex discordance: Biological sex was inferred from the genetic data with Somalier, following
its recommendations. To do so, we compared the scaled mean depth on the X and Y
chromosomes at 365 and 17 genomic positions, respectively. Patients whose genetically
inferred sex was discordant with the recorded sex were excluded from the study. In the PFFPR,
one female was identified as a possible X0 aneuploid because of the low number of
heterozygous sites on the X chromosome and was excluded from the analysis.
Supplementary results
Prevalence of qualifying variants (QV) in PROFILE
The genes with the highest burden of QVs were RTEL1 (20.5%), PARN (17.8%), and TERT
(15.1%) (Supplementary Figure 2B, 2D). The prevalence of QVs among carriers of the risk
MUC5B genotype (rs35705950-T) was lower (14.97%) than among those carrying the
protective GG genotype (16.85%), although the difference was not statistically significant
(p=0.60). We observed the same effect direction as in the PFFPR when assessing the association
between the lower PRS-IPF tertile and reduced survival (HR=1.49, 95% CI=0.14-1.95,
p=3.1×10^-3) (Figure 3B).
Supplementary tables
Supplementary Table 1. Baseline characteristics and outcomes of IPF patients from stage one and
stage two cohorts.
Characteristics                    PFFPR (n=888)*      PROFILE (n=472)$
Age, yr, mean (SD)                 71.02 (7.8)         70.65 (7.9)
Male, n (%)                        676 (76.1%)         366 (77.5%)
Ethnicity, n
  Unknown                          17                  -
  Asian                            23                  -
  Black                            10                  -
  White                            838                 -
Ever smoker, n (%)                 571 (64.3%)         326 (69.1%)
Familial history, n (%)            176 (19.8%)         -
FVC% predicted, mean (SD)          67.74 (16.79)       78.97 (19.01)
DLCO% predicted, mean (SD)         29.3 (4.84)         44.97 (14.98)
Dead, n (%)                        337 (37.9%)         346 (73.3%)
Transplant, n (%)                  139 (15.6%)         -
Mean survival in years (IQR)       4.86 (3.31-6.93)    3.03 (1.7-5.71)
MUC5B genotype with risk allele    622 (70%)           294 (62.3%)
Abbreviations: PFFPR, The Pulmonary Fibrosis Foundation Patient Registry; SD, standard deviation; FVC,
forced vital capacity; DLCO, diffusing capacity of the lungs for carbon monoxide; IQR, interquartile
range. *Missing data: FVC predicted (n=41) and DLCO predicted (n=68); $Missing data: FVC predicted
(n=12) and DLCO predicted (n=50).
Supplementary Table 2. Regions of interest (ROIs) for qualifying
variants annotations in hg38.
Chromosome Gene Start End
5 TERT 1,253,047 1,295,168
3 TERC 169,764,420 169,765,160
14 TINF2 24,238,186 24,242,763
X DKC1 154,762,642 154,777,789
20 RTEL1 63,657,710 63,696,353
16 PARN 14,435,600 14,632,828
4 NAF1 163,109,973 163,166,990
12 ZCCHC8 122,471,500 122,501,032
8 SFTPC 22,156,813 22,164,579
10 SFTPA2 79,555,752 79,560,507
10 SFTPA1 79,610,839 79,615,555
5 SPDL1 169,583,536 169,604,878
3 KIF15 44,761,621 44,873,476
Supplementary Table 3. Alternative definitions for qualifying variants and the rare
synonymous variants used for sensitivity analyses.
Columns, left to right:
(1) Ultra-rare PTV
(2) Ultra-rare Ensemble# (PTV + Missense + Indel)
(3) Rare PTV only
(4) Rare Ensemble# (PTV + Missense + Indel)
(5) Semi-rare PTV only
(6) Semi-rare Ensemble# (PTV + Missense + Indel)
(7) Rare synonymous^
Criterion             (1)   (2)        (3)     (4)        (5)    (6)        (7)
Missense AF*          -     0          -       0.0005     -      0.01       -
PTV AF*               0     0          0.001   0.001      0.01   0.01       -
Consensus in silico prediction for missense:&
  Polyphen2 Humdiv    -     Probably   -       Probably   -      Probably   -
  REVEL               -     >0.5       -       >0.5       -      >0.5       -
  PrimateAI           -     >0.8       -       >0.8       -      >0.8       -
Variants (n)          13    30         28      67         29     78         38
*Below threshold for any population in gnomAD v2.1 exomes (AFR, AMR, ASJ, EAS, FIN, NFE, OTH, SAS) or
gnomAD v3.2 genomes (AFR, AMR, ASJ, EAS, FIN, MID, NFE, OTH, SAS) or in The 1000 Genomes Project Phase 3
genomes (AFR, AMR, EAS, EUR, SAS).
&Consensus of three predictors (Polyphen2, REVEL, PrimateAI) for missense variants only if >2 out of 3, or 2 out
of 2, or 1 out of 1 filters pass. Some predictors may have missing values.
^Allele frequency cutoff of 0.0005 in any population in gnomAD v2.1 exomes (AFR, AMR, ASJ, EAS, FIN, NFE, OTH,
SAS) or gnomAD v3.2 genomes (AFR, AMR, ASJ, EAS, FIN, MID, NFE, OTH, SAS) or in The 1000 Genomes Project
Phase 3 genomes (AFR, AMR, EAS, EUR, SAS), only synonymous variants.
#Ensemble models include non-coding TERC variants selected if passing the missense AF level and involved in
intramolecular base-pairing or previously described in pulmonary fibrosis, dyskeratosis congenita, or
Hoyeraal-Hreidarsson syndrome.
PTV: Protein truncating variants.
Supplementary Table 4. Common IPF risk variants and effects considered for PRS-IPF estimation.
Locus      SNP ID        Chr.  Position (hg38)  Effect allele  Non-effect allele  OR    P
KIF15      rs141979279   3     44,816,639       C              T                  1.50  1.21×10^-10
TERC       rs10936601    3     169,810,661      C              T                  0.79  2.10×10^-15
FAM13A     rs2013701     4     88,963,935       G              T                  1.25  4.60×10^-16
TERT       rs7725218     5     1,282,299        G              A                  1.41  4.90×10^-32
DSP        rs2076295     6     7,562,999        G              T                  1.49  1.50×10^-48
MAD1L1     rs12699415    7     1,869,843        A              G                  1.27  7.85×10^-18
ZKSCAN1    rs2897075     7     100,032,719      T              C                  1.30  1.77×10^-21
DEPTOR     rs28513081    8     119,921,886      A              G                  1.20  1.22×10^-9
10q25.1    rs79684490    10    109,470,103      A              G                  1.40  3.52×10^-8
MUC5B      rs35705950    11    1,219,991        T              G                  5.06  9.09×10^-418
ATP11A     rs12585036    13    112,881,427      C              T                  1.29  5.99×10^-14
IVD        rs59424629    15    40,428,343       T              G                  1.27  4.98×10^-19
KNL1       rs12912339    15    40,639,510       A              G                  1.30  7.41×10^-13
AKAP13     rs62023891    15    85,553,985       A              G                  1.18  1.32×10^-8
NPRL3      rs74614704    16    112,241          A              G                  1.49  2.57×10^-12
17q21.31   rs3785884     17    45,980,229       G              A                  1.40  2.53×10^-20
DPP9       rs35574495    19    4,686,976        G              T                  0.80  1.08×10^-9
STMN3      rs112087793   20    63,652,817       C              T                  1.34  1.09×10^-8
RTEL1      rs41308092    20    63,693,038       A              G                  1.75  3.13×10^-9
SNP: Single nucleotide polymorphism; Chr.: chromosome; OR: odds ratio; P: significance in the original study
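A table of effect alleles and odds ratios such as the one above is commonly turned into a per-patient score by summing effect-allele dosages weighted by the natural logarithm of each odds ratio. The sketch below illustrates that standard construction only; it is an assumption that PRS-IPF was computed this way (the procedure is described in the main methods), and the function name `polygenic_risk_score` and the dosage dictionary are hypothetical. Only three of the listed variants are included to keep the example short.

```python
import math

# Odds ratios for three of the variants listed in the table (effect allele).
ODDS_RATIOS = {
    "rs35705950": 5.06,  # MUC5B
    "rs2076295": 1.49,   # DSP
    "rs10936601": 0.79,  # TERC
}


def polygenic_risk_score(dosages, odds_ratios=ODDS_RATIOS):
    """Weighted sum of effect-allele dosages (0, 1 or 2) using ln(OR) weights.

    `dosages` maps an rsID to the number of effect alleles carried.
    A KeyError is raised if a variant has no odds ratio.
    """
    return sum(dose * math.log(odds_ratios[snp]) for snp, dose in dosages.items())
```

Under this construction, a patient carrying one MUC5B risk allele and two DSP effect alleles would score ln(5.06) + 2·ln(1.49), and variants with OR below 1 contribute negatively.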
Supplementary Table 5. Common telomere length variants and effects considered for
PRS-TL estimation.
Locus          SNP ID       Chr.  Position (hg38)  Effect allele  Non-effect allele  BETA    P
PARP1          rs3219104    1     226374920        C              A                  0.042   9.60×10^-11
TERC           rs10936600   3     169796797        T              A                  -0.086  7.18×10^-51
NAF1           rs4691895    4     163127047        C              G                  0.058   1.58×10^-21
TERT           rs7705526    5     1285859          A              C                  0.082   5.34×10^-45
TERT           rs2853677    5     1287079          A              G                  -0.064  3.35×10^-31
POT1           rs59294613   7     124914213        A              C                  -0.041  1.17×10^-13
STN1           rs9419958    10    103916188        C              T                  -0.064  5.05×10^-19
ATM            rs228595     11    108234866        A              G                  -0.029  1.43×10^-8
DCAF4          rs2302588    14    72938044         C              G                  0.048   1.68×10^-8
MPHOSPH6       rs7194734    16    82166375         T              C                  -0.037  6.94×10^-10
ZNF208         rs8105767    19    22032639         G              A                  0.039   5.42×10^-13
RTEL1/STMN3    rs75691080   20    63638397         T              C                  -0.067  5.99×10^-14
RTEL1          rs34978822   20    63660246         G              C                  -0.140  7.26×10^-10
RTEL1/ZBTB46   rs73624724   20    63805045         C              T                  0.051   6.33×10^-12
SENP7          rs551442     3     101346524        T              C                  -0.037  2.45×10^-8
MOB1B          rs13137667   4     70908630         C              T                  0.077   2.43×10^-8
CARMIL1        rs34991172   6     25480100         G              T                  -0.061  6.19×10^-9
PRRC2A         rs2736176    6     31619784         C              G                  0.035   3.53×10^-10
TERF2          rs3785074    16    69373083         G              A                  0.035   4.64×10^-10
RFWD3          rs62053580   16    74646176         G              A                  -0.039  4.08×10^-8
SNP: Single nucleotide polymorphism; Chr.: chromosome; BETA: effect size; P: significance in the original study
Supplementary Figures
Supplementary Figure 1. Principal component analysis. A) Plot of the first two (left) and the second and third (right)
principal components of genetic variation of IPF patients in the PFFPR. B) Plot of the first two (left) and the second
and third (right) principal components of genetic variation of IPF patients in PROFILE. C) Proportion of variance
explained by each PC (PFFPR on the right, and PROFILE on the left).
Supplementary Figure 2. Distribution of qualifying variants (QV) in monogenic adult-onset pulmonary fibrosis (PF)
genes in the PFFPR and PROFILE cohorts. A) Total QVs in monogenic adult-onset PF genes in the PFFPR. B) Total QVs
in monogenic adult-onset PF genes in the PROFILE cohort. C) Variants classified in P/LP/VUS per gene in the PFFPR.
D) Variants classified in P/LP/VUS per gene in the PROFILE cohort. T: Telomere; N-T: Non telomere.
Supplementary Figure 3. Association between prevalence of qualifying variants (QV) and PRS-IPF in the PFFPR. A)
Distribution of carriers (1) and non-carriers (0) in low and high PRS-IPF. B) Risk of carrying a QV in patients with low
polygenic risk in comparison with individuals with high polygenic risk. The odds ratio (OR) and the 95% confidence
interval (CI) were estimated using logistic regression adjusted by age of diagnosis, sex, and the two main principal
components.
Supplementary Figure 4. Association between prevalence of qualifying variants (QV) and PRS-IPF (after excluding
the MUC5B locus) in the PFFPR. A) Distribution of PRS-IPF in carriers (1) and non-carriers (0). Vertical dotted lines
represent the mean value of the distribution. B) Risk of carrying a QV for patients with low polygenic risk (T1) and
high polygenic risk (T3) compared to those in the middle tertile. The odds ratios (OR) and the 95% confidence
intervals (CI) were estimated using logistic regression adjusted for age of diagnosis, sex, and the two main principal
components.
Supplementary Figure 5. Association between the prevalence of qualifying variants (QV) and PRS-TL in the PFFPR.
Distribution of PRS-TL in carriers (1) and non-carriers (0). Vertical dotted lines represent the mean value of the
distribution. A) Carriers (1) and non-carriers (0) in telomere and non-telomere genes. B) Carriers (1) and non-carriers
(0) in telomere genes. T-test: Student's t-test; KS: Kolmogorov-Smirnov test.
Supplementary Figure 6. Association between prevalence of qualifying variants (QV) in telomere and non-
telomere genes and PRS-TL in the PFFPR. A) Distribution of carriers (1) and non-carriers (0) in PRS-TL tertiles. B) Risk
of carrying a QV for individuals with low polygenic risk (T1) and high polygenic risk (T3) compared to those in the
middle tertile. C) Distribution of carriers (1) and non-carriers (0) in high and low PRS-TL. D) Risk of carrying a QV in
patients with high polygenic risk in comparison with patients with low polygenic risk. The odds ratios (OR) and the
95% confidence intervals (CI) were estimated using logistic regression adjusted by age of diagnosis, sex, and the two
main principal components.
Supplementary Figure 7. Association between prevalence of qualifying variants (QV) in telomere genes and PRS-
TL in the PFFPR. A) Distribution of carriers (1) and non-carriers (0) in PRS-TL tertiles. B) Risk of carrying a QV for
individuals with low polygenic risk (T1) and high polygenic risk (T3) compared to those in the middle tertile. C)
Distribution of carriers (1) and non-carriers (0) in high and low PRS-TL. D) Risk of carrying a QV in individuals with
high polygenic risk in comparison with individuals with low polygenic risk. The odds ratios (OR) and the 95%
confidence intervals (CI) were estimated using logistic regression adjusted by age of diagnosis, sex, and the two main
principal components.
Supplementary Figure 8. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and group of genes)
and the MUC5B risk allele in the PFFPR. p-values for the log-rank test are shown.
Supplementary Figure 9. Qualifying variants (QV) effect on survival in the PFFPR (excluding carriers of QV
within PARN). All analyses were performed using Cox regression models adjusted for sex, age of diagnosis, the
two main principal components, MUC5B risk allele, smoking history, forced vital capacity (FVC) % predicted,
and diffusing capacity for carbon monoxide (DLCO) % predicted. The X-axis shows Hazard-ratios (HR); the grey
line corresponds to the HR=1.0. The boxes correspond to adjusted HR and horizontal lines correspond to 95%
confidence intervals (CI).
Supplementary Figure 10. Alternative qualifying variants (QV) classifications and effects on survival in the PFFPR.
All analyses were performed using Cox regression models adjusted for sex, age of diagnosis, the two main principal
components, MUC5B risk allele, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for
carbon monoxide (DLCO) % predicted. The X-axis shows Hazard-ratios (HR); the grey line corresponds to the HR=1.0.
The boxes correspond to adjusted HR and horizontal lines correspond to 95% confidence intervals (CI).
Supplementary Figure 11. Association between PRS-TL tertiles and survival in the PFFPR. A) Kaplan-Meier survival
analysis for PRS-TL tertiles (the p-value for the log-rank test is shown). B) PRS-TL effect on survival. The analysis was
performed using Cox regression models adjusted for sex, age at diagnosis, the two main principal components,
smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide (DLCO) %
predicted. The x-axis shows hazard ratios (HR); the grey line corresponds to HR = 1.0. The boxes correspond to
adjusted HRs and the horizontal lines to 95% confidence intervals (CI).
Supplementary Figure 12. Association between high and low PRS-TL and survival in the PFFPR. A) Kaplan-Meier
survival analysis for high/low risk PRS-TL (the p-value for the log-rank test is shown). B) PRS-TL effect on survival. The
analysis was performed using Cox regression models adjusted for sex, age at diagnosis, the two main principal
components, smoking history, forced vital capacity (FVC) % predicted, and diffusing capacity for carbon monoxide
(DLCO) % predicted. The x-axis shows hazard ratios (HR); the grey line corresponds to HR = 1.0. The boxes
correspond to adjusted HRs and the horizontal lines to 95% confidence intervals (CI).
Supplementary Figure 13. Association between PRS-IPF (after excluding the MUC5B locus) and survival in the PFFPR.
Kaplan-Meier analysis showing p-values for the log-rank test.
Supplementary Figure 14. Associations between PRS-IPF and MUC5B rs35705950 genotypes with survival among
carriers and non-carriers of qualifying variants (QV) in the PFFPR. A) Association between PRS-IPF and survival in
carriers. B) Association between PRS-IPF and survival in non-carriers. C) Association between MUC5B rs35705950
genotypes and survival in carriers. D) Association between MUC5B rs35705950 genotypes and survival in
non-carriers. Kaplan-Meier analysis, showing p-values for the log-rank test.
Supplementary Figure 15. Association between prevalence of qualifying variants (QV) and PRS-IPF in PROFILE. A)
Distribution of PRS-IPF in carriers (1) and non-carriers (0). Vertical dotted lines represent the mean value of the
distribution. B) Distribution of carriers (1) and non-carriers (0) in high and low PRS-IPF.
Supplementary Figure 16. Kaplan-Meier survival analysis for qualifying variants (QV) (per gene and for the PF
genes as a group) and the MUC5B risk allele in PROFILE. The p-values for the log-rank test are shown.
Supplementary references
1. Wang, B. R. et al. The Pulmonary Fibrosis Foundation Patient Registry. Rationale, Design, and
Methods. Ann Am Thorac Soc 17, 1620–1628 (2020).
2. Maher, T. M. PROFILEing idiopathic pulmonary fibrosis: rethinking biomarker discovery.
European Respiratory Review 22, 148–152 (2013).
3. Maher, T. M. et al. An epithelial biomarker signature for idiopathic pulmonary fibrosis: an
analysis from the multicentre PROFILE cohort study. The Lancet Respiratory Medicine 5,
946–955 (2017).
4. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer
datasets. GigaScience 4, s13742-015-0047-8 (2015).
5. Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies
using efficient genome sketches. Genome Medicine 12, 62 (2020).
6. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies.
Bioinformatics 26, 2867–2873 (2010).
7. Lu, W. et al. CHARR efficiently estimates contamination from DNA sequencing data. The
American Journal of Human Genetics 110, 2068–2076 (2023).