Deconstructing evolutionary histories of complex genomic rearrangements in lung malignancies

preprint OA: closed CC-BY-NC-ND-4.0
📄 Open PDF Full text JSON View at publisher
Full text 91,327 characters · extracted from oa-pdf · 3 sections · click to expand

Abstract

Somatic genomic rearrangements are hallmarks of cancer. Complex genomic rearrangements (CGRs) involving multiple intertwined structural alterations are often present in tumor genomes . CGRs frequently harbor oncogenic drivers, but their genomic architectures and etiologies are poorly understood. We used deep- coverage optical mapping technology to profile the genomic landscapes of normal lung tissues, benign pulmonary lesions, carcinoma in situ, and advanced carcinomas to examine the patterns of genome disorganization and instability in different stages of carcinogenesis in lung. Large rearrangements and CGRs were prevalent in the carcinomas. We developed omcplR to resolve the architecture of CGRs and predict their genesis from optical mapping data using genome -graph concept. We found that CGRs often aro se from hierarchical combinations of multiple, localized simple structural variations, and harbored allelic heterogeneity at the affected loci. Rearrangement patterns and associated genomic signatures suggested that chromoanasynthesis was a likely prevalent mechanism driving complex genomic rearrangements. The early rearrangement junctions in intra- chromosomal CGRs were usually localized within the same chromatin domains, but the late junctions in advanced tumors had more heterogeneous contexts suggesting progressive organizational heterogeneity. The CGRs, especially the late events therein were under positive selection. A composite signature of genomic alterations including the CGRs captured the trajectory of progressive genomic disorganization and instability with carcinogenesis in lung and underscored the extent of genomic structural heterogeneity among the in-situ tumors.

Introduction

Genomic instability and mutagenesis are hallmarks of cancer, and most tumors have rearranged genomes 1-4. While more than half of somatic point mutations and small InDels detected in cancer genomes are suspected to arise prior to malignancy during development and aging5, genomic rearrangements show very different patterns – those are rare in normal tissues but prevalent in tumors . Most genomic rearrangements in tumors are structurally simple , but complex genomic rearrangements (CGR) i.e. genomic reorganization events involving multiple intertwined structural rearrangements are common in all major cancer types 4. Recent pan-cancer studies, primarily based on short read sequencing, have identified different mechanisms of complex rearrangements such as chromothripsis, break-bridge- fusion, chromoplexy, etc – which have been collectively termed as chromoanagenesis2,6-8. Functional significance of CGRs in the context of carcinogenesis appears to be high . Focal amplifications of many known oncogenic drivers (e.g. MET, EGFR, ERBB2) frequently involve complex rearrangements, and their allelic heterogeneity provide s substrates for adaptive evolution and acquired therapeutic resistance4,9. A subset of the complex rearrangements may be associated with micronuclei or extrachromosomal circularized DNA copies, which are subject to unequal segregation during cell-division, allowing rapid and reversible gains or losses of copy number that can confer dynamic intra-tumor heterogeneity, facilitating rapid tumor evolution, and adaptive treatment resistance10. Despite their clinical significance, the genomic architectures and molecular mechanisms behind the CGRs are poorly understood. Traditional DNA FISH can detect translocations and copy number alterations but lacks the resolution to resolve the structure of complex rearrangements. Short read sequencing is routinely used for profiling tumor genomes, but it has been debated 10,11 whether this technology can effectively identify rearrangement junctions in repetitive regions or the connected genomic alteration events occurring on the same allelic haplotype, especially if such events are kb- Mb apart. Long read technology allows sequencing long DNA fragments, but efforts to r econstruct the architecture of complex genomic arrangements have been lagging. To address this gap, here we took advantage of ultra-long read DNA profiling using ultra-long optical mapping (OM) technology and developed omcplR to resolve the architectures of CGRs. .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint 2 Unusual complexity of the genomic architecture of the CGR events raises several fundamental questions: when and how do these events arise during tumorigenesis? Are these events typically associated with genomic instability and remain under selection? Answers to these questions require examining the genomic landscapes of premalignant and malignant lesions to assess the complexity of the CGRs considering progressive loss of genomic organization and onset of genome instability during malignancy 12,13. Unfortunately, early tumor development is typically asymptomatic, not all benign growth progress to advanced disease stages, and the biobanking of relevant premalignant lesions is typically limiting. Lung malignancies are notable exceptions; pulmonary nodules are often detected during the lung cancer screening, but only a subset of the pulmonary nodules of indeterminate potential progress to become malignant (Figure 1A), even though all benign tumors progress to become cancerous 14,15; classifying the nodules with indeterminate potential remains challenging. We use OM to profile the genomes of tissue samples from different stages of lung malignancies to identify the CGRs and then apply omcplR to examine their etiologies and evolutionary histories.

Results

Cohort characteristics and pathobiology of the samples We obtained fresh frozen tissue samples from pathologically normal lung tissues, benign pulmonary lesions , carcinoma in situ, locally invasive carcinomas, and metastatic lung carcinomas from a cohort of deidentified donors (Figure 1B) under an IRB approved protocol. The pathologically normal tissues were obtained from the cancer patients, and the benign pulmonary lesions were collected from preventive surgeries. The pathologically benign lesions were typically identified as suspicious nodules of indeterminate potential and carried additional characteristics that are associated with elevated cancer risk. For instance, S4 was a well-circumscribed 0.5 x 0.5 x 0.4 cm pulmonary nodule collected via wedge resection, which was considered non-malignant. S5 was a mass measuring 1.5 x 1.3 x 1.2 cm, with ill -defined boundaries, firm and had diffuse interstitial involvement by non -necrotizing granulomatous inflammation where the stains for infectious agents were negative. S6 was characterized by a calcified hard nodule of 3.5 x 2 x 1.1 cm with calcification, and focal atypical cells, suggestive of abnormal pulmonary growth. This patient had a prior history of squamous cell carcinoma and received chemoradiation, the malignant tumor was removed via wedge resection and did not overlap the nodule . Regional inflammation was noted for several benign lesions, but stains for infectious agents were negativ e. Among the cancerous lesions about a third were presented with locally invasive and/or metastatic disease, while others were carcinoma in-situ. Histological characteristics indicated the presence of papillary, lepidic, and solid subtypes, and some tumors had mixed histology. For instance, S9 had an infiltrating adenocarcinoma of lung of 2.0 x 1.7 x 1.9 cm with lepidic (50%), papillary (5%), and acinar (45%) pattern characteristics; lymph node or pleural invasion not detected. The patient had a prior history of prostate cancer . S17 was diagnosed with invasive 1.6 cm mucin-producing adenocarcinoma, with mixed subtypes consisting of lepidic, acinar, and papillary patterns associated with granulomatous inflammation; invasion of visceral pleura was noted . S20 was a n invasive, moderately differentiated lung adenocarcinoma, acinar predominant, and had lymph node metastasis (pT1c, pN1). OM-based genomic profiling of the cohort We performed optical genome mapping at 400X effective coverage on the frozen samples using BioNano Saphyr platform16. The optical mapping (OM) data was processed and analyzed after multiple QC filters ( see Methods for details). The overall OM data quality was excellent; on average, the length of the input maps was 2375219 ±526469.2 Mb, with an N50 length of 205.35±47.30 kb, and more than 91.55% ± 6.52% of the genome maps could be mapped to the human reference genome (GRCh38), achieving a genome coverage 351.76±45.90. Some genomic regions had multiple contigs representing multiple alternate alleles due to the presence of one more genomic alteration events. We applied the BNG rare variant analysis pipeline to identify different classes of structural variations and copy number alterations. The landscapes of genomic rearrangements and other alterations in these samples are presented using circus plots ( Figure 1B). The insertions (median: 5.45kb) and deletions (median: 16.79kb) were the most prevalent variants reported in the dataset , while intra -chromosomal fusions and inter -chromosomal translocations were less common in the cohort, and more so in the non-malignant samples. For comparative assessment and validation, we performed whole genome sequencing (WGS) of 4 samples -2 sets of advanced stage carcinomas (S8 and S18) and paired normal tissues (S1 and S2) using 150bp short read PE sequencing on the Illumina platform (see Methods for details). The numbers of detectable somatic point mutations (S8: 33,541, S18: 18,445) and InDels (S8: 1396, S18: 570) were within the typical range expected for lung carcinomas 17. WGS .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint Short DNA fragments (102-103bp) Frozen biospecimen Ultra -long DNA fragments (10 4-105bp) Cohort Optical mapping with BioNano Saphyr Short read sequencing with Illumina Landscape of genomic alterations Genomic locus with complex genomic rearrangement. Multiple contigs showing overlapping genomic alterations Inference about the genesis of complex genomic rearrangement A B Normal bronchia Benign pulmonary lesion Carcinoma in situ Invasive carcinoma Metastatic carcinoma S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 Figure 1: A) Outline of the study design profiling genomic landscapes of pathologically normal lung tissues, benign dysplastic and hyperplastic lesions, and malignant lung carcinoma samples. B) Circos plots showing somatic genomic structural alterations and rearrangements in the samples in the cohort, grouped according to the disease stages. C) Representative genomic alteration in a sample (S18) is shown at high resolution, and corresponding variation in sequencing coverage in Illumina short read sequencing of the tumor is highlighted (tumor: orange, paired normal: blue). The genomic resolution of optical mapping and short read sequencing are different. The affected and adjoining genomic regions and gene annotations therein are shown below. S18: chr17:33723835 -35948096C Short read sequencing coverage Tumor Normal Optical mapping Genomic rearrangement Tumor .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint 3 using short read technology identified several structural variations in the tumor samples (n=40 for S8, and n=15 for S18), but the frequency was lower than that identified using the OM technology. Furthermore, a vast majority of the high confidence somatic structural variations in tumors identified on the short-read platform were below the size threshold for detection using the OM technology. Among the large structural variations within detection range of both technologies, those called by WGS were also detected by OM . For instance, S18 has one deletion in chr17: 33.7Mb-35.9Mb that found in both platform ( Figure 1C), we totally identified four SVs (two inversions and two deletions), which were identified by both platforms . In contrast, most of the high confidence, high allele frequency (>10%) OM structural variation calls were not identified by WGS. In some cases, genomic evidence on the Illumina platform was below the confidence level for detection threshold (also see Discussion), while some others had rearrangement junctions in the repetitive or low complexity sequences that are difficult to resolve using short read technologies. Nonetheless, our observations about the extent of concordance between the platforms are similar to other studies18,19. A comparative analysis of the genomic landscapes of the samples profiled using OM across different disease stages indicated that there is an increased level of genomic rearrangements with progressively advanced disease (Figure 1B). In particular, the large copy number alterations and inter-chromosomal rearrangements were rare in pathologically normal tissues and benign samples, but prevalent in the in-situ and advanced carcinomas. This is qualitatively consistent with the observations on the mutational landscapes of healthy, diseased, and malignant lung samples profiled using short read technologies17,20. While a significant burden of somatic point mutations and small InDels in lung cancer genomes arise prior to malignancy during development and aging 5,12,15, these observations collectively suggest that large genomic rearrangements are typically uncommon in non-malignant tissues. Contig reconstruction based on the l ong OM traces allowed detection of both junctions of rearrangements events , even if those are 104-106bp apart, enabling end-to-end characterization and allelic reconstruction of the genomic alteration events. Based on the contig -level data, we further identified the instances where multiple rearrangement and/or copy number alteration events overlap or occur within the same OM contig and tagged those as clustered SVs. We identified 25-36 clustered SVs in the benign lesions, 24-59 in carcinoma in situ, and 32-59 in the invasive tumors and those with metastasis , respectively . Most clustered SVs consist ed of relatively small, localized structural variations, including deletions, insertions, duplications, and inversions, but some of clusters spanned local (~hundreds of kb) or distal (~tens of Mb) regions of chromosomes, and/or involved inter -chromosomal rearrangement events. The contig -level OM data revealed that some of those clustered SVs were due to multiple entangled genomic alteration breakpoints occurring on the same allelic contig at the locus representing complex genomic rearrangements. Long sequences captured by Bionano OM allowed us to identify and annotate known cancer genes proximal to genomic rearrangement junctions. Clustered genomic rearrangements proximal to U2AF1, IRF4, ZNF521, PDE4DIP, HIP1, WWTR1, NOTCH2, NRG1, and MAPK1 were detected in multiple samples across different disease stages. U2AF1 and HIP1 proximal rearrangement events were common in the distant and invasive carcinomas, although the preferential enrichment in advanced tumors was not statistically significant. Considering the genomic resolution of the OM data and absence of point mutation data, this gene-set does not represent a complete set of oncogenic drivers in these tumors. Receptor tyrosine kinase (RTK)/RAS/RAF pathway is frequently mutated in non -small cell lung carcinomas, while MET and ERBB2 focal amplifications are also common17. Although we did not detect these focal amplifications in our cohort using OM platform, reanalyzing existing single cell transcriptomic data28 for the samples in the cohort , we assessed relevant oncogenic pathway activities. S17 and S20, both of which are advanced carcinomas, had prominently elevated expression of the EGFR pathway, suggesting activated potential RTK signaling. Interestingly, high EGFR expression was also observed in S7, which is a benign lesion. S19, S11, and S10 had high KRAS expression, which might be due to RAS-driven tumorigenesis. Notably, S19 is an acinar carcinoma, while S10 is of papillary subtype - both subtypes are known to have frequent KRAS mutations21. omcplR for characterization of complex genomic rearrangements We developed omcplR, a framework to identify complex genomic rearrangements (CGRs) and resolve their genomic structures (see Methods). As the first step, omcpIR divided the genome into non-overlapping, equal-sized segments, and scanned them to identify the genomic segments having multiple contigs that harbor different sets of structural variation (SV) junctions. It considered those genomic segments, as well as their corresponding contigs as the initial seeds. For each seed, it then iter atively identified other co -occurring contigs, which were subsequently used in the .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted November 12, 2024. ; https://doi.org/10.1101/2024.11.11.623122doi: bioRxiv preprint Figure 2: A) A schematic representation of the computational pipeline to identify complex genomic rearrangements and infer underlying order of events. B) Benchmarking using simulated datasets where the order of rearrangement events are known in advance. abcdef Clonal size Time (generation) ab cd ef Inferred order of events a bcd cdab ab cdabef f e contigs refContig-level data Temporal clonal dynamics Inference on simulated dataset with known order of events t contigs Rearrangement junctionsJP1-JP2JP4-JP5JP5-JP6JP3-JP6JP4-JP6 C1C2C3C4 Contig –rearrangement junction relationship JP1-JP2 JP4-JP5JP5-JP6JP3-JP6 JP4-JP6 Inferred Bayesian network JP1JP2JP3JP4JP5JP6DNA BreakpointsRef. GenomeContigs with rearrangement junctionsC1C2C3 C4 Locus of complex rearrangementAcontigsjunctions rij : OM coverage supporting a junction r11 r12 0 0 0r21 0 r23 0 0r31 0 0 r34 00 0 r43 0 r45 Contig –rearrangement junction matrix with coverage Annotated Order-of-Events JP1-JP2 JP4-JP5JP5-JP6JP3-JP6 JP4-JP6 Genomic, epigenomic, and functional context annotation C1C2C3 ab cd efC1C2C3 B Contig –rearrangement junction matrix with coverage 4 next iteration. The recursive search continued until no new contig was identified for a given seed. SV junctions were then pulled from each cluster contigs. Unique contigs were retained as alternative alleles that differed in their content, orientation, or combinations of underlying structural variations. After the processing, each non-redundant set represented a complex genomic rearrangement (CGR) event. Within the CGR events, some of the contigs carried multiple rearrangement junctions, and on the other hand, a subset of the rearrangement junctions was shared across multiple contigs. omcpIR used the joint patterns of contigs and rearrangement junctions to infer the conditional relationships among the junctions, and thereby predict the likely order of precedence of rearrangement events therein (Figure 2A). For each CGR event, omcpIR estimated the frequency of pairwise co-occurrence between the junctions, and then used a Bayesian approach to determine conditional relationships among all possible junction pairs – such that the CGR event was graphically represented as a Bayesian network, where the nodes were SV junctions and the pairs of junctions with significant conditional relationships related to edges. The likely sequence of rearrangement events during tumorigenesis could be inferred from the topology of the network top-down – early rearrangement junction events were at or near the top of the graphs, while downstream events were below, such that one can annotate the rearrangement junctions within a hierarchy as early, intermediate, or late (Figure 2A). Outside the direct decedent relationships, the order of events inference needs cautious interpretation. If a complex event has multiple disjoint graphs, no statistical inference can be made regarding the relative temporal order of occurrence of rearrangement events across the graphs. Likewise, no inference could be made regarding temporal order between different unrelated CGR events in a cancer genome. We first evaluated the performance of omcpIR using computational simulation where the ground truth about the order of events was known. We simulated population dynamics of clonally derived cells with the Galton Watson branching process (see Methods), where the cells had certain probabilities of generating genomic rearrangements at each round of cell division, which are then passed on to its progenies. Thus, a cell lineage may acquire one or more simple rearrangement events during a cell division, and then the genomic region impacted by such events may accumulate further genomic alterations in subsequent cell divisions in the progenies, resulting in a complex event. Alternately, a genomic region may occasionally accumulate multiple entangled rearrangements in a single, rare catastrophic event. While a single catastrophic CGR event would have comparable allele frequencies of the interconnected events, those that arose due to temporally distinct events would have different allele frequencies within the same CGR. Using different model parameters, representing different evolutionary scenarios and noise-level in the OM data, we propagated the clonal lineages in the simulation keeping track of the CGRs, their order of events, and tabulated the final allele frequencies of the genomic alterations therein (Figure 2B). We found that omcpIR could correctly identify the order of events in complex CGRs. The order inference was robust when the timing of emergence of the events along a clonal lineage are sufficiently spaced in time. But the model did not always distinguish the events that arose in a single catastrophic event from those that arose in quick succession in time, since those may have negligible variations in allele frequencies in a realistic dataset. Next, we scanned the somatic genomes of the premalignant lesions, as well as localized and advanced carcinomas in our cohort using omcpIR to identify the CGR events and predict their likely order of events. Most events were heterologous i.e. involving multiple classes of genomic alterations and involved multiple overlapping contigs – which reflected diversity and heterogeneity within the CGR events. A majority of the CGR events were intra-chromosomal localized within 1Mb regions, such that the rearrangements were within the same topological domains and might not perturb long-range functional interactions within the rearranged regions. Distal (>1Mb) intra-chromosomal events were more common in malignant tumors. As we discuss later, the burden of inter-chromosomal complex events increased from non-malignant to malignant samples. As a case study, we showcase a complex genomic rearrangement event located in chromosome 15 between 23.13Mb-28.62Mb of distant tumor S20, which is an advanced carcinoma (Figure 3A; see Methods for details). This CGR involved 7 alternative DNA contigs that harbored different combinations of structural variations, often involving common subsets of rearrangement junctions. We show the joint distributions of the rearrangement junctions on the contigs using a circus plot, where the width of the arcs indicates the extent of co-occurrence of rearrangement junction-pairs in that sample, involving the most prevalent joint junctions: Chr15_28.42MB||Chr15_28.42MB - Chr15_28.50MB||Chr15_28.62MB. Gene-level annotation suggested that the CGR impacted the genes GOLGA8DP, GOLGA6L2, GOLGA6L7 and MKRN3. The rearrangement junctions in this CGR were within the same chromatin 5 domain and shared similar (epi)genomic contexts (Figure 3B). All the junctions were in the strong euchromatin context in nuclear core, and nearly all of them overlapped with repetitive elements (2 junctions lacked nuclear context annotations, Chr15:23.13MB||Chr15:23.13MB and Chr15:23.36MB||Chr15:28.34MB). Close sequence-level and topological proximity of all involved rearrangement junctions are consistent with a model that DNA breaks within a replication factory and recursive rearrangements could give rise to such events (Figure 3C). Furthermore, the order of events analysis of this complex region indicated that the segment junctions Chr15:23.13MB||Chr15:23.13MB and Chr15:23.59MB||Chr15:28.51MB might be the early event (Figure 3B). Close sequence-level and topological proximity of all involved rearrangement junctions within the same chromatin domain context suggest that the DNA breaks likely occurred within a common replication factory, and we speculate that recursive DNA breaks perhaps due to persistent replication factory crisis22 might lead to such events (Figure 3C). As another case study, we examine a CGR in the in-situ tumor S8, chromosome 9 between 62.81Mb-68.23Mb that impacted the genes GXYLT1P3, TMEM252, CBWD5, FOXD4L3, FOXD4L4, LERFS, ANKRD20A3, CBWD3, and PGM5(Figure 3D). There was 8 alternative DNA contigs that exhibited different combinations of structural variations, often involving common subsets of rearrangement junctions – suggesting genetic heterogeneity. To visually represent the joint distributions of the segment junctions on the contigs, we created a circos plot, highlighting the most prevalent joint junctions: Chr9:62.81MB||Chr9:65.82MB - Chr9:65.67MB||Chr9:65.67MB, Chr9:62.81MB||Chr9:65.82MB - Chr9:63.45MB||Chr9:65.82MB, and Chr9:62.81MB|| Chr9:65.82MB - Chr9:62.90MB||Chr9:62.94MB (Figure 3E). Order of events analysis of this complex region indicated that the segment junctions Chr9:62.81MB||Chr9:65.82MB, Chr9:62.88MB||Chr9:62.95MB and Chr9:62.90MB||Chr9:62.94MB early event (Figure 3F), and Chr9:62.87MB||Chr9:62.96MB and Chr9_63.45MB||Chr9_65.82MB as the late event. The rearrangement junctions had different genomic and epigenomic contexts. Among the three early events, the majority are predominantly composed of strong chromatin within lamina-associated domains at the nuclear periphery. The immediate and late junctions occurred within different chromatin (strong open chromatin and weak open chromatin) and nuclear localization contexts (lamina-associated domains at the nuclear periphery and nuclear core) (Figure 3F). Differences between the rearrangement junctions in their allele frequencies and genomic contexts suggest that this CGR event likely arose from multiple independent rearrangement events sequentially over the course of tumorigenesis. We note that genomic rearrangements may affect chromatin domain organization, which in turn may affect DNA double strand breaks and repair pathway choices, such that etiology of such events may be complex. We identified 1116 CGRs in an initial genome-wide scan in our cohort; we refined the catalog based on the allele frequency of underlying rearrangements and overlap with high confidence genomic alterations as ‘major’. A total 216 major CGRs were found in the cohort, of which 124 CGRs had all rearrangement junctions mapped. The major CGRs were uncommon in the benign lesions (0-2 per sample), but prevalent in the carcinoma in situ (0-33 per sample), locally invasive tumors (20-24) and those with distant metastasis (6-45). Several known cancer genes such as U2AF1, IRF4, ZNF521, PDE4DIP, HIP1, and WWTR1 were near the CGR junctions in multiple samples, however the rearrangement patterns and junctions were sample specific. These genes also had frequent copy number alterations (2-11% cases) in the TCGA lung adenocarcinoma and squamous cell carcinoma cohorts21. Anyhow, based on the patterns of shared and unique rearrangement junctions, omcpIR established the order-of-events for 13 major CGRs. For the remaining cases, no clear order of precedence could be established reliably. It is possible that some of the latter may have arisen due to a single event or events that were not sufficiently separated temporally such that intermediate allelic states could be obscured by clonal sweep or low OM coverage at the junctions. (Epi)genomic contexts and DSB repair mechanisms Analyses of the topological complexity of the order-of events in the CGRs (Figure 4A) indicated a few common patterns. The numbers of relatively early and late events were comparable, and higher than those classified as intermediate (~22%). This could be partially attributed to the fact that intermediate events are only present in the CGRs with order >2. Annotations of genomic contexts indicated that 73% of all junctions in CGRs had at least one of the two junction end sequences in repeat regions (Figure 4B). Late events were marginally enriched for both junction end-sequences in the repeat regions. While most of the junction end-sequences were either both in repeat regions (62.9%), or both in non-repetitive regions (26.9%), early events were significantly more enriched (64.5%, p-value<0.05, Binomial test; Figure 4B) in heterologous junctions, where one of the two junction end-sequences was in the repeat regions. Chromatin context analysis indicates that a majority of the CGR rearrangement junctions are in D Example-2: S8 : chr9: 62.81Mb-68.23Mb Chr9_62.87MB||Chr9_62.96MBChr9_63.46MB||Chr9:65.82MBChr9:65.67MB||Chr9:65.67MBChr9_62.81MB||Chr9_68.23MBChr9_62.81MB||Chr9_65.67MBChr9_62.90MB||Chr9_62.94MBChr9:63.45MB||Chr9_65.82MB Chr9_62.81MB||Chr9_65.82MBE Chr9_62.81MB||Chr9_65.67MB **Chr9_62.81MBb||Chr9_68.23MB ** Chr9_62.81MB||Chr9_65.82MB ** **Chr9_65.67MB||Chr9_65.67MB **Chr9_62.90MBb||Chr9_62.94MB **Chr9_63.46MB||Chr9_65.82MB **Chr9_63.45MB||Chr9_65.82MB **Chr9_62.87MB||Chr9_62.96MB ** F Chr9_62.88MB||Chr9_62.95MB AExample-1: S20 : chr15: 23.13Mb-28.62Mb23.2Mb24Mb25.6Mb26.4Mb24.8Mb27.2Mb28Mb28.4Mb C Plausible persistent replication-dependent DNA breaks and chromoanasynthesis Junctions are within the same nuclear, chromatin and replication contextsGenomic DNA DNA breaks timeDNA break Chr15_23.13MB ||Chr15_23.13MBChr15_23.21MB||Chr15_28.56MB Chr15_28.42MB||Chr15_28.42MB Chr15_28.50MB||Chr15_28.62MB Chr15_23.59MB||Chr15_28.51MBChr15_23.36MB||Chr15_28.34MB ** ** ******** **Chr15_23.22MB||Chr15_23.22MB B ** Chromatin: Strong Euchromatin Weak Euchromatin Inter. Chromatin Weak Het. Chromatin Strong Het.chromatinReplication: Early Late NANuclear: Core Periphery NAContext: Repeat Exon Telomere Other** Figure 3: A) A representative intra-chromosomal complex rearrangement is shown from the sample S20, affecting genomic region chr15: 23.13Mb-28.62Mb. Genomic rearrangements and associated contigs are shown. B) Inferred order-of-events involved in the complex rearrangement. Associated nuclear, chromatin, replication and genomic contexts of the junctions are highlighted. All junctions have similar genomic and epigenomic contexts in this example. C) Replication factory crisis leading to chromoanasynthesis is being proposed as a likely mechanism to explain the genesis of this complex event. D) A representative intra-chromosomal complex rearrangement is shown from the sample S8, affecting genomic regions chr9: 62.81Mb-68.23Mb. E) Circos plot showing pairwise co-occurrence of rearrangement junctions on the same contigs. Widths of the arcs indicate optical mapping coverage. F) Inferred order-of-events involved in the complex rearrangement. (Epi)genomic contexts of the junctions are as above. 6 open chromatin, and both the junction end-sequences in open chromatin are the most common (open|open, 33.3%). Surprisingly, this was significantly more common than both the junction end-sequences in closed chromatin (closed|closed, 0.7%, p-value <0.05, Binomial test; Figure 4B). Taken together, repeat sequences in open chromatin may contribute towards a significant majority of rearrangement junctions in the CGRs. Subsequently, we investigated whether complex genomic rearrangements (CGRs) stem from heightened genomic fragility and instability within specific genomic loci. Notably, most junctions within each CGR event were unique (Figure 4C). Additionally, upon stratifying by disparate cancer stages, we observed that advanced cancer stage patients exhibited a higher prevalence of unique structural variation junctions, underscoring the complexity of CGRs (Figure 4D, Benign: 0.81±0.10; Malignant: 0.94±0.16, Mann-Whitney test p value=0.5067). Moreover, comparing contigs across different chromosomal levels (intrachromosome and interchromosome), we observed numerous CGR junctions incorporating at least one interchromosomal junction in the pre-filtered set, but this was not evident for the high confident set of major CGR events (intra-chromosomal: 4.80±3.70, interchromosomal: 7.41±5.47, Mann-Whitney test p value=3.86e-05). Further statistical analysis revealed the order of events for these junctions, indicating that parental junctions were notably complex and originated from both the same and different chromatin contexts, with a higher prevalence observed in different chromatin regions compared to identical ones (Figure 4E; Similar: 5.5±2.42; Dissimilar: 14±10, Mann-Whitney test p value= 0.1962). This trend persisted for child nodes (Figure 4F; Similar: 5.67±2.73; Dissimilar: 12.67±8.88, Mann-Whitney test p value= 0.1994) and when considering all contigs collectively (Figure 4G; Similar: 4.83±2.32; Dissimilar: 10.50±8.52, Mann-Whitney test p value= 0.1978), emphasizing the significance of accounting for different chromatin contexts in understanding CGR complexity. We examined whether the clustering patterns of breakpoints within the major CGRs could be an artifact of random overlapping events or arise by non-random, localized DNA double strand breaks due to genomic instability during tumorigenesis. For each sample, we randomly shuffled the rearrangement breakpoints within respective chromosomes preserving their chromatin, length, and genomic contexts to create simulated landscapes of genomic alterations expected by chance, repeated the simulation 103 times using a published approach23, and each time calculated the relative frequencies of occurrence of CGRs. We found that it was significantly unlikely to observe the CGRs (simulation p-value 2) rearrangement breakpoints per sample by chance at the frequency observed in our cohort, and even more unlikely to observe the CGRs with the level of complexity observed in the in situ and advanced tumors. We observed similar results by using different strategies to create genomic rearrangements and using alternative null models in the simulation (see Methods). We further analyzed scRNAseq data for pathway-level signatures, and observed that the advanced carcinomas (e.g. S17, S19, S20 etc) with extensive CGRs also had elevated signatures of DNA repair processes (Figure 4H). Although these pathway signatures are not CGR-specific, we note that active cell cycle and DNA break repair signatures in advanced carcinomas suggest continued DNA repair, which can contribute to generation of replication-dependent or independent simple and complex rearrangements, and ultimately persistent genomic instability. Structural inference of the CGRs allowed us to examine their likely etiologies considering the known mutagenesis and DNA repair mechanisms broadly classified as chromoanagenesis 6-8 . We assessed the possibilities by analyzing four attributes of the CGR events (i) intra and inter-chromosomal CGR events, (ii) order-of-events sequence, (iii) chromosomal location, and (iv) sequence contexts of the junction regions for this purpose (Figure 4I). We did not find any CGR event involving interconnected multiple inter- and intra-translocations and deletions, that are typical of the chromoplexy events. As noted earlier, most of the 216 major CGR events were intra-chromosomal and involved localized rearrangement events. None of those were within 5Mb of the telomere or had characteristic inversion and copy number profiles supporting break-fusion-bridge fusion (BFB). Among the CGRs with established order-of-events (n=13), parent and child junctions had clear allele frequency differences, which would rule out a single catastrophic event such as chromothrypsis7,24. A majority of these CGR events also showed branched architecture, often involving regions far away from the telomere, which makes exclusive BFB events unlikely as well. A vast majority of all rearrangement junctions in these CGRs was in low complexity or repeat sequences within gene-rich, transcriptionally active open chromatin domains, although we did not have base pair resolution information on the junction regions. Chromatin domain organization shapes replication timing25, such that the sequences within a chromatin domain typically replicate concurrently, and repetitive sequences permit replication-associated DNA break and repair processes - that is favorable for sustaining chromoanasynthesis-like mechanism7,26. Analysis of scRNAseq data, as above, indicated active replication associated DNA repair activities. Taken together, the genomic signatures C Figure 4. A) A schematic representation of order of event of complex genomic rearrangement and its genesis mechanism. B) Proportions of the rearrangements associated with CGRs that have similar or dissimilar genomic and epigenomic contexts at the junction end regions. C) Regression analysis showcasing the relationship between contigs cataloging complex structural variation junctions and the unique junctions identified. D) Distribution of unique complex structural variation junctions across benign and malignant (i.e. in situ, locally invasive, and distant) stages of carcinogenesis. Sequential portrayal of CGR events at the parent node (E), child node (F), and all nodes (G), encompassing both intra- and inter-chromosomal rearrangements alongside their chromatin annotations in Figure2D Similar: Contigs containing identical chromatin within intra-chromosomal regions, Dissimilar: Contigs containing disparate chromatin within intra-chromosomal regions. H) Differences in DNA repair pathway scores in the epithelial cells from different samples based on single cell RNAseq data. I) Evaluating the CGR events in light of known molecular mechanisms driving complex genomic alterations. R2=0.08, 95% CI [0.01, 0.18], n=124Unique junction frequency Dissimilar intra-chromosomal Similar intra-chromosomal AJP1JP2JP3JP4JP5JP6 C1C2C3 C4 contigs junctions JP1-JP2 JP4-JP5JP5-JP6JP3-JP6 JP4-JP6 parentchild Unique junction frequency D Benign Malignant Similar DissimilarSimilar DissimilarSimilar Dissimilar E F G Both junctions in open chromatinBoth junctions in closed chromatin B repeatsnon-repeats openother Both junctions in repeatsBoth junctions in non-repeatsother other 6040200percent 6040200percent Multiple localized DNA breaks over timeChromo-anasynthesisLikely prevalent Epithelial cells LS4LS5LS7LS9LS10LS11 LS13LS14LS15LS17LS19 HDNA repair pathway signature LS17 LS10 LS14 LS20LS19 LS19LS4 LS7LS11LS5LS20 LS9LS13 cohort CGR events ChromoplexyNot detected Circular inter-chromosomal translocations Closed circular inter-chromosomal rearrangementsNo Telomeric clustered Del/Dup/Inv Chromosome shattering and multiple DNA breaks in one catastrophic event No Order-of-events inferenceIndeterminateDetermined Repeated telomeric bridge and break Break-bridge-fusionNot detectedChromothripsisPossible Localized within the similar replication timing domainsGenomic and chromatin contexts of junctionsSimilar/dissimilarSimilarGenomic and chromatin contexts of junctionsYes I 7 of the CGRs, in aggregate, were supportive of chromoanasynthesis-like mechanisms, while did not find evidence for widespread occurrence of other chromoanagenesis mechanisms (Figure 4I). Thus, using the Principles of Ockham’s razor, we argue that chromoanasynthesis appears to be the predominant mechanism for the CGRs, especially those with inferred order of events, in our cohort (also see Discussion). Chromoanasynthesis is not exclusive and could co-operate with other DNA repair processes. Signatures of selection Genomic alterations can gain clonal prominence by selection or drift under neutral evolution (Figure 5A). Allele frequency spectrum analyses indicated that a vast majority of point mutations are under neutral evolution27, but genomic rearrangements typically have larger effects, although a similar analysis for genomic rearrangements based on short read sequencing technology is technically challenging. The direct profiling of ultra-long DNA molecules using OM allows direct inference of allele frequency at the rearrangement breakpoints enabling us to assess the clonal dominance and signatures of selection. Overall, allele frequencies of the major CGR breakpoints were higher in locally advanced and distant carcinomas compared to that in the in-situ tumors, and the differences persisted even after adjusting for tumor fraction. Since a minority of the breakpoints in any CGR are early, we hypothesized that it was likely driven by the allele frequency patterns of the late events, potentially due to selection on the secondary events instead of allelic dominance of the corresponding early events. To test that hypothesis, we compared the allele frequencies of the secondary events after normalizing them by the allele frequency of the primary events within the respective CGR and found that the normalized allele frequencies of the CGR breakpoints of the secondary events were significantly higher in locally advanced and distant carcinomas compared to that in the in-situ tumors (Figure 5B; p-value <0.05, Mann Whitney U test). Furthermore, the shape of the allele frequency distribution in the advanced tumors was skewed compared to that observed for the in-situ tumors, with an excess of proportionally higher allele frequency secondary events. Interpreting these observations in light of the site frequency distribution analysis, originally developed for the inference of selection on somatic point mutations27, we argued that the data suggests a relatively stronger positive selection on the secondary events in the advanced tumors. We integrated scRNAseq and OM data to further evaluate transcriptomic and phenotypic signatures associated with the late CGR events in light of known oncogenic driver genes, but the genomic resolution of the integrated dataset was typically limited to assess that conclusively. Nonetheless, several advanced tumors (e.g. S17) showed presence of transcriptionally distinct subclonal cell populations with higher proliferation scores and differences in hallmark pathway signatures indicating intra-tumor population dynamics, as reported by Venkat et al. 28. Anyhow, taken together, our data suggest that early events in the CGRs typically arise before the last clonal sweep, and positive selection on the secondary events promote their overall clonal prominence during the ongoing tumor evolution. Patterns of structural variations in different disease stages We compared the genome-wide patterns of simple and complex genomic rearrangements in the samples grouped according to their disease stages based on histopathological characteristics (Figure 5C-E). The burden of small structural variation events (typically 102-104bp; insertions, deletions, duplications, and inversions) increased in tumors compared to the non-malignant tissues, but the differences were marginal (Figure 5C). In contrast, copy number gain/loss and large-scale structural variations such as rearrangements within and between chromosomes were rare in precancerous tissues but orders of magnitude higher in tumors (Figure 5D). For instance, the normal or benign samples typically had less than 2 large copy number events, whereas advanced tumors had a notably higher number of events (e.g. 118 CNVs in S8, an in-situ tumor). Advanced tumors also exhibited significantly higher number of intra-/inter- chromosome translocation. Notably, we detected a substantial number of 257 intra-chromosomal fusions, primarily within chromosome 8, in the in-situ sample (S14). Somatic complex rearrangement events are more frequent in samples grouped by carcinogenesis stages, with a notable increase in prevalence in advanced disease stages (Figure 5E). These observations provide evidence for increased genomic disorganization at multiple levels with carcinogenesis. Progressive genomic disorganization with disease stages in pseudotime While some classes of genomic alterations were common in both precancerous and cancerous genomes, others arose predominantly in malignant stages. Lasso regression and Random Forest feature importance scores indicated that intra-/inter-chromosome translocation, copy number variations, local clustered SVs and complex genomic rearrangements were the most significant factors that differentiated non-malignant and malignant samples. Complex Complex rearrangementsInversionCNV gainCNV_lossDeletionDuplicationInsertionIntra-chromosomal fusionInter-chromosomal translocationlocal cluster SVs C D E F G Figure 5: A) A schematic representation of allele frequency change from two possible selection pressure. (B) Positive selection on cells carrying mutations result in an allele frequency distribution with an excess of high allele frequency variants. Normalized variation allele frequency for contigs observed in complex genomic rearrangements at various stages of carcinogenesis show significant increases in invasive and distant tumors. P value calculated by Kolmogorov-Smirnov test. C) Frequency of somatic genomic deletions, duplications, insertions, and inversions in the samples grouped by the stages of carcinogenesis. The frequency distribution shows modest increase with advanced disease stages. Y axis is in the log scale. D) Frequency of somatic intra-chromosomal fusions, inter-chromosomal translocations, CNV gain and losses in the samples grouped by the stages of carcinogenesis. The frequency distribution shows prevalence in normal tissues or benign lesions, but substantial increase starting with carcinoma in situ. Y axis is in the log scale. E) Frequency of somatic complex rearrangement events in the samples grouped by the stages of carcinogenesis, also shows greater prevalence with advanced disease stages. F) Random forest and Lasso feature importance for the different classes of genomic alterations in distinguishing pre-cancerous and cancerous samples. G) Pseudotime plot showing the trajectory of decline in genome stability and organization along the clinical stages of carcinogenesis, as measured by increased burden of different classes of genomic alterations. Complexrearrangement CNV_gainComplex rearrangementsIntra-chromosomal fusionCNV_lossInter-chromosome translocationRearrangement clustersDeletionInversionInsertionDuplication0 0.5 1.0 1.5MeanDecreaseGini Coefficients- 0.02 -0.01 0 0.01 0.02 0.03 0.14 0.018 0.0025 0.00034 -2 -4 -6 -8 Lambda Log Lambda BenignNormal Normal Benign Benign BenignBenignIn situIn situIn situIn situ In situIn situ In situIn situ Distant Distant In situInvasiveInvasiveRearranged genomemalignant/aggressive benign/normal A p-value = 0.0031p-value = 0.0007 Neutral evolutiont tPositive selection Shift in allele frequency distribution Allele freq. range0.01.0 Relative density B 8 genomic rearrangement had the highest coefficient in the LASSO analysis (Figure 5F), indicating that accumulation of complex genomic rearrangements is an important aspect of malignant disease progression. We projected the samples, which ranged from normal lung tissues to advanced lung tumors, on the UMAP space based on the features with high coefficients in the LASSO analyses to assess the extent of variation among the samples in terms of genomic disorganization and compare that with the clinical stages of disease progression (Figure 5G). There were two predominant groups, one where the normal lung tissues and benign lesions were clustered together, and the other where the locally advanced and metastatic tumors were grouped together; the latter group showed greater sample-to-sample variation indicating high inter-tumor genetic heterogeneity in advanced tumors. In situ carcinomas were the most diverse, while a subset of those were positioned with benign and normal tissues, a majority clustered with advanced malignancies, while others connected the two clusters. Thus, the UMAP projection based on the structural variation features also recapitulated the clinical stages of disease progression. A trace of the non-linear fit through the samples represented the trajectory of progressive genomic disorganization (Figure 5G). Since normal tissues have mostly diploid stable genomes, and tumor genomes are highly rearranged, especially in advanced NSCLC, the path along the trajectory can be interpreted as a measure of progressive decay of genomic stability and organization in pseudotime. This analysis indicated that the velocity of decline of genomic stability varies with disease stages. While somatic genome evolution is a continuous process and many benign lesions progress to malignancies, these results suggest that the decline in genomic organization and stability may increase after malignant transformation. DISCUSSION In summary, using deep-coverage optical mapping technology to profile the genomic landscapes of normal lung tissues, benign hyperplasia, dysplasia, carcinoma in situ, and advanced carcinomas we found that large copy-number alterations, chromosome translocations, local clustered SVs and CGRs, especially inter-chromosomal complex rearrangements were uncommon in premalignant tissues but became prevalent in carcinomas. omcplR, developed here, allowed us to resolve the architecture of complex genomic rearrangements using genome-graph concept with OM data and predict their genesis. The early rearrangement junctions in CGRs often localized within the same chromatin domains, but later events had more diverse contexts. It is known that replication timing closely follows chromatin states25, and topologically associated domains tend to replicate in foci called replication factories. Replication stress in the replication factories are frequently associated with DNA double strand breaks, leading us to speculate that replication factory crises may contribute to the genesis of many CGR events, especially those in the topologically associated domains with similar chromatin contexts. A substantial number of CGRs had multiple early events, which were joined by intermediate or late events. It is possible that genomic and epigenomic instability arising from early events may trigger downstream events. Genomic rearrangements can drive epigenetic and chromatin changes in the proximal regions29, such that one may argue30 that early genomic rearrangements may trigger downstream epigenetic changes and thereby plausibly redistribute the risk31 of subsequent genomic alteration events in the contexts of CGRs. We found that advanced tumors had more complex contigs and more inter-chromosomal events, which is consistent with this hypothesis, but further work needs to be done to establish causal inferences. Comparative assessment of data from the OM platform with short read sequencing and evolutionary simulations provided a balanced perspective on the potentials and limitations of our approach to characterize CGRs. Short read sequence technology is excellent at calling single nucleotide variants, copy number events, and small indels; but it is usually insufficient for calling rearrangement junctions in repetitive regions or detecting connected genomic alteration events occurring on the same allelic sequence, especially if such events are kb-Mb apart. and fail to fully resolve allelic heterogeneity of such events, where OM offers clear advantages. OM was particularly suited for our objective of characterization of complex, intertwined genomic rearrangement events. Reference genome assembly versions can potentially have consequences for the OM rearrangement calls, but the rearrangement junctions in the major CGR events in our cohort were usually unique. Ultimately, a hybrid approach utilizing both technologies may help resolve the complete genome architecture and provide a reasonably complete landscape of genomic alterations at molecular resolution. Nonetheless, inference on the CGR architectural helped establish their etiologies. We found that when the underlying events along a clonal lineage are sufficiently separated in time, the order inference is robust, but it is not trivial to distinguish the events that are temporally clustered from those that arose in a single catastrophic event given negligible variation in allele frequencies, considering noise in a realistic dataset. Anyhow, from the 9 perspective of somatic evolution, those events likely have similar consequences on the population dynamics downstream. Several different mechanisms such as chromoanasynthesis, chromothripsis, chromoplexy, and break-bridge fusion have been discussed in the context of complex genomic alterations - these are collectively termed as chromoanagenesis 6-8. Chromothrypsis24 is a catastrophic event involving shattering of a chromosome followed by a random restitching of chromosomal fragments, resulting in the formation of clusters of complex genomic rearrangements involving intra-chromosomal translocations and deletions; many loci affected by chromothrypsis show genomic stability in subsequent rounds of cell divisions. Therefore, most of the rearrangement junctions should be localized and have similar ploidy-adjusted allele frequencies. Nonhomologous end-joining (NHEJ) is the primary repair mechanism for chromothrypsis7,24. In contrast, chromoanasynthesis7,26, a replication-based complex rearrangement process, is dependent on context-dependent replication stress, which is persistent over time, leading to recurrent DNA breaks and different classes of rearrangements over multiple cell divisions. Therefore, many rearrangement junctions involved in such processes should have parent-child relationships. Replication-transcription collision leading to replication fork stalling, and subsequent error-prone replication rescue mechanisms such as fork-stalling and template switching (FosTes) and Microhomology-mediated break-induced replication (MMBIR) 32, which involves local sequence homologies, are suspected to be key mechanisms 8,26,32. Although a lack of base-pair resolution data for the junction regions did not allow us to assess the involvement of HR, NHEJ, or MMBIR, we note that most of the CGR junctions were in repeat sequences, within gene-rich, transcriptionally active open chromatin regions that are permissive of chromoanasynthesis-type processes. Furthermore, allele frequency differences between parent and child junctions within the same CGRs suggest that those would be unlikely due to a single catastrophic event such as chromothrypsis. Break-fusion-bridge events7 arise during mitotic process characterized by a linear sequence of rearrangement events involving inversions, duplications, and deletions in telomere-proximal regions, but the major CGR events in the filtered catalog in our cohort were not in the telomeric regions and did not have the characteristic inversion and copy number profiles. Likewise, some CGRs with indeterminate order-of-events could potentially be due to a single mutagenesis event or multiple distinct events. We did not find any CGR event in our cohort that fit the classic definition of chromoplexy7,8,33, a “closed-chain” process characterized by interconnected multiple inter- and intra-translocations and deletions arising from DSBs with precise junctions. Therefore, we suspect that chromoanasynthesis is probably the predominant mechanism driving the CGRs, especially those with clear order-of-events, in our cohort, notwithstanding the fact that in some cases it could co-occur with other DNA repair mechanisms 7. Ordering the genomic landscapes of normal, precancerous lesions, and carcinomas in pseudotime suggested a transition in genomic disorganization and instability with the onset of malignancy. While some classes of genomic alterations were common in both precancerous and cancerous genomes, others arose predominantly in malignant stages. Lasso regression and Random Forest feature importance scores indicated that intra-/inter-chromosome translocation, copy number variations, local clustered SVs and complex genomic rearrangements were the most significant factors that differentiated non-malignant and malignant samples. This analysis indicated that the velocity of decline of genomic stability varies with disease stages. The rate of genomic aging, as manifested by the extent of genomic disorganization in pseudotime, of different tumors may be different, and may change further with the accumulation of additional driver events during tumorigenesis. While somatic genome evolution is a continuous process and many benign lesions progress to malignancies, these results suggest that the decline in genomic organization and instability may increase after malignant transformation. This might be related to the overcoming of replicative senescence with the onset of malignancy. In situ tumors show substantial variation in their extent of genomic disorganization and instability. Given the inherent stochasticity in accumulation of driver genetic alterations and dynamic complexity of tumor-microenvironment interactions, the evolutionary trajectories of such tumors may not follow a deterministic path, underscoring the clinical challenges in stratifying the patients. Progression from premalignant benign lesions to malignant carcinoma involves genomic changes in tumor cells, and concurrent disruption of the coordination of interactions among different cell types typical in the normal tissues 13,14,34. In a synergistic study, Venkat et al. 28 examined the changes in the transcriptome of immune, stromal, and other cells in the tumor microenvironment in these tumors. They found changes in the transcriptome of tumor and non-tumor cells that not only promote oncogenic pathways but likely also perturb the balance of inter-cellular interactions and ultimately complex tissue-level processes. Genomic rearrangements may change copy number and regulation of 10 many genes, and also higher order, chromatin domain-level organization beyond their specific genomic contexts 30, providing substrates for such changes directly or indirectly. On the other hand, tissue-level changes such as inflammation may influence mutagenesis and DNA repair processes 34-36. Hence, there might be a complex crosstalk between the progressive deterioration of genome organization and disruption of normal patterns of tissue-level cell-cell interactions promoting cancer hallmarks during tumorigenesis in lung, such that considering both aspects may provide complementary and synergistic overview of tumor progression. METHODS Cohort selection and biospecimen preparation The cohort included pathologically normal lung tissues (n=2), benign pulmonary lesions (n=7), carcinoma in situ (n=4), locally invasive carcinoma (n=2) and metastatic lung carcinoma (n=3) samples (Fig. 1A, Table 1), obtained from deidentified donors under an IRB approved protocol. The tumors were typically resected during lobotomy or wedge resection. Tumor adjacent pathologically normal tissues were procured from the cancer patients, while the benign pulmonary lesion samples were collected from preventive surgeries. Stains for infectious agents were negative. Optical genome mapping using BioNano platform For each sample, a minimum of 0.12g frozen tissue sample was used for extraction and purification of ultra-high molecular weight genomic DNA (UHMW gDNA) following manufacturer’s protocols (Bionano, San Diego, USA). Briefly, frozen tissue was homogenized in buffer, and coarse homogenate was filtered, with resulting suspension washed and pelleted (2200 ×g for 5 minutes), and then enzymatically treated to release gDNA. 750 ng of purified high molecular weight DNA were labeled using Direct Label Enzyme 1 (DLE-1) to conjugate DL-green fluorophores at the occurrence of a 6bp motif. After clean-up of the excess of DL-green fluorophores and rapid digestion of the remaining DLE-1 enzyme by proteinase K, DNA was counterstained overnight. Labeled UHMW gDNA solution (4-16ng/uL) was loaded on Saphyr Chips (Bionano, San Diego, USA) for linearization into nanochannel arrays and imaging in the Saphyr System. Images were digitized into whole-genome optical mapping (OM) molecule and label information in real time, collecting to a minimum throughput target of 1500 Gbp. All samples were evaluated based on molecule size, labeling efficiency, and map rate and coverage of the GRCh38 reference. Rare Variant analysis and Structural Variant calling Initial QC, genome alignment, and structural variation detection and copy number calling were performed using the BioNano pipeline (BioNano Solve v3.7.1)16. Briefly, the OM molecules of a given dataset were first aligned against GRCh38. OM molecules mapping to the same genomic regions were assembled into consensus maps. OM molecule clusters not fully aligned with the reference diploid genome were identified as the candidates for capturing putative structural variants (SVs); those were assembled locally into consensus maps and realigned to the reference to generate refined and annotated SV calls, which are queried against an OGM-specific database of 179 phenotypically healthy control individuals. In parallel, a genome-wide copy number profile was also calculated based on depth of coverage, followed by normalization against control datasets and segmentation to identify genomic regions of copy number gain or loss, and aneusomies. For each SV and copy number variant (CNV) call, confidence scores were calculated. Variants occurring in masked regions of GRCh38 were not considered for interpretation and reporting. Circos plots representing different classes of genomic alterations were prepared using Bionano Access v1.7.116. software. Only the SVs with high allele frequency were shown in Fig. 1 for simplicity of visualization, although the downstream analyses considered all high confidence events including those present at low allele frequency. Short read sequencing and cross platform comparison Genomic DNA was extracted from 4 samples – 2 advanced lung adenocarcinoma and 2 pathologically normal tissues, and whole genome sequencing was performed using 150bp PE sequencing on an Illumina HiSeq X platform (~30X for tumor samples, and ~12X for the non-malignant samples, respectively). PE reads were aligned to GRCh38Decoy build of human with BWA-MEM (v0.7.13)37, after removing sequencing adaptors and duplicates using fastp (v.0.21.0)38. Single base substitutions (SBS) and small insertions and deletions (InDels) were identified using Strelka (v2.9.2)39. Variants were filtered for QSS < 40(SNVs) or QSI <40 (Indels) and having at least 20× coverage in tumor sample and 10x coverage for matched normal samples using bcftools (v1.940), and at least four reads supporting the variant. The resulting variants were annotated with ANNOVAR41, after excluding those overlapping with centromeres and segmental duplications (genomicSuperDups) and other variants in the dbSNPv150 and 1000 Genomes Project42. Alongside, Manta (v1.6.0)43 was employed to detect structural variations (SVs) using the default 11 parameters, followed by the extraction of the final SVs marked as 'PASS'. SV calls were compared between the BioNano OM and Illumina short read technology platforms for the 2 samples. We conducted an analysis to determine the degree of overlap between both junction ends within a 100kb range for high-confidence SVs of the same class. Detection of genomic regions with clusters of SVs We filtered out reported SVs (primarily insertions and deletions) that were shorter than 1kb. In addition, in some cases, the same SV may be represented by multiple OM molecule clusters with minor differences at the termini. To reduce such redundancy, we identified all analogous SVs that satisfy two criteria: (i) if both junction ends of the SVs fall within 5kb of one another, and (ii) the SVs are of the same type; in such cases we merged those into a single consensus SV to prepare the non-redundant list of structural variations in the somatic genomes. SV clusters are ³ 4 SVs that overlap with one another or have rearrangement junctions within 10kb distance. We analyzed the filtered set of SVs from the OM data, to identify such SV clusters, which were further examined to identify complex events. OM recorded structural variations using contigs, ensuring that each contig contains at least one structural variation. Additionally, the orientation and reference position of these structural variations are recorded. This enables us to infer the group of complex structural variations (complex rearrangement events) along with their orientation information. Prior to conducting any analysis, we performed a filtering step to exclude contigs with low confidence or potential false positives in recording structural variations. Contigs labeled as "_common" or "_segdupe", as well as those with a variant allele frequency (VAF) below 0.2, were removed. Subsequently, we retained the high quality contigs. OmcplR pipeline We developed omcplR package to identify the complex rearrangement events in the somatic genomes. omcplr first segment the genome into equal sized non-overlapping bins (G, default: 10kb), and track the OM contigs that capturing SVs and SV junctions that map to those bins. The genomic bins harboring multiple contigs (C, default: ³4) are selected as initial seeds for complex rearrangement detection; typically one of those contigs represent the reference allele, while other contigs may harbor one or more genomic rearrangements. The non-reference contigs in the seed region and SV rearrangement junctions therein are retained as the initial set. Next, each contig in that set is iteratively scanned to identify additional rearrangement junctions therein and their corresponding genomic bins, which are included in the set. The process is repeated until no new non-reference contig or genomic bin is identified for a given seed, and the resulting set of contigs that capturing SVs therein denote a complex rearrangement event. A complex event may be intra- or inter-chromosomal. In order to detect the complex rearrangement events within large structural variation events (such as copy number gain/loss and intra/inter-chromosomal structural variations) across various cancer stages, we employed the Grange function in R to overlap the initially identified complex rearrangement events with these regions. Subsequently, we tallied the retained complex rearrangement events. A complex event with ! contigs and " genomic rearrangement junctions therein can be represented as a "×! matrix to show the contig-junction relationship. Each element of the matrix is populated by the effective coverage of the OM traces of rearrangement junctions in the respective contigs. omcplR uses a Bayesian approach to determine the likely order-of -events within a complex rearrangement. It first identifies pairwise co-occurrence between the rearrangement junctions, before determining conditional relationships between the co-occurring pairs to predict potential order-of-event, and finally performing global optimization to finalize the relationship among pairs involved in a complex event using constraint-based algorithm (PC-stable). Each complex event is graphically represented as a Bayesian network, where the direct causal relationship junctions are connected using edges; among them, those with direct causal relationships are connected using arrows instead. Pearson's Correlation statistic was used to calculate the significance of the connection, and p-value < 0.05 was considered significant difference. Functional annotation We annotated the genomic alterations and affected regions based on (i) genomic features––whole genes, exons, repeats, and telomeres, (ii) chromatin features – strong euchromatin, weak euchromatin, intermediate chromatin, weak heterochromatin, strong heterochromatin, (iii) nuclear localization––inter- lamina regions at the nuclear interior, and lamina- associated regions at the nuclear periphery, and (iv) replication timing based on the human reference genome hg38- using a published approach44. We employed the TxDb.Hsapiens.UCSC.hg38.knownGene45 R function to retrieve geneIDs, and used the org.Hs.eg.db package to convert the geneIDs into gene symbols. Analysis of transcriptomic data 12 We obtained processed scRNAseq data from Venkat et al.28 who used 10X Chromium platform to perform single nuclei sequencing to characterize cellular transcriptome, and used that to infer cell type composition, cellular processes, and intercellular interactions of lung tissue samples across different stages of disease progression. This dataset was available for S3-S5, S7, S9-S11, S13-S17, and S19-S20; OM and scRNAseq were performed on different tissue core specimens from the same tumors. Pathway activities were estimated using approaches as described in Venkat et al.28. Data on scRNAseq-guided copy number estimates and clonal inferences for advanced carcinomas were obtained from this study as well. Evolutionary analysis and signatures of selection To generate the expected genomic landscape of rearrangements under the null model, we shuffled the genomic rearrangements 103 times preserving their chromosomes, lengths, genomic and chromatin contexts using an approach proposed previously23. For inter-chromosomal translocations, we shuffled the pairs of endpoints after preserving their chromosomal assignments and genomic and chromatin contexts. We identified the CGRs using the criteria described above during simulation and analyzed their attributes to compared against the observed CGRs in the in-situ, invasive, and distant tumors. For each CGR, we designated those with the highest allele frequency (f) as the primary events (P), and others as the secondary events (S), and normalized the allele frequencies of the latter as: f’(S)=f(S)/f(P). We used the site frequency spectrum approach proposed by Williams et al.27 to test for neutrality for the secondary events by comparing the observed allele frequency distributions for respective CGRs against that expected under a neutral evolution model. Feature importance We used LASSO (least absolute shrinkage and selection operator) a regression-based method to identify the structural variations and genomic alterations whose prevalence distinguish between malignant and non-malignant samples in the cohort. We complemented it by using Random Forest using randomforest R package and calculating feature importance scores. Evolutionary trajectory analysis We projected the samples on the UMAP or PCA space based on selected structural variation features, which had high coefficients in the LASSO analyses. SCORPIUS46 was used to draw the non-linear fit on the UMAP projection of the samples. The inflection point in the plot is indicated with an arrow. FIGURES Figure 1: A) Outline of the study design profiling genomic landscapes of pathologically normal lung tissues, benign dysplastic and hyperplastic lesions, and malignant lung carcinoma samples. B) Circos plots showing somatic genomic structural alterations and rearrangements in the samples in the cohort, grouped according to the disease stages. C) Representative genomic alteration in a sample (S18) is shown at high resolution, and corresponding variation in sequencing coverage in Illumina short read sequencing of the tumor is highlighted (tumor: orange, paired normal: blue). The genomic resolution of optical mapping and short read sequencing are different. The affected and adjoining genomic regions and gene annotations therein are shown below. Figure 2: A) A schematic representation of the computational pipeline to identify complex genomic rearrangements and infer underlying order of events. B) Benchmarking using simulated datasets where the order of rearrangement events are known in advance. Figure 3: A) A representative intra-chromosomal complex rearrangement is shown from the sample S20, affecting genomic region chr15: 23.13Mb-28.62Mb. Genomic rearrangements and associated contigs are shown. B) Inferred order-of-events involved in the complex rearrangement. Associated nuclear, chromatin, replication and genomic contexts of the junctions are highlighted. All junctions have similar genomic and epigenomic contexts in this example. C) Replication factory crisis leading to chromoanasynthesis is being proposed as a likely mechanism to explain the genesis of this complex event. D) A representative intra-chromosomal complex rearrangement is shown from the sample S8, affecting genomic regions chr9: 62.81Mb-68.23Mb. E) Circos plot showing pairwise co-occurrence of rearrangement junctions on the same contigs. Widths of the arcs indicate optical mapping coverage. 13 F) Inferred order-of-events involved in the complex rearrangement. (Epi)genomic contexts of the junctions are as above. Figure 4. A) A schematic representation of order of event of complex genomic rearrangement and its genesis mechanism. B) Proportions of the rearrangements associated with CGRs that have similar or dissimilar genomic and epigenomic contexts at the junction end regions. C) Regression analysis showcasing the relationship between contigs cataloging complex structural variation junctions and the unique junctions identified. D) Distribution of unique complex structural variation junctions across benign and malignant (i.e. in situ, locally invasive, and distant) stages of carcinogenesis. Sequential portrayal of CGR events at the parent node (E), child node (F), and all nodes (G), encompassing both intra- and inter-chromosomal rearrangements alongside their chromatin annotations in Figure2D Similar: Contigs containing identical chromatin within intra-chromosomal regions, Dissimilar: Contigs containing disparate chromatin within intra-chromosomal regions. H) Differences in DNA repair pathway scores in the epithelial cells from different samples based on single cell RNAseq data. I) Evaluating the CGR events in light of known molecular mechanisms driving complex genomic alterations. Figure 5: A) A schematic representation of allele frequency change from two possible selection pressure. (B) Positive selection on cells carrying mutations result in an allele frequency distribution with an excess of high allele frequency variants. Normalized variation allele frequency for contigs observed in complex genomic rearrangements at various stages of carcinogenesis show significant increases in invasive and distant tumors. P value calculated by Kolmogorov-Smirnov test. C) Frequency of somatic genomic deletions, duplications, insertions, and inversions in the samples grouped by the stages of carcinogenesis. The frequency distribution shows modest increase with advanced disease stages. Y axis is in the log scale. D) Frequency of somatic intra-chromosomal fusions, inter-chromosomal translocations, CNV gain and losses in the samples grouped by the stages of carcinogenesis. The frequency distribution shows prevalence in normal tissues or benign lesions, but substantial increase starting with carcinoma in situ. Y axis is in the log scale. E) Frequency of somatic complex rearrangement events in the samples grouped by the stages of carcinogenesis, also shows greater prevalence with advanced disease stages. F) Random forest and Lasso feature importance for the different classes of genomic alterations in distinguishing pre-cancerous and cancerous samples. G) Pseudotime plot showing the trajectory of decline in genome stability and organization along the clinical stages of carcinogenesis, as measured by increased burden of different classes of genomic alterations. REFERENCES 1 Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719-724 (2009). https://doi.org:10.1038/nature07943 2 Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94-101 (2020). https://doi.org:10.1038/s41586-020-1943-3 3 Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112-121 (2020). https://doi.org:10.1038/s41586-019-1913-9 4 Baker, T. M., Waise, S., Tarabichi, M. & Van Loo, P. Aneuploidy and complex genomic rearrangements in cancer evolution. Nat Cancer 5, 228-239 (2024). https://doi.org:10.1038/s43018-023-00711-y 5 Tomasetti, C., Vogelstein, B. & Parmigiani, G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proc Natl Acad Sci U S A 110, 1999-2004 (2013). https://doi.org:10.1073/pnas.1221068110 6 Holland, A. J. & Cleveland, D. W. Chromoanagenesis and cancer: mechanisms and consequences of localized, complex chromosomal rearrangements. Nat Med 18, 1630-1638 (2012). https://doi.org:10.1038/nm.2988 7 Krupina, K., Goginashvili, A. & Cleveland, D. W. Scrambling the genome in cancer: causes and consequences of complex chromosome rearrangements. Nat Rev Genet 25, 196-210 (2024). https://doi.org:10.1038/s41576-023-00663-0 8 Pellestor, F., Gaillard, J. B., Schneider, A., Puechberty, J. & Gatinois, V. Chromoanagenesis, the mechanisms of a genomic chaos. Semin Cell Dev Biol 123, 90-99 (2022). https://doi.org:10.1016/j.semcdb.2021.01.004 9 Tanaka, H. & Watanabe, T. Mechanisms Underlying Recurrent Genomic Amplification in Human Cancers. Trends Cancer 6, 462-477 (2020). https://doi.org:10.1016/j.trecan.2020.02.019 10 Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122-125 (2017). https://doi.org:10.1038/nature21356 14 11 Ahsan, M. U., Liu, Q., Perdomo, J. E., Fang, L. & Wang, K. A survey of algorithms for the detection of genomic structural variants from long-read sequencing data. Nat Methods 20, 1143-1158 (2023). https://doi.org:10.1038/s41592-023-01932-w 12 Srivastava, S., Wagner, P. D., Hughes, S. K. & Ghosh, S. PreCancer Atlas: Present and Future. Cancer Prev Res (Phila) 16, 379-384 (2023). https://doi.org:10.1158/1940-6207.CAPR-22-0435 13 Curtius, K., Wright, N. A. & Graham, T. A. An evolutionary perspective on field cancerization. Nat Rev Cancer 18, 19-32 (2018). https://doi.org:10.1038/nrc.2017.102 14 Arrigoni, M. G., Woolner, L. B., Bernatz, P. E., Miller, W. E. & Fontana, R. S. Benign tumors of the lung. A ten-year surgical experience. J Thorac Cardiovasc Surg 60, 589-599 (1970). 15 Han, G. et al. An atlas of epithelial cell states and plasticity in lung adenocarcinoma. Nature 627, 656-663 (2024). https://doi.org:10.1038/s41586-024-07113-9 16 Chan, S. et al. Structural Variation Detection and Analysis Using Bionano Optical Mapping. Methods Mol Biol 1833, 193-203 (2018). https://doi.org:10.1007/978-1-4939-8666-8_16 17 Consortium, I. T. P.-C. A. o. W. G. Pan-cancer analysis of whole genomes. Nature 578, 82-93 (2020). https://doi.org:10.1038/s41586-020-1969-6 18 Xia, L. et al. Multiplatform discovery and regulatory function analysis of structural variations in non-small cell lung carcinoma. Cell Rep 36, 109660 (2021). https://doi.org:10.1016/j.celrep.2021.109660 19 Eisfeldt, J. et al. Comprehensive structural variation genome map of individuals carrying complex chromosomal rearrangements. PLoS Genet 15, e1007858 (2019). https://doi.org:10.1371/journal.pgen.1007858 20 Frankell, A. M. et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 616, 525-533 (2023). https://doi.org:10.1038/s41586-023-05783-5 21 Kadota, K. et al. KRAS Mutation Is a Significant Prognostic Factor in Early-stage Lung Adenocarcinoma. Am J Surg Pathol 40, 1579-1590 (2016). https://doi.org:10.1097/PAS.0000000000000744 22 Gaillard, H., Garcia-Muse, T. & Aguilera, A. Replication stress and cancer. Nat Rev Cancer 15, 276-289 (2015). https://doi.org:10.1038/nrc3916 23 Pedersen, B. S. & De, S. Loss of heterozygosity preferentially occurs in early replicating regions in cancer genomes. Nucleic Acids Res 41, 7615-7624 (2013). https://doi.org:10.1093/nar/gkt552 24 Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27-40 (2011). https://doi.org:10.1016/j.cell.2010.11.055 25 Gindin, Y., Valenzuela, M. S., Aladjem, M. I., Meltzer, P. S. & Bilke, S. A chromatin structure-based model accurately predicts DNA replication timing in human cells. Mol Syst Biol 10, 722 (2014). https://doi.org:10.1002/msb.134859 26 Liu, P. et al. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell 146, 889-903 (2011). https://doi.org:10.1016/j.cell.2011.07.042 27 Williams, M. J. et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat Genet 50, 895-903 (2018). https://doi.org:10.1038/s41588-018-0128-6 28 Venkat, V. et al. Disruptive changes in tissue microenvironment prime oncogenic processes at different stages of carcinogenesis in lung. bioRxiv, 1-20 (2024). 29 Akdemir, K. C. et al. Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer. Nat Genet 52, 294-305 (2020). https://doi.org:10.1038/s41588-019-0564-y 30 Hu, X. & De, S. Redistribution of mutation risk in cancer. Nat Cancer 5, 216-217 (2024). https://doi.org:10.1038/s43018-024-00728-x 31 Salvadores, M. & Supek, F. Cell cycle gene alterations associate with a redistribution of mutation risk across chromosomal domains in human cancers. Nat Cancer 5, 330-346 (2024). https://doi.org:10.1038/s43018-023-00707-8 32 Zhang, F. et al. The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet 41, 849-853 (2009). https://doi.org:10.1038/ng.399 33 Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666-677 (2013). https://doi.org:10.1016/j.cell.2013.03.021 34 Zhang, S. et al. Tumor initiation and early tumorigenesis: molecular mechanisms and interventional targets. Signal Transduct Target Ther 9, 149 (2024). https://doi.org:10.1038/s41392-024-01848-7 35 de Visser, K. E. & Joyce, J. A. The evolving tumor microenvironment: From cancer initiation to metastatic outgrowth. Cancer Cell 41, 374-403 (2023). https://doi.org:10.1016/j.ccell.2023.02.016 36 Hanahan, D. Hallmarks of Cancer: New Dimensions. Cancer Discov 12, 31-46 (2022). https://doi.org:10.1158/2159-8290.CD-21-1059 37 Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103-2110 (2016). https://doi.org:10.1093/bioinformatics/btw152 38 Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884-i890 (2018). https://doi.org:10.1093/bioinformatics/bty560 39 Saunders, C. T. et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 28, 1811-1817 (2012). https://doi.org:10.1093/bioinformatics/bts271 15 40 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009). https://doi.org:10.1093/bioinformatics/btp352 41 Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010). https://doi.org:10.1093/nar/gkq603 42 Genomes Project, C. et al. A global reference for human genetic variation. Nature 526, 68-74 (2015). https://doi.org:10.1038/nature15393 43 Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220-1222 (2016). https://doi.org:10.1093/bioinformatics/btv710 44 Smith, K. S., Liu, L. L., Ganesan, S., Michor, F. & De, S. Nuclear topology modulates the mutational landscapes of cancer genomes. Nat Struct Mol Biol 24, 1000-1006 (2017). https://doi.org:10.1038/nsmb.3474 45 Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 47, D853-D858 (2019). https://doi.org:10.1093/nar/gky1095 46 Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol 37, 547-554 (2019). https://doi.org:10.1038/s41587-019-0071-9

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: oa-pdf

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-23T02:00:01.238055+00:00
License: CC-BY-NC-ND-4.0