Satellite DNA fragility accompanies complex genome rearrangements and ecDNA oncogene amplification in canine osteosarcomas

doi:10.64898/2025.12.31.697157

Satellite DNA fragility accompanies complex genome rearrangements and ecDNA oncogene amplification in canine osteosarcomas

2025 · doi:10.64898/2025.12.31.697157

preprint OA: closed CC-BY-ND-4.0

📄 Open PDF Full text JSON View at publisher

Full text 100,331 characters · extracted from oa-pdf · 11 sections · click to expand

Abstract

Canine osteosarcoma (OS) is an aggressive bone cancer with remarkable similarity to human OS. Despite its value as a comparative oncology model, the drivers of genomic instability remain poorly defined, hindering cross-species insights into disease progression. We applied long-read PacBio HiFi whole genome sequencing to matched tumor and blood samples from four purebred dogs to generate high-resolution maps of somatic structural variants (SVs). Tumor genomes showed extensive rearrangement complexity, with satellite repeat regions, particularly SAT1_CF, enriched near somatic SV breakpoints and occurring within genomic contexts marked by focal hypomethylation. We also identified multiple extrachromosomal DNA (ecDNA) elements carrying oncogenes indicating ecDNA-mediated oncogene amplification. Together, these data provide an integrated genomic and epigenomic view of canine OS, implicating satellite repeats as recurrent substrates for rearrangement and ecDNA as a prominent mode of genome remodeling. This work advances our understanding of OS genome dynamics and establishes a comparative framework for investigating repeat-driven instability and oncogene amplification across species.

Keywords

Canine Osteosarcoma, Structural Variation, Satellite Repeats, Extrachromosomal DNA, Long-Read Sequencing

Introduction

Osteosarcoma (OS) is a rare but devastating bone cancer in humans, occurring primarily in children and adolescents with an incidence of only ~4-5 cases per million (1). Despite advances in treatment, outcomes for patients with metastatic or relapsed disease remain poor with five- year survival rates below 30% (2, 3), highlighting an urgent need for deeper biological insight and improved therapeutic strategies. Among canines, OS is a common and highly aggressive cancer found particularly in large- and giant-breed dogs and occurring at an incidence 27-fold higher than in humans (4). Importantly, spontaneous canine OS closely recapitulates key clinical and molecular features of human OS, including anatomic distribution, metastatic behavior, and histopathology, making dogs a powerful model for comparative oncology and translational discovery (5–7). Cross-species genomic and transcriptomic studies increasingly demonstrate conserved tumor biology and clinical outcomes, underscoring the value of canine OS for biomarker discovery and therapeutic development relevant to human disease (8). .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 2 At the genomic level, both canine and human OS share critical molecular hallmarks. Both exhibit relatively low somatic point mutation burdens compared to most adult cancers (1.2 mutations/Mb), yet higher mutation rates than other pediatric malignancies (such as neuroblastoma and medulloblastoma; ~0.4-0.5 mutations/Mb) (9, 10). Moreover, OS genomes are dominated by extensive structural variation (SV) and copy-number alterations (9, 11–14). High-grade human OS tumors frequently display chromothripsis and other complex, clustered rearrangements that profoundly remodel chromosomal architecture. Comparative analyses of canine tumors and cell lines reveal similarly chaotic karyotypes, suggesting that large scale structural genome instability is a conserved and fundamental feature of OS across species (12, 15, 16). Growing evidence from cancer genomics and epigenetics implicates repetitive DNA, especially pericentromeric satellite repeats, as key substrates for genome instability (17, 18). These tandemly repeated elements are concentrated in heterochromatic regions flanking centromeres, where loss of DNA methylation has been linked to heterochromatin decondensation, replication stress, and elevated rates of chromosomal instability (18–20). Together, these observations position satellite repeats as potential hotspots for SV formation, yet their role in OS genome remodeling remains poorly defined. Another recurrent feature of aggressive solid tumors, including OS, is focal oncogene amplification on extrachromosomal DNA (ecDNA; historically referred to as “double minutes”) (12, 21–23). Pan-cancer studies demonstrate that ecDNA is common, preferentially harbors oncogenes and regulatory elements, drives intratumoral heterogeneity, and is associated with poor clinical outcomes (21). In human bone tumors, recurrent amplicons (e.g., 12q13– 15/MDM2–CDK4) are frequently localized to extrachromosomal or ring structures (24, 25), implicating ecDNA as an important mechanism of oncogene amplification in this disease (21, 22, 26). While extrachromosomal DNA (ecDNA) has emerged as a major driver of oncogene amplification and treatment resistance across many human cancers, its prevalence and functional significance in human OS remain incompletely characterized (12, 26). Similarly, the role of ecDNA in canine OS is unknown. Here, we apply PacBio HiFi whole-genome sequencing to primary tumors and matched blood samples (including normal stroma) from four purebred dogs with OS to investigate the mechanisms of structural genome instability. By jointly analyzing SVs, repetitive DNAs, native CpG methylation and ecDNA, we observed oncogene-bearing ecDNAs and also found satellite- associated, hypomethylated DNAs to be enriched near somatic SV breakpoints as prominent features of the tumors studied. Given the extensive molecular and clinical parallels between canine and human OS, these findings from a naturally occurring large animal model provide insights into repeat-driven instability and ecDNA-mediated oncogene amplification in canine OS, with direct implications for understanding and ultimately targeting these processes in human OS.

Results

Sample Characteristics and Sequencing Overview. We generated PacBio HiFi long-read whole genome sequencing data from matched tumor and peripheral blood samples from four dogs with osteosarcoma (OS): an 18-year-old male Rottweiler (Ro1759), a 12-year-old female Great Pyrenees (GP1899), an 11-year-old male .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 3 Golden Retriever (GR2023), and a 6.5-year-old female Great Dane (GD2044) (SI Appendix, Table S1). Primary tumors arose in the humerus or radius, consistent with typical appendicular OS sites. HiFi sequencing provided 51× - 96× tumor genome coverage (SI Appendix, Table S1), enabling high confidence SV and methylation analyses. Estimated tumor purity ranged from 0.53 to 0.81, and genome-wide ploidy from 1.93 to 4.69 (Figs. S1-S4). For three dogs (Ro1759, GP1899, and GR2023), additional normal stromal tissue was available and sequenced to provide independent non-neoplastic controls. Single Nucleotide Variants and Mutational Spectrum. Long-read whole genome sequencing revealed substantial intertumor variability in somatic SNV burden across canine OS, with total counts ranging from 2,506 to 6,855 variants and mutation rates of 1.06 to 2.91 SNVs/Mb (median 1.95 SNVs/Mb; Fig 1A; SI Appendix, Table S2-S3). These burdens are consistent with prior canine OS studies (27) and comparable to those reported for human pediatric OS (9), supporting a shared low-to-moderate mutational burden across species. SNVs were overwhelmingly noncoding, with only 0.89% mapping to coding sequences (CDS) and the remainder localized primarily to introns and untranslated regions (UTRs) (Fig. S5). Accordingly, CDS-restricted mutation rates were substantially higher (20.0-54.8 SNVs/Mb; SI Appendix, Table S3) reflecting strong selective constraint on protein-coding regions. The marked depletion of coding mutations despite their higher occurrence rate indicates that most protein-altering changes are incompatible with tumor cell fitness (28, 29), as cancer cells depend on preserving the function of essential genes required for basic cellular processes and tumor survival (30). Functional annotation identified only 0.42-0.77% of SNVs as having moderate or high predicted impact, including alterations in established cancer-associated genes such as TP53, RB1 (SI Appendix, Table S4). Consistent with the genetic heterogeneity characteristic of OS, only three SNVs were shared across tumors (SI Appendix, Table S5). The mutational spectrum was dominated by transversions, yielding a low transition/transversion ratio (Ti/Tv = 0.685; Fig. 1B) and distinguishing these tumors from transition-enriched, aging associated mutational patterns reported previously in canine OS (27, 31). Instead, G>T and C>A transversions were enriched, consistent with oxidative DNA damage-associated mutagenic processes (32, 33). This transversion-dominated mutational profile parallels patterns observed in human OS, where >80% of tumor exhibit BRCA-deficiency-associated signature characterized by specific single-base substitutions and oxidative damage markers (34, 35). Sample specific analyses further revealed additional heterogeneity with distinct enrichment of C>T, C>A, or T>G substitutions across individual tumors, indicating variability in dominant mutational processes within this cohort (SI Appendix, Table S6). Chromosomal distribution of SNVs was broadly uniform with localized enrichments on specific chromosomes in individual tumors (e.g., CFAX and CFA22 in GP1899; Fig S6) suggesting regional mutation accumulation rather than genome-wide mutational bias. This pattern is consistent with the heterogeneous chromosomal instability patterns reported in human OS, where distinct chromosomes and genomic regions are variably affected across tumors (9, 36). Intersecting somatic variants with canFam4 CDS annotations identified 142 coding variants across the cohort, with modest per-tumor counts (18–59 variants; SI Appendix, Table S7). These mutation burdens are comparable to human OS, which similarly displays low coding mutation frequencies (median 36 mutations, range 7-194 per tumor (37)) and generally low point-mutation burden relative to many adult cancers (9). .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 4 Insertion and Deletion (indel) Characteristics. Across the four tumors, we identified 2,058 somatic indels (<50 bp), corresponding to indel burdens of 0.047–0.438 indels/Mb (SI Appendix, Tables S8–S9). Indels were dominated by short events. Nearly 60% were single–base pair insertions or deletions with frequency declining sharply as indel size increased and fewer than 5% exceeding 10 bp (Fig. 1C). This size distribution is consistent with replication-associated mutational processes rather than large-scale DNA repair defects. Indels were overwhelmingly noncoding, with only 1–28 CDS-overlapping events per tumor and just three coding indels observed across the entire cohort (Fig. S7). This extreme depletion from coding regions (~99% noncoding) is consistent with strong purifying selection against frameshift and protein-disruptive mutations. Accordingly, functional annotation showed that nearly all indels were predicted to have minimal impact, with rare, non-recurrent coding events and exceptionally few frameshift mutations (five genes total, each observed once; SI Appendix, Table S10). These patterns indicate that indels contribute little to the functional mutational burden of canine OS compared with SNVs. Chromosome-level analyses revealed non-uniform indel distribution, with recurrent enrichment on CFA13, CFA22, CFA31, and CFAX (Fig. S8). These enrichments persisted after correction for chromosome size and coding content. Their overlap with regions previously implicated in copy-number alterations in canine OS, argues against technical bias. Analysis of indel formation mechanisms supported replication slippage as the dominant source for the large number of short insertions and deletions observed. There appeared to be limited contribution from other mechanisms such as mismatch repair deficiency or repeat expansion processes (SI Appendix, Table S11). These findings indicate that while small indels are common in canine OS genomes, they are largely biologically neutral and unlikely to be major drivers of genomic instability in this disease. Genes Affected by Somatic SNVs and Indels. TP53 was the only gene recurrently mutated in all four tumors (Fig. 1D), consistent with prior reports documenting near-universal TP53 disruption in canine OS and human human OS where TP53 is altered in approximately 90% of cases through both point mutations and SVs (11, 38), supporting its role as a core driver of disease pathogenesis across species. Additional recurrently mutated genes overlapped with those implicated in human OS and included CENPF, LRRC43, IGFBP1, MUC5AC, AASS, and ABCA13 (Fig. S9, SI Appendix, Tables S12, S13), as well as tumor suppressors previously linked to OS biology (LSAMP, DACH1) (11, 34). Pathway-level analysis revealed convergence of affected genes on cancer-relevant networks involving WNT/β- catenin, PI3K/AKT/mTOR, TGF-β signaling, cell adhesion and epithelial–mesenchymal transition, metabolism, and cell-cycle regulation (SI Appendix, Table S14), indicating that despite limited coding mutation burden, recurrent disruption of core oncogenic pathways is preserved in canine OS and mirrors the pathway-level convergence observed in human OS . These pathways—including PI3K/AKT/mTOR, WNT/β-catenin, and TGF-β signaling—are consistently implicated in human OS pathogenesis (9, 39, 40), reinforcing the translational relevance of canine OS as a comparative model for human disease. Extensive Structural Variation and Complex Rearrangements. Genome-wide analysis of structural variants (SVs ≥50 bp) revealed pervasive and highly heterogeneous structural genome instability across canine OS. Total SV counts ranged from 184 to 738 events per tumor, corresponding to an approximately threefold difference in overall SV .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 5 burden, while SV density within coding regions varied by more than 200-fold across samples (1.9–421.8 SVs/Mb CDS; Fig. 2A; Tables S15–S16). Breakend (BND) events predominated in all tumors, followed by deletions (DELs), duplications (DUPs), and insertions (INSs), consistent with extensive genome rearrangement rather than simple copy number alteration (SI Appendix, Figs. S10–S13). SV size distribution differed by class: DUPs were the largest events (median ≈41 kb), spanning 150 bp to 42 Mb; DELs had a median size of 9 kb and ranged from 50 bp to 18 Mb; INSs were substantially smaller, with a median of 240 bp and lengths ranging from 50 bp to 6.3 kb (Fig. S14). SVs were distributed non-uniformly across the genome, with multiple chromosomes showing significant enrichment or depletion after normalization for chromosome length (Fig. S15). Across the cohort, CFA12, CFA1, CFA10, CFA27, and CFAX, harbored the highest cumulative SV burdens. A pronounced hotspot on CFA12 in tumor GP1899 encompassed focal duplications and intrachromosomal rearrangements extending into the dog leukocyte antigen (DLA) region, suggesting localized structural instability involving immune-related loci. Annotation of SV junctions based on microhomology length, inserted sequence, and repeat- mediated homology revealed a consistent predominance of error-prone repair mechanisms across tumors (Fig. 2B, SI Appendix, Table S17). Non-homologous end joining (NHEJ-like) events accounted for 70.1–81.7% of SVs, with microhomology-mediated end joining (MMEJ-like) contributing an additional 10.2–11.4%. Replication-associated mechanisms such as FoSTeS/MMBIR (0.8–2.8%) and homology-driven recombination (NAHR; 2.8–7.0%) were comparatively rare. Together, these patterns indicate that double-strand break repair with little or no homology is the dominant mode of SV formation in canine OS, with microhomology- mediated processes playing a consistent secondary role. Long-read sequencing resolved extensive rearrangement complexity, revealing hundreds of multi-breakpoint SVs per tumor (161–438 BNDs), including templated insertions, foldback inversions, inverted translocations, and clustered reciprocal rearrangements (Fig. 2C). Many SVs were organized into discrete breakpoint clusters spanning one or more chromosomes, consistent with chromothripsis- and chromoplexy-like genome remodeling processes (Table S18, S19). Notably, tumor GP1899 exhibited a dense cluster of interconnected rearrangements on CFA19 displaying alternating segment orientation and size, a pattern characteristic of chromothripsis (Fig. 2D). Optical genome mapping independently validated the majority of HiFi-detected breakpoints (Fig. S16), enabling reconstruction of this localized catastrophic event from base- pair to megabase scales and confirming its substantial impact on tumor genome architecture. Despite preferential localization of SV breakpoints to intronic and other noncoding regions (Fig. S17), gene-level annotation identified recurrent, high-impact SVs affecting cancer-relevant genes (SI Appendix, Table S20). In tumor Ro1759, interchromosomal rearrangements resulted in transcript ablation of VCL, consistent with complete functional knockout. Tumor GR2023 harbored multiple independent BND events generating recurrent GRIP1–LRCH2 gene fusions with predicted truncation of both partners, implicating disruption of cytoskeletal organization and signaling pathways. In tumor GP1899, an inversion–translocation event produced high- impact fusions involving CALHM4 and truncation of the DNA repair polymerase POLB. In contrast, tumor GD2044 contained a single large tandem duplication spanning 8.3 Mb on CFA1, affecting hundreds of genes including BCL2 and TNFRSF11A. This is consistent with a dosage- driven copy-number gain rather than focal transcript disruption. Across tumors, while most SV-associated annotations were predicted to have modifier or moderate effects, a recurrent subset of high-impact rearrangements directly disrupted genes .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 6 involved in cytoskeletal regulation, DNA repair, apoptosis, and signaling. Collectively, these findings demonstrate that canine OS genomes are shaped by extreme, error-prone structural remodeling, in which clustered SVs arising from non-homologous repair processes contribute both to large-scale genome reorganization and to focal disruption of cancer relevant genes. The extent and heterogeneity of SVs observed here are highly concordant with prior canine and human OS studies describing a structurally dominated, “chaotic” genomic landscape, in which chromosomal lesions represent the predominant mutation class. Previous analyses of canine OS tumors and cell lines have reported recurrent disruption of genes such as DMD, DLG2, and SETD2, as well as extensive rearrangement across both primary and metastatic disease (27, 31, 41). Our data reinforce these observations and extend them by resolving complex, multi- breakpoint SV architectures in primary tumors using long-read sequencing. Notably, the pronounced enrichment of SVs on specific chromosomes (e.g., CFA12) and the presence of distinct SV size classes suggest contributions from both localized DNA damage repair errors and larger-scale catastrophic events. The predominance of intronic and other noncoding SVs, consistent with reports in human OS, further underscores the importance of large-scale regulatory and architectural genome disruption (rather than recurrent point mutations) as a central feature of OS pathogenesis. Differential enrichment of repetitive elements at somatic variant breakpoints. To determine whether specific repetitive genomic elements mark regions of heightened mutational susceptibility in canine OS, we performed permutation-based enrichment analyses of repetitive sequences at SNV, indel, and SV breakpoints across the four tumors. This variant- stratified approach revealed highly reproducible, repeat-class associations that differed markedly by mutation type, indicating distinct underlying mutational mechanisms. At SNV breakpoints, pericentromeric satellite repeats (most notably SAT1_CF) showed the strongest and most consistent enrichment, observed in three of four tumors (19.3–46.3-fold; Fig. 3A, B; SI Appendix, Table S21). Simple tandem repeats, particularly tetranucleotide motifs, were also recurrently enriched at SNV breakpoints in most tumors, whereas SINE elements and ancient MIR families were consistently depleted (Table S21). In contrast, the canine-specific retrotransposon subfamily, L1_Canis1, exhibited universal SNV breakpoint enrichment across all four samples (Fig. 3C), with enrichment strongest in evolutionarily young LINE-1 subfamilies, consistent with preferential mutability at recently active elements. Indel breakpoints displayed a distinct and highly convergent pattern. Simple repeats were the dominant sequence context for indel formation, showing strong and uniform enrichment across all tumors (6.5–9.1-fold; Fig. 3D; SI Appendix, Table S22), particularly at thymine-rich microsatellite motifs (adjusted p-value = 0.00112). This pattern supports replication slippage as the primary mechanism underlying indel formation. SINE elements, including tRNA-derived families and carnivore-specific SINEC subfamilies, showed moderate but consistent enrichment, whereas LTR retrotransposons and evolutionarily ancient LINE families (e.g., L2, MIR) were recurrently depleted, establishing a hierarchy of indel susceptibility governed by repeat type, sequence composition, and evolutionary age. SV breakpoints exhibited a contrasting repeat landscape: SAT1_CF centromeric satellites showed the most consistent enrichment, detected in three of four tumors (17.8–44.1-fold; Fig. 3B; SI Appendix, Table S23), with broader satellite classes also enriched in multiple samples. In striking contrast to indels, simple repeats were universally depleted at SV breakpoints across all tumors (0.17–0.99-fold), indicating that repeat contexts prone to polymerase slippage are not .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 7 major substrates for large-scale rearrangement. Retrotransposon-associated SV enrichment was more heterogeneous. Some tumors showed enrichment of LINE-1 or LTR subfamilies, including dog-specific L1 elements, whereas others did not, suggesting tumor-specific contributions from euchromatic retrotransposon-associated processes. Across variant classes, these analyses reveal a reproducible and mechanistically coherent pattern: (1) Simple repeats act as primary hotspots for indels but are coldspots for SVs. (2) Centromeric SAT1_CF satellites are recurrently enriched at both SNV and SV breakpoints. (3) SINE and LINE elements exhibit variant-type–specific and age-dependent associations. Together, these findings demonstrate that somatic mutagenesis in canine OS is strongly shaped by local repetitive sequence context, with distinct mutational processes operating preferentially within specific genomic environments. The convergence of SNV and SV breakpoints at centromeric satellite DNA, in particular, nominates these regions as recurrent sites of structural fragility and aberrant repair, motivating further analysis of their epigenetic state and contribution to genome instability. Given the recurrent enrichment of SV breakpoints at centromeric satellite repeats, we next asked whether these rearrangements are associated with localized epigenetic alterations, such as DNA hypomethylation, that are characteristic of structurally unstable genomic regions. Local hypomethylation at structural variant breakpoints. Analysis of CpG methylation inferred from HiFi sequencing kinetics revealed focal hypomethylation at somatic SV breakpoints in tumors relative to matched normal samples (SI Appendix, Table S24). SV breakpoints were frequently located within, or immediately adjacent to, hypomethylated differentially methylated regions (DMRs), indicating a strong spatial association between SV and localized methylation loss. Overlaying SV breakpoints and SV-spanning intervals with tumor-intrinsic DMRs and gene annotations identified 107–506 genes intersected by SVs per tumor (SI Appendix, Table S25). Among SVs overlapping DMRs (SV-DMRs), 69 SVs intersected hypomethylated DMRs, whereas only 5 SVs overlapped hypermethylated regions (SI Appendix, Table S25), revealing a pronounced bias toward hypomethylation at sites of SV. Of these SV–DMR overlaps, 30 occurred within gene bodies or promoter regions, highlighting candidate loci at which structural rearrangement and epigenetic alteration co-occur. Within individual tumors, several genes were disrupted by multiple independent SVs, consistent with locus-level clustering of structural instability. These included PCSK5, SIPA1L2, and RUNX2 in tumor GP1899, as well as GRID1, GRIP1, and CCDC88C in other tumors (SI Appendix, Table S25). Genes affected by hypomethylated SVs included examples with functions relevant to OS biology, including osteogenic differentiation (RUNX2), cytoskeletal organization (GRIP1, GRID1, SIPA1L2), signaling and transcriptional regulation (AUTS2, CCDC88C), and chromatin-associated processes (KANSL3) (SI Appendix, Figures S18-S23). Collectively, these findings demonstrate that SVs in canine OS are consistently associated with focal hypomethylation and tend to cluster at specific genomic loci within tumors. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 8 Extrachromosomal DNAs concentrate cancer-associated genes and canonical OS pathways. Extrachromosomal DNA (ecDNA) was detected in 3 of 4 tumors (75%) and was absent from matched normal samples, indicating tumor-specific formation (Fig. 4A, B). Across these tumors, we identified 16 distinct circular amplicons (Ro1759: n=4; GP1899: n=11; GD2044: n=1) ranging from 103 kb to 3.07 Mb in size (median 274 kb) and amplified to copy numbers of 3.3– 8.4 (median 4.5), consistent with focal oncogenic amplification. Six ecDNAs met stringent Tier 1 criteria, while ten were classified as Tier 2 (probable ecDNA) (SI Appendix, Figures S23-S31). Notably, nearly one-third (5/16) contained interchromosomal junctions involving segments from multiple chromosomes, consistent with complex, chromothripsis-like ecDNA formation mechanisms (SI Appendix, Figures S23-S31). ecDNAs recurrently harbored genes implicated in OS-relevant signaling pathways. In tumor Ro1759, the largest ecDNA (ec63; 3.07 Mb; Tier 1) comprised 13 rearranged segments and amplified CTNNB1 (β-catenin; copy number validated by ddPCR; Fig. S32) and TGFBR2, linking ecDNA formation to Wnt/β-catenin and TGF-β signaling (Fig. 4C, Fig. S33). A second ecDNA, ec4, in this tumor amplified FGF7, consistent with involvement of the FGF/FGFR axis in OS proliferation (Fig. S33, SI Appendix, Table S26). In tumor GP1899, a high-copy ecDNA encompassed MDM2 (copy number 6.55) together with FRS2 and YEATS4, recapitulating the canonical 12q13–15 amplicon architecture characteristic of human OS (Fig. S33, SI Appendix, Table S26). Additional ecDNAs in this tumor included an amplicon containing LSAMP, with circular junctions independently validated by PCR (Fig. S34). Across tumors, ecDNA-associated genes converged on a limited set of oncogenic pathways (including p53/MDM2, Wnt/β-catenin, TGF-β, and FGFR signaling) despite heterogeneity in ecDNA number, structure, and chromosomal origin. These findings indicate that ecDNA- mediated amplification is a recurrent feature of canine OS and provides a flexible, high-copy substrate for activation of core oncogenic pathways. These results reinforce ecDNA as a central mechanism of genome remodeling and pathway dysregulation in this disease.

Discussion

Using long-read whole-genome sequencing, we show that canine osteosarcoma (OS) is characterized by extreme structural genome remodeling coupled with focal epigenetic disruption, closely mirroring key features of human OS. Although based on a modest cohort, the depth and resolution of these data refine existing models of OS pathogenesis and reinforce the value of spontaneous canine OS as a translational model for studying structural and epigenetic drivers of cancer evolution. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 9 Structural genome remodeling as a defining feature of OS. Across tumors, structural variation dominated the mutational landscape, with extensive multi-break rearrangements, clustered breakpoint architectures, and complex configurations that are difficult to resolve using short-read approaches. These findings reinforce prior work in human and canine OS identifying SVs (not point mutations) as the principal mode of genomic alteration. The chromothripsis-like event resolved on CFA19 provides direct evidence that chromosome-scale catastrophic rearrangements occur in spontaneous canine OS, consistent with observations in human disease. Whether such events arise early during tumor initiation or continue to accumulate during progression likely varies by tumor; our observation of substantial intertumor heterogeneity in SV burden and architecture suggests that OS evolution follows multiple trajectories rather than a single canonical path. Epigenetic context and structural fragility. A key insight from this study is the consistent association between SV breakpoints and focal hypomethylation. Long-read derived methylation profiles revealed sharp, localized loss of CpG methylation surrounding rearrangement junctions, with recovery toward baseline at increasing distance. This spatial pattern supports a close coupling between epigenetic relaxation and structural instability, potentially reflecting increased susceptibility of open chromatin to DNA breakage, replication stress, or error-prone repair. Importantly, all SVs overlapping differentially methylated regions intersected hypomethylated (but not hypermethylated) domains, indicating a strong directional relationship between methylation loss and rearrangement. Repeat-enrichment analyses further implicate repetitive DNA as a substrate for this instability. While retrotransposons account for many breakpoint overlaps, centromeric satellite DNA (particularly the SAT1_CF family) showed the strongest and most consistent enrichment at both SNV and SV breakpoints. These sequences localize to heterochromatic, late-replicating regions previously linked to replication stress and double-strand break formation. Interpretation of this enrichment must account for canine chromosome architecture, as most canine chromosomes are acrocentric, placing pericentromeric satellites near chromosome ends. Without complete centromere annotations in the current reference genome, it remains unclear whether satellite enrichment reflects intrinsic sequence fragility, positional effects, or both. Nevertheless, these findings demonstrate that satellite-rich domains disproportionately participate in OS genome remodeling and highlight the power of long-read sequencing to interrogate highly repetitive regions that were previously inaccessible. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 10 Extrachromosomal DNA as a route to oncogenic amplification. We detect extrachromosomal DNA (ecDNA) in most tumors, with circular amplicons recurrently carrying genes central to OS biology, including MDM2, CTNNB1, TGFBR2, and FGF7. These ecDNAs recapitulate canonical oncogenic architectures described in human OS, such as the 12q13–15 amplicon, and frequently involve complex, multichromosomal rearrangements. Given their high copy number and non- Mendelian inheritance, ecDNAs provide a flexible mechanism for rapid oncogene dosage modulation and intratumoral heterogeneity. The convergence of ecDNA content on a limited set of pathways (p53/MDM2, Wnt/β-catenin, TGF-β, and FGFR signaling) suggests that ecDNA- mediated amplification represents a recurrent route to pathway activation in OS. Determining how these structures influence transcriptional programs, therapeutic sensitivity, and evolutionary dynamics will require integrated functional studies.

Conclusion

and future directions This study establishes that canine OS is shaped primarily by extensive structural genome remodeling rather than by accumulation of small coding mutations. Using long-read whole- genome sequencing, we provide the first integrated view of structural variation, repetitive sequence context, native DNA methylation, and extrachromosomal DNA in naturally occurring canine OS, revealing a cancer genome dominated by clustered rearrangements, focal hypomethylation, and ecDNA-mediated oncogene amplification. Our findings identify satellite-rich genomic regions as recurrent sites of structural fragility, demonstrate that structural variants are consistently associated with localized methylation loss, and show that ecDNA frequently concentrates genes from canonical OS pathways. Together, these features point to replication stress, error-prone repair, and epigenetic destabilization as central forces shaping OS genome evolution. The convergence of ecDNA content on a limited set of signaling pathways further highlights ecDNA as a flexible substrate for oncogenic amplification and tumor adaptation. Despite a limited cohort size, this work provides a foundational framework for understanding how structural and epigenetic instability interact to drive OS biology. Dogs with spontaneous OS offer a uniquely powerful translational system (combining biological relevance, clinical accessibility, and accelerated disease timelines) for testing hypotheses generated here. Future studies leveraging larger comparative cohorts, multiomic integration, and functional validation will be essential to define the causal relationships among satellite fragility, epigenetic disruption, and ecDNA formation, and to translate these insights into genome-informed therapeutic strategies for OS in both dogs and humans.

Materials and methods

Sample collection Blood, tumor, and stromal tissues were obtained from four client-owned dogs with OS, an 18 year old male castrated Rottweiler (sample Ro1759), 12 year old female Great Pyrenees (sample GP1899), 11 year old male castrated Golden Retriever (sample GR2023), and 6.5 year old female Great Dane (sample GD2044), that were presented to the Flint Animal Cancer Center at Colorado State University (CSU), Fort Collins, Colorado (SI Appendix, Table S1). All procedures were approved by the CSU Institutional Animal Care and Use Committee, and written informed consent was obtained from each owner. Dogs were selected for inclusion if they had treatment-naïve, primary appendicular OS documented by radiographic findings and .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 11 confirmed by subsequent histopathology. Blood samples were collected via venipuncture prior to any surgical or medical intervention. Limb amputation was then performed, and tumor and stromal samples were harvested within 30 minutes of surgery. Tumor tissue was obtained using a stainless-steel Michele trephine biopsy needle, obtaining 3-5 cores from regions with clear mass effect and/or cortical bone lysis. Normal stromal tissue (bone and associated soft tissue) was sampled from anywhere around the tumor where there is grossly normal appearing tissue, to serve as a matched control. All tissues were either immediately flash-frozen in liquid nitrogen or processed fresh for downstream molecular analyses. Amputated limbs were subsequently submitted to the CSU Veterinary Diagnostic Laboratory for comprehensive histopathological confirmation and final tumor classification. PacBio HiFi long-read whole genome sequencing Genomic DNA (gDNA) was extracted from blood and tissue samples using NEB Monarch cell/blood (NEB #T3050) and NEB Monarch tissue (NEB #T3060) kits, respectively. Briefly, 0.5-1ml of blood was mixed with 3 volumes of RBC Lysis buffer. Leukocytes were pelleted by centrifuge at 1000g at 4C for 10 min. For tissue samples, about 20-80mg of tissue was ground to a fine powder in liquid nitrogen. The pelleted cells or tissue powder were then lysed in the presence of proteinase K for 45 min. RNA was digested with RNase A for 10 min. Protein Separation Solution was added to remove proteins, and sample was centrifuged at 16000g for 10 min to pellet cell debris and proteins. The supernatant was treated with isopropanol to precipitate gDNA in the presence of glass beads. The sample was incubated at room temperature (RT) on a rotating mixer to allow gDNA binding to the glass beads. The supernatant was then removed, and the gDNA-bound glass beads were washed twice with Wash Buffer. The gDNA was eluted from the glass beads using Elution Buffer by incubation at 56C. The library was prepared using SMRTbell prep kit 3.0 (PacBio). About 5ug of HMW DNA was first sheared using Megaruptor 3.0 (Diagenode). The sheared DNA was then cleaned up using 1X SMRTbell cleanup beads. The DNA was end-repaired, A-tailed and ligated with the barcoded SMRTbell adapters, followed by a cleanup to remove ligation enzyme and free adapters. This is followed by a nuclease treatment to remove DNA fragments without adaptors. Purification was performed using SMRTbell cleanup beads to remove excessive nuclease. Libraries were further subjected to gel size selection on Pippin HT (Sage Science) to remove small fragments <10kb. The size-selected and purified libraries were loaded on the Revio at 300- 350pM. Sequencing was performed for 30 hours. Bionano Genomics optical mapping Ultra-high-molecular-weight DNA was extracted from blood-tumor matched samples from one dogs (GP1899) using the Bionano Prep SP Fresh Cells DNA Isolation protocol, revision C (Document #30257), and the Bionano SP Blood & Cell DNA Isolation Kit (catalog #80030). In brief, 1.5 million cells were pelleted, resuspended in a lysis buffer containing detergents, proteinase K, and RNase A. DNA was then bound to a silica disk, washed, eluted, and homogenized by 1 hour of end-over-end rotation at 15 rpm followed by overnight equilibrium at room temperature. Isolated DNA was fluorescently labeled at CTTAAG motifs using the DLE-1 enzyme and counter-stained with the Bionano PrepTM DNA Labeling Kit – Direct Label and Stain (catalog #8005; Protocol revision F; Document #30206). Briefly, 750 ng of gDNA were .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 12 incubated with DL-Green dye and DLE-1 enzyme in DLE-1 buffer for 2 hours at 37°C, followed by enzyme heat inactivation for 20 minutes at 70°C. The labeled DNA was treated with proteinase K at 50°C for 1 hour, and excess DL-Green dye was removed by membrane adsorption. The DNA was stored at 4°C overnight to facilitate DNA homogenization and then quantified using a Qubit dsDNA HS Assay Kit (Molecular Probes/Life Technologies). Prior to imaging, labeled DNA was stained with an intercalating dye and incubated at room temperature for at least 2 hours before loading onto a Bionano Saphyr chip. Molecules were linearized and imaged on Saphyr 2nd generation instruments (Part #60325) using Instrument Control Software (ICS) version 4.9.19316.1. The DNA Saphyr system software was used to measure backbone length and to detect the positions of fluorescent labels along each molecule. Resulting molecules were de novo assembled using Bionano Solve (v3.1) and aligned to canFam4 (42) dog genome

Reference

assembly. Alignment of HiFi reads to reference genome Raw PacBio HiFi reads were aligned to the CanFam4 (42) reference genome using pbmm2 (v1.17.0), a wrapper for minimap2 optimized for PacBio long reads. Alignments were executed via SLURM batch scripts, producing sorted and indexed BAM outputs for subsequent variant calling. pbmm2 align "$refmmi" "$bamfile" "${outputdir}${sampleID}.pbmm2.bam" --preset CCS --sort -j 64 -m 16G --sample "$sampleID" samtools sort "${outputdir}${sampleID}.pbmm2.bam" -o "${outputdir}${sampleID}.pbmm2.sorted.bam" samtools index "${outputdir}${sampleID}.pbmm2.sorted.bam" rm "${outputdir}${sampleID}.pbmm2.bam" Haplotagging Small variant calling Variant detection was performed using DeepVariant (v1.6.1) (43), run in CPU mode. For each sample, sorted BAM files were processed to produce GVCF (.vcf.gz) outputs containing high- confidence SNVs and small indels. BIN_VERSION="1.6.1" module load singularity export HOME=${outputdir} cd "$HOME" singularity exec --bind /usr/lib/locale/ \ docker://google/deepvariant:${BIN_VERSION} \ /opt/deepvariant/bin/run_deepvariant \ --model_type PACBIO \ --ref ${reffile} \ --reads ${bamfile} \ --output_vcf ${outputdir}${sampleID}.${BIN_VERSION}.vcf.gz \ --num_shards $(nproc) \ .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 13 --logging_dir=${logdir} Variant phasing Phasing of heterozygous variants was conducted using HiPhase (v1.5.0-8af3f1e) (44), which employs long-range read information to phase variants across haplotypes. Inputs included the aligned BAM file, the DeepVariant VCF (.vcf.gz), and the reference FASTA. hiphase --bam ${normalbam}\ --vcf ${fixedvcfgz}\ --output-vcf ${outputdir}${sampleID}.hiphase.vcf.gz\ --reference ${reffasta} --threads 32 --ignore-read- groups Haplotagging of reads Reads were assigned to phased haplotypes using Whatshap (v2.5) (45), producing haplotagged BAM files (*.haplotagged.sorted.bam) for each sample. These files served as the foundation for downstream somatic variant detection and methylation analyses. whatshap haplotag --reference $reffasta $phasedvcf $bamfile -o ${outputdir}${sampleID}.haplotagged.bam --ignore-read-groups -- tag-supplementary --skip-missing-contigs --output- threads=$threads samtools sort -@ $threads -o ${outputdir}${sampleID}.haplotagged.sorted.bam ${outputdir}${sampleID}.haplotagged.bam samtools index -@ $threads ${outputdir}${sampleID}.haplotagged.sorted.bam Somatic variant calling Somatic variant analysis was performed using multiple complementary pipelines to capture SNVs, indels, and structural variants (SVs). Tumor (*_T) and matched normal (*_B) samples were jointly analyzed. Somatic SNVs and Indels Two independent variant callers were implemented: ClairS (v0.4.1) (46), and DeepSomatic (v1.9.0) (47). Each tool was executed via SLURM. singularity exec /projects/ch-lee- lab/labdata/yilmazf/Programs/Clair3/ClairS/clairs_latest.sif \ run_clairs \ --tumor_bam_fn ${tumorbamfile} \ --normal_bam_fn ${normalbamfile} \ --ref_fn ${reffile} \ --threads 48 \ --disable_phasing \ --enable_indel_calling \ .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 14 --enable_clair3_germline_output \ --platform hifi_revio \ --output_dir ${outputdir} \ --include_all_ctgs \ --conda_prefix miniforge/envs/clairs singularity run --nv -B /usr/lib/locale/:/usr/lib/locale/ \ docker://google/deepsomatic:"${BIN_VERSION}-gpu" \ run_deepsomatic \ --model_type=PACBIO \ --ref=$reffile \ --reads_normal=$normalbamfile \ --reads_tumor=$tumorbamfile \ -- output_vcf=${outputdir}${sampleID}_deepsomatic_output.vcf.gz \ -- output_gvcf=${outputdir}${sampleID}_deepsomatic_output.g.vcf.gz \ --sample_name_tumor=${sampleID}_Tumor \ --sample_name_normal=${sampleID}_Blood \ --num_shards=64 \ --logging_dir=${outputdir}deepsomatic_logs \ -- intermediate_results_dir=${outputdir}intermediate_results_dir Somatic SVs Two independent variant callers were implemented: savana (v1.3.5) (48), and severus (v1.6) (49). Each tool was executed via SLURM. savana --tumour $tumorbam --normal $normalbam --outdir $outputdir --ref $reffile --snp_vcf $snpvcf --sample $sampleID - -overwrite --contigs canFam4.contigs.txt --threads 32 --length 50 --pb --min_support 3 --mapq 10 --keep_inv_artefact severus \ --target-bam $tumorbam \ --control-bam $bloodbam \ --out-dir $outputdir \ -t 32 \ --between-junction-ins \ --use-supplementary-tag \ --phasing-vcf $phasedvcf \ --vntr-bed $vntrfile Variant merging and consensus ensemble variant file generation Ensemble integration .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 15 Final ensemble SNV and indel calls were constructed using a multistep approach (commands available under https://github.com/yilmazf/DogGenomesAnalyses). This ensemble served as the definitive variant set for downstream analyses. Final ensemble SV calls were constructed using Minda (49), a meta-caller integrating multiple VCFs to generate high-confidence consensus variants. This ensemble served as the definitive variant set for downstream analyses. minda.py ensemble --vcfs $severus $savana --out_dir $ensembleout --min_support 2 Tumor purity and ploidy The Wakhan (v0.2.0) tool was applied to integrate somatic SVs with haplotagged alignments and phased variants, providing insight into allele-specific SV patterns. python wakhan.py --threads 32 --reference $reffasta --target-bam $haplobamfile --normal-phased-vcf $normalphasedvcf --genome-name $sampleID --out-dir-plots $outputdir --breakpoints $severusvcf - -loh-enable --pdf-enable --copynumbers-subclonal-enable -- reference-name canFam4 --contigs chr1-38,chrX --user-input-genes $genesbed Detection of chromothripsis, chromoplexy, and kataegis events To systematically detect and classify three types of genomic catastrophe events: chromothripsis- like events, chromoplexy-like events, and kataegis-like events from tumor sequencing data. Input data requirements. The pipeline requires structural variant (SV) calls in VCF format, copy number (CN) segment data (TSV, CSV, or VCF format), and gene annotation data (GTF format or pre-computed gene-overlap files). For kataegis detection, single nucleotide variant (SNV) and indel data in VCF format are optionally provided. Chromosome nomenclature was standardized across all input files using a normalization procedure that either adds or strips the "chr" prefix, ensuring consistent chromosome identifiers throughout the analysis. Chromothripsis detection. Chromothripsis detection utilized SV breakpoint clustering combined with CN oscillation patterns. The pipeline implements multiple detection models: a legacy three- boolean criterion model, (12) "minimal" and "full" (localized template assembly-like) chromothripsis classification schemes, and a unified confidence scoring system. CN segments were optionally rounded to integer values before oscillation analysis. Two oscillation metrics were available: a "two-state" metric that evaluates transitions between distinct CN states, and an "adjacent" metric that examines consecutive segment pairs. The minimum run length criterion for oscillation was set (default: 7 segments), and consecutive segments with equal CN values could be compressed before oscillation counting. A tolerance parameter (cn_eps) was applied to treat CN values within a specified threshold as equivalent after rounding. For each detected cluster, the pipeline calculated breakpoint density (breakpoints per megabase), identified gene overlaps using the provided gene annotations, evaluated whether the event exhibited localized template assembly (LTA) characteristics, and assigned confidence levels (no- .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 16 evidence, low-confidence, mid-confidence, high-confidence) based on the strength of chromothripsis features. Chromoplexy detection. Chromoplexy events were detected using a graph-based approach that identifies chains of interconnected SVs spanning multiple chromosomes. The pipeline constructed an SV edge table from the input VCF, built a graph representation of SV connections, and identified connected components representing potential chromoplexy chains. CN segment data were integrated to assess the copy number context of detected chains. Each chromoplexy component was assigned a confidence score based on chain length, chromosomal complexity, and consistency with chromoplexy hallmarks. Structural variant formation event types SVs were analyzed using a custom workflow that integrates breakpoint-level transposable- element (TE) annotations, non-TE homology matching, microhomology quantification, and templated-insertion detection to infer the most probable mutational mechanism for each event. The pipeline accepts SVs encoded in a VCF file containing standard SVTYPE annotations (DEL, DUP, INV, INS, BND). For INS and BND events where END coordinates may be missing, the breakpoint position is used as both POS and END, ensuring such events are retained. When available, the script preferentially uses the DETAILED_TYPE INFO field; otherwise SVTYPE values are normalized (e.g., DUP:TANDEM → DUP). Breakpoints used for TE and homology annotation were provided as a BED7 file containing chromosome, start, end, SV identifier, coarse SV class, source label, and side designation (“start” or “end”). Transposable element annotation of SV breakpoints. TE overlap was determined by intersecting breakpoint intervals with a RepeatMasker-formatted BED file using bedtools intersect. The RepeatMasker “name” field was parsed to extract TE family, class, and strand orientation. For breakpoints overlapping multiple TEs, the TE with the maximum overlap length was retained for scoring. For each SV, TE annotations from the “start” and “end” breakpoints were paired by SV identifier. TE families, strand orientations, and overlap lengths were propagated into downstream scoring functions used to identify TE-mediated non-allelic homologous recombination (NAHR). Annotation of Non-TE homologous Pairs. When provided, non-TE homology was supplied as a BEDPE file describing matched pairs of homologous genomic loci (e.g., segmental duplications). The BEDPE file was converted to a single-interval BED format with pair identifiers and strand relationships encoded in the feature name field. Each breakpoint was intersected with these intervals (left-outer join), and the best-overlapping homology annotation for each breakpoint side was retained. Paired breakpoint annotations were compared to determine whether both breakpoints mapped to the same homology pair, the shared strand orientation (direct vs. inverted), and the maximum percent identity across the two sides. Microhomology and templated-insertion analysis. Microhomology at the SV junction was computed using local reference-sequence queries to an indexed FASTA file (via pysam). For each SV, up to 50 bp upstream of POS and 50 bp downstream of END were compared to identify the longest exact microhomologous tract. Templated-insertion length was extracted from VCF INFO fields (SVINSLEN or INSLEN), with robust parsing to accommodate integer or floating-point encodings. Missing insertion lengths were treated as zero. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 17 Mechanism scoring framework. For each SV, three independent lines of evidence were evaluated: - TE-mediated NAHR (TE family match + orientation rules) - Non-TE NAHR (shared homology pair + strand and identity) - Junction-based mechanisms (microhomology and insertion size) - Each category contributes a score reflecting its support for specific mechanisms. TE-mediated NAHR scoring. TE-NAHR was evaluated only when both breakpoints overlapped TEs of the same family. Expected strand relationships were required: - DEL/DUP: concordant TE strands - INV: opposite TE strands Scores were increased by greater TE overlap. If the TE family matched but orientations were inconsistent with NAHR, the event was labeled TE_pair_other. Non-TE NAHR scoring. Homology-mediated NAHR was considered only when both breakpoints mapped to the same homology pair. Percent identity modulated the score. Direct (+) orientation supported DEL/DUP, whereas inverted (–) orientation supported INV. Junction-based mechanism scoring. Microhomology length and templated-insertion size were used to score canonical repair mechanisms: - FoSTeS/MMBIR-like: ≥2 bp microhomology and ≥20 bp insertion - MMEJ-like: 2–25 bp microhomology and ≤10 bp insertion - NHEJ-like: ≤1 bp microhomology and ≤10 bp insertion Events not meeting any criteria were classified as Unclassified. Mechanism selection. All evidence categories were scored independently. The final mechanism call was chosen using a highest-score-wins strategy. Ties were resolved by alphabetical order of tied labels. The full score map, along with TE-based, non-TE-based, and junction-based reasoning strings, was recorded for auditability. Repetitive elements enrichment analysis All analyses were performed on a high-performance computing (HPC) cluster managed by SLURM. Each job was executed on a single compute node using 64 CPU cores and 256 GB RAM. The environment was managed using a dedicated Conda environment, ensuring reproducibility of all Python- and bedtools-based analyses. All intermediate processing steps were executed within sample-specific output directories, using a temporary directory (TMPDIR) isolated per sample. Structural variant pre-processing. SV calls for each sample were supplied as VCF files. Before analysis, VCFs were cleaned to remove non-standard or unplaced contigs and extraneous metadata lines. We excluded entries mapping to Super-Scaffold_3465, KP081776.1, and CM016470.1, following the reference genome filtering standards of the project. Copy-number (CN) profiles generated by Savana were converted to BEDGRAPH format and similarly filtered to retain only autosomal and assembled chromosomes. These CN data were later used to correct for genomic biases in the permutation-based enrichment tests. Breakpoint extraction and definition of uncertainty windows. SV breakends were extracted from each VCF using a custom Python parser (Step2.1.vcf_to_breakbends.py). The parser reports 1-bp breakend intervals for all SV classes (DEL, DUP, INV, INS, BND), incorporating POS and END when present and respecting PASS filters. Filtered breakends were expanded into genomic “uncertainty windows” using bedtools slop, applying a ±200 bp window for SVs. This window length provides conservative inclusion of .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 18 breakpoint uncertainty in the absence of CIPOS/CIEND fields and is chosen to be consistent with previous SV breakpoint enrichment frameworks. Resulting windows were restricted to accessible genomic regions using an externally defined accessibility mask, ensuring all downstream randomization respected regions of mappable and callable reference sequence. Detection of observed TE–breakpoint overlap. To quantify observed breakpoint–TE intersections, we overlapped accessible SV breakpoint windows with a RepeatMasker annotation BED file using bedtools intersect. For each overlapping event, RepeatMasker metadata fields were parsed to extract TE family, class (LINE, SINE, LTR, DNA), and repName. A summary of observed TE families intersecting SV breakpoints was generated for quality control and for guiding TE-level enrichment tests. Permutation-based TE enrichment analysis. We quantified enrichment of TE sequence at SV breakpoints using a permutation framework implemented in Step2.2.permute_enrichment_v8.py. This method evaluates: - Enrichment of breakpoints in any TE, - Enrichment by TE class (e.g., LINE, SINE, LTR, DNA), - Enrichment by TE family (e.g., L1, SINEC_Cf), - Enrichment by specific TE names/repNames. Randomization framework. Randomized breakpoints were generated using bedtools shuffle with the following constraints: - Chromosome-preserving shuffling (-chrom) to maintain chromosome-level breakpoint density. - Restriction to accessible genome only (-incl), matching the filtering applied to real breakpoints. - Preservation of interval lengths (±200 bp windows). - Exclusion of reference gaps and other uncallable regions. - A total of 10,000 permutations per sample were performed (seed = 13) using 64 parallel threads. Bias correction. To mitigate genomic covariates influencing TE density, the permutation model incorporates stratified shuffling across GC‑content bins (n=3), mappability bins (n=3), and copy‑number bins from Savana CN tracks (n=4). The codebase supports optional centromere‑proximity bins, but these were not applied here because high‑confidence centromere coordinates are not available for canFam4. This matched-covariate randomization ensures that enrichment statistics are not confounded by heterogeneous genomic landscapes, including the variable mappability characteristic of repetitive sequences. Significance testing. For each TE category with ≥1 observed overlap (default behavior), the following were computed: - Observed overlap count - Mean and 95% CI of overlaps across permutations - Z-scores - Empirical permutation p-values - Benjamini–Hochberg FDR-adjusted q-values An enrichment was considered significant when q < 0.05. Null distributions for “any TE” enrichment tests were optionally saved for visualization. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 19 Methylation analysis We used haplotagged bam files with pb-cpg-tools (v3.0.0) for methylation analysis. pb-CpG- tools provides the tool aligned_bam_to_cpg_scores, which can generate site methylation probabilities from mapped HiFi reads, including probabilities for each haplotype when reads are haplotagged. aligned_bam_to_cpg_scores \ --bam ${bamfile} \ --output-prefix ${outputdir}${sampleID}.reference \ --threads 64 \ --pileup-mode count \ --ref $reffile \ --modsites-mode reference Next, we processed the results using the following step-by-step approach. Input data and preprocessing. CpG methylation data were processed using pb-CpG-tools v3 counts. For each sample (blood, tumor, and optional normal tissue), we read the output files containing per-CpG methylated and unmethylated read counts. Only genomic coordinates and count fields were retained. BED inputs were treated as 0-based, half-open intervals, while GTF/GFF inputs were converted from 1-based closed to 0-based half-open coordinates. To ensure base-level comparability across samples, CpG sites were merged using exact coordinate matching. Sites failing to meet minimum coverage thresholds (default ≥10 reads per sample) were excluded from downstream testing. Differentially methylated cytosine (DMC) calling Blood-only mode. In settings where only blood was available, or when explicitly requested, we performed single-sample DMC calling using an exact binomial test against a user-specified null methylation probability (p-naught equals 0.5 by default). For each CpG, we calculated the methylation proportion beta equals mod divided by the quantity mod plus unmod, and evaluated deviation from the null. Resulting p-values were FDR-corrected using the Benjamini–Hochberg method. CpGs with absolute value of delta beta greater than or equal to 0.20 and q less than or equal to 0.05 were labeled as significantly hypo- or hyper-methylated. Two-sample contrasts. For pairwise comparisons—blood vs tumor (T–B), blood vs normal (N– B), or tumor vs normal (T–N)—CpG sites present in both samples were tested using Fisher's exact test on 2 by 2 tables of methylated vs unmethylated counts. Effect sizes were defined as delta beta equals beta subscript 2 minus beta subscript 1, and significance was assessed using FDR-corrected p-values. Significant CpGs (absolute value of delta beta greater than or equal to 0.20, q less than or equal to 0.05) were classified as hyper- or hypo-methylated relative to the first sample. All statistical tests were parallelized when requested, and used conservative handling of edge cases (e.g., zero-coverage rows → p = 1.0). Differentially Methylated Region (DMR) Construction. Significant CpGs from each contrast were clustered into DMRs using a greedy genomic clustering algorithm. CpGs appearing within less than or equal to 250 bp of each other (default) were merged into the same DMR when all contributing CpGs showed methylation changes in the same direction (hyper or hypo). .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 20 Clusters were retained if they contained greater than or equal to 3 CpGs and the mean absolute value of delta beta exceeded a user-defined threshold (typically greater than or equal to 0.20). Each DMR was annotated with: - genomic span - CpG count - mean delta beta - minimum q-value - hyper/hypo direction - DMRs were exported in BED and table formats. Triad-Based Tumor-Intrinsic Methylation Analysis. When blood (B), tumor (T), and matched normal tissue (N) were all available, we performed a triad analysis integrating the T–B, T–N, and N–B contrasts. Triad merging and metric computation. For each CpG appearing in any contrast, values from all contrasts were merged. We computed three tumor-specificity metrics: - comp_ratio equals absolute value of delta beta N-B divided by absolute value of delta beta T-B - Quantifies the fraction of tumor–blood difference explainable by normal tissue. - tumor_frac equals absolute value of delta beta T-N divided by absolute value of delta beta T-B - Estimates the proportion of tumor–blood effect originating from tumor tissue. - Tumor Specificity Index (TSI) equals the quantity absolute value of delta beta T-N minus absolute value of delta beta N-B, divided by the quantity absolute value of delta beta T-N plus absolute value of delta beta N-B - Ranges from negative 1 to 1, with positive values favoring tumor-specific alterations. Classification of CpG sites. Each CpG was assigned to one of four categories based on significance patterns, effect sizes, and the above metrics: - Tumor-intrinsic hyper/hypo (high-confidence T–N signal, low normal contribution, strong tumor fraction or TSI) - Blood-driven changes (normal–blood differences dominate over tumor–normal) - Mixed effects (comparable contributions of tumor and normal or discordant directions) - Insufficient evidence (failure to meet thresholds) High-confidence tumor-intrinsic CpGs required absolute value of delta beta T-N greater than or equal to 0.30 and q T-N less than or equal to 0.01, unless otherwise specified. Tumor-intrinsic DMRs. If requested, tumor-intrinsic CpGs were clustered (TN-anchored) into final tumor-intrinsic DMRs using the same clustering strategy as above. Genomic element and structural variant annotation. We supplied genomic annotations (BED/GTF) (canFam4) and structural variants (BED, BEDPE, or VCF). All DMRs were annotated with their minimum distance to each feature class, using a sweep-line interval matching algorithm optimized for large genome-wide sets. Distances were classified as: - Overlap (distance = 0) - Proximal (0 1 kb or no nearby interval) For BEDPE structural variants, both breakpoints were treated as independent 1-bp intervals. An annotated DMR table was produced containing all distances across all feature types and SVs. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 21 ecDNA detection We used CReSIL (v1.2.0) (50) to detect ecDNAs on tumor tissue and, when available, matched blood and/or normal stromal controls. cresil trim -t 32 -fq ${fastqfile} -r $refmmi -o ${outputdir}${sampleID}.trim Then, we processed cresil ecDNA predictions using the following step-by-step process. CReSIL output tables require the fields id, merge_region, and merge_len, with additional columns (e.g., numreads, coverage, totalbase, ctc) incorporated when present. Tumor data were required for all analyses, whereas blood and normal controls were optional. All tables were imported using a delimiter-agnostic parser that supports both plain text and gzip-compressed files. For each CReSIL record, genomic intervals were parsed from the merge_region field, normalized, and merged into non-overlapping segments. A minimum ecDNA size threshold (default: ≥100 kb) was applied to filter out low-confidence or sub-threshold circles. Cross-Sample Subtraction and Control Handling. To identify tumor-specific ecDNA, we applied a bin-based subtraction strategy. Each circle’s merged intervals were converted into genomic bins of user-specified size (default: 1 kb), enabling fast comparison of presence/absence across samples. Tumor circles were removed if any of their genomic bins overlapped those from available controls. Presence in controls (present_in_B, present_in_N) was recorded for all circles, regardless of whether the corresponding control sample was provided. When blood and/or normal stroma were supplied, the pipeline also generated Blood-only and Normal-only structural callsets by subtracting genomic bins found in other available samples (tumor and the alternate control). These structural sets were produced irrespective of whether copy-number data were available for controls. Tumor quality control metrics. Tumor-level quality control (QC) metrics were computed directly from CReSIL outputs. For each tumor circle, reads-per-kilobase (reads_per_kb) was calculated as numreads / (merge_len / 1000). Summary statistics including the total number of eccDNAs, total supporting reads, mean circle length, total bases, and average coverage were generated and written to {prefix}.tumor_qc_summary.txt. The top N tumor circles ranked by reads_per_kb (default: N=10) were reported in {prefix}.tumor_topReadsPerKb.tsv. Copy-number integration and ecDNA tier classification. Sample-specific copy-number information was incorporated using bedGraph files for the tumor (required) and optionally for blood and normal controls. bedGraph files were standardized to four columns (chrom, start, end, CN) and indexed per chromosome for efficient overlap queries. For each circle, copy-number metrics were computed by intersecting merged intervals with CN segments. Weighted mean CN, weighted median CN, maximum CN, the number of CN bins overlapped, and total base pairs intersected were calculated. ecDNA were classified into three confidence tiers according to circle size and CN metrics: - Tier 1 (High-confidence ecDNA): size ≥500 kb and median CN ≥4. - Tier 2 (Probable ecDNA): size ≥100 kb with median CN ≥4, or size ≥100 kb with maximum CN ≥6. - Below CN threshold: circles that did not meet Tier 1 or Tier 2 criteria. Tumor CN-augmented tables were written to {prefix}.withCN.tsv. Control CN-augmented and high-confidence lists were produced only when control CN bedGraphs were provided. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 22 Generation of final tumor ecDNA callset. The final tumor ecDNA set consisted of all Tier 1 and Tier 2 circles. For each final circle, merged genomic intervals were exported as BED format ({prefix}.final_mergedIntervals.bed). Equivalent outputs were generated for blood and normal controls whenever high-confidence calls were available. Gene, promoter overlap annotation. Gene annotations were incorporated using either a user- supplied canFam4 GTF/refFlat file or by downloading the UCSC canFam4 RefSeq GTF. Gene models were collapsed per gene to obtain minimal genomic spans. Promoter intervals were supplied as a BED file or generated de novo from transcription start sites (TSS), using an upstream/downstream window (default: −2000 bp / +200 bp). Enhancer annotations were optionally provided as BED files. For each final ecDNA, merged intervals were intersected with genes, promoters, and enhancers. Analogous files were produced for blood and normal ecDNA when available. Tumor cell frequency estimation. Tumor cell support for each ecDNA was calculated using one of two approaches: - Direct per-circle cell support file (--cells-t), supporting formats: - long form: id, cell_id - counts: id, n_cells - lists: id, supporting_cells Per-cell overlap inference, using individual cell CReSIL outputs and Jaccard-based presence detection. A circle was considered present in a cell if the Jaccard index exceeded a threshold (default: ≥0.20) and overlapping base pairs exceeded a minimum (default: ≥10 kb). Circle-level cell counts were optionally normalized by total profiled cells (--total-cells-t). When requested, a pre-filled template of circle IDs with zero counts was generated. Summary reporting and output structure. All analysis steps generated a comprehensive summary file ({prefix}.summary.txt) detailing controls used, the number of tumor-only circles, Tier 1 and Tier 2 counts, promoter/enhancer hits, cell-support information, and all output file paths. ddPCR Validation The CNV of CTNNB1 was verified using droplet digital polymerase chain reaction (ddPCR). A reaction mix of 20 ng gDNA, 11 ul of Supermix for probes (No dUTP) (Bio-Rad Laboratories), 1.1 ul customized canine CTNNB1 CNV assay (assay ID: dCNS572026389) (Bio-Rad Laboratories), 1.1 ul customized canine RPP30 CNV assay (assay ID: dCNS431780688) (Bio- Rad Laboratories), 5 U of HeaII diluted with CutSmart buffer (New England Biolabs) was added into well of 96-well ddPCR plate and brought up to 22 ul with nuclease-free water. The ddPCR plate was then heat sealed and ran on the Automated Droplet Generator with Droplet Generation Oil for Probes (Bio-Rad Laboratories) following manufacturer’s instructions. After droplet generation, the heat-sealed ddPCR plate with droplets was run on BioRad C1000 Deep-Well Touch Thermal Cycler for PCR (10 minutes at 95 °C, 40 cycles of 94 °C for 30 seconds and 60 °C for 1 minute, then 98 °C for 10 min and cooling to 4°C). Ideal annealing temperature was determined by the prior temperature gradient run using the MSC cell line. Droplets were analyzed using the QX200 Droplet Reader and the QuantaSoft software (Bio-Rad Laboratories). The copy number of CTNNB1 gene was calculated by QuantaSoft, normalized to RPP30

Reference

gene. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 23 PCR Validation Targets To validate predicted breakpoints and junction points in ecDNA candidates, we performed PCR amplification across putative junction sites. For each ecDNA candidate, we designed primer pairs flanking three distinct breakpoints/junction points identified through long-read sequencing. Primers were designed to amplify across the predicted junctions, with one primer binding to each side of the breakpoint, such that successful amplification would only occur if the predicted junction was present in the sample. PCR reactions were performed using Quick-Load® Taq 2X Master Mix (New England Biolabs) according to the manufacturer's protocol. Each 25 μL reaction contained 1 ng of genomic DNA template, 1× PCR buffer and 0.2 μM of each primer. Thermal cycling conditions consisted of initial denaturation at 95°C for 60 seconds, followed by 40-45 cycles of 95°C for 30 seconds, 59- 63°C for 30 seconds (annealing temperature optimized for each primer pair), and 68°C extension time (based on expected product size, 1 minute per kb), with a final extension at 68°C for 5 minutes. PCR products were visualized on 1% agarose gels stained with SYBR Safe DNA Gel Stain. PCR product and sequencing primers were submitted for Sanger Sequencing (QuintaraBio) to confirm the precise breakpoint junctions. Sequencing results were aligned to the reference genome using Snapgene Viewer to verify that the junction sequences matched the predicted breakpoint coordinates. A breakpoint was considered validated if PCR amplification produced a product of the expected size in basepairs and Sanger sequencing confirmed the junction sequence spanning the predicted breakpoint. Optical genome maps de novo assembly - orthogonal validation Single-molecule genome maps of GP1899 were de novo assembled using the assembly pipeline Bionano Solve v3.5, developed by Bionano Genomics with default settings using the command below. In short, a pairwise comparison of DNA molecules (min 250 kbp) was generated to produce the initial consensus genome maps. During an extension step, molecules were aligned to genome maps, and maps were extended based on the molecules aligning past the map ends. Overlapping genome maps were then merged. Extension and merge steps were repeated five times before a final refinement of the genome maps. Clusters of molecules aligned to genome maps with unaligned ends >30 kbp in the extension step were re-assembled to identify all alleles. Resulting files were used for chromothripsis and ecDNA validations. Python Solve3.5.1_01142020/Pipeline/1.0/pipelineCL.py -T 64 -U -j 64 -jp 64 -N 6 -f 0.25 -I 5 -w -c 3 \ -y \ -b $bionano_bnx \ -l $output_dir \ -t Solve3.5.1_01142020/RefAligner/1.0/ \ -a Solve3.5.1_01142020/RefAligner/1.0/optArguments_nonhaplotype_DLE 1_saphyr.xml \ -r $referencefile .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 24 Data, Materials, and Software Availability All PacBio HiFi long-read sequencing bam files will be available under NCBI Project ID. Custom codes used are available under https://github.com/yilmazf/DogGenomesAnalyses.

Acknowledgements

Funding was provided by the National Institutes of Health (NIH) grant U24HG007497 (to C.L. and F.Y.) and the Martin J. Gavin Endowment at Connecticut Children’s Medical Center (to C.L.L.). We sincerely thank Jeffrey J. Schoenebeck and Louise Van der Weyden for their thoughtful and critical review of the manuscript draft, which greatly strengthened the final version. We thank Jee Young Kwon, Leigh Maher, and Qihui Zhu for their assistance with the organization of canine samples. We gratefully acknowledge the Genome Technologies Service at The Jackson Laboratory for expert assistance with the work described herein. We also thank the Scientific Services at The Jackson Laboratory for their assistance with this work, and Research IT for providing computational infrastructure. Finally, we acknowledge the generous contributions of the dog owners who provided samples to Colorado State University, as well as the support of Colorado State University—especially Irene Mok and Dr. Susan Lana—in supplying the canine osteosarcoma materials used in this study. Author Contributions C.L. and C.C.L. conceived and supervised the study. F.Y. designed and performed the computational analyses, including long-read sequence alignment, variant calling, structural variant detection, repeat-enrichment analyses, DNA methylation profiling, ecDNA identification, and data integration, and drafted the manuscript. S.A.S. performed PCR validation experiments and curated biosamples. J.D. performed digital droplet PCR validation of copy number alterations in ecDNAs. P.N., and F.H.O. contributed to data analysis. F.Y., S.A.S., F.H.O., P.N., C.C.L., and C.L. contributed to manuscript editing. Competing interests: C. L. is a scientific advisory board member of Nabsys and Genome Insight. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 25

References

1. L. Mirabello, R. J. Troisi, S. A. Savage, Osteosarcoma incidence and survival rates from 1973 to 2004: data from the Surveillance, Epidemiology, and End Results Program. Cancer 115, 1531–1543 (2009). 2. D. J. Harrison, D. S. Geller, J. D. Gill, V. O. Lewis, R. Gorlick, Current and future therapeutic approaches for osteosarcoma. Expert Rev. Anticancer Ther. 18, 39–50 (2018). 3. J. S. Whelan, L. E. Davis, Osteosarcoma, chondrosarcoma, and chordoma. J. Clin. Oncol. 36, 188–193 (2018). 4. S. Simpson, et al., Comparative review of human and canine osteosarcoma: morphology, epidemiology, prognosis, treatment and genetics. Acta Vet. Scand. 59, 71 (2017). 5. J. M. Fenger, C. A. London, W. C. Kisseberth, Canine osteosarcoma: a naturally occurring disease to inform pediatric oncology. ILAR J. 55, 69–85 (2014). 6. M. Paoloni, et al., Canine tumor cross-species genomics uncovers targets linked to osteosarcoma progression. BMC Genomics 10, 625 (2009). 7. E. A. Ostrander, D. L. Dreger, J. M. Evans, Canine cancer genomics: Lessons for canine and human health. Annu. Rev. Anim. Biosci. 7, 449–472 (2019). 8. S. Patkar, et al., Large-scale comparative analysis of canine and human osteosarcomas uncovers conserved clinically relevant tumor microenvironment subtypes. Clin. Cancer Res. 30, 5630–5642 (2024). 9. J. A. Perry, et al., Complementary genomic approaches highlight the PI3K/mTOR pathway as a common vulnerability in osteosarcoma. Proc. Natl. Acad. Sci. U. S. A. 111, E5564–73 (2014). 10. S. N. Gröbner, et al., The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018). 11. L. C. Sayles, et al., Genome-informed targeted therapy for osteosarcoma. Cancer Discov. 9, 46–63 (2019). 12. J. Espejo Valle-Inclan, et al., Ongoing chromothripsis underpins osteosarcoma genome complexity and clonal evolution. Cell 188, 352–370.e22 (2025). 13. S. Rajan, et al., Structurally complex osteosarcoma genomes exhibit limited heterogeneity within individual tumors and across evolutionary time. Cancer Res. Commun. 3, 564–575 (2023). 14. M. D. Kinnaman, et al., Subclonal somatic copy-number alterations emerge and dominate in recurrent osteosarcoma. Cancer Res. 83, 3796–3812 (2023). 15. A. Y. Angstadt, et al., Characterization of canine osteosarcoma by array comparative genomic hybridization and RT-qPCR: signatures of genomic imbalance in canine .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 26 osteosarcoma parallel the human counterpart. Genes Chromosomes Cancer 50, 859–874 (2011). 16. K. Megquier, et al., The genomic landscape of canine osteosarcoma cell lines reveals conserved structural complexity and pathway alterations. PLoS One 17, e0274383 (2022). 17. N. H. Brückmann, C. B. Pedersen, H. J. Ditzel, M. F. Gjerstorff, Epigenetic reprogramming of pericentromeric satellite DNA in premalignant and malignant lesions. Mol. Cancer Res. 16, 417–427 (2018). 18. M. F. Gjerstorff, Novel insights into epigenetic reprogramming and destabilization of pericentromeric heterochromatin in cancer. Front. Oncol. 10, 594163 (2020). 19. T. Nakagawa, et al., DNA hypomethylation on pericentromeric satellite regions significantly correlates with loss of heterozygosity on chromosome 9 in urothelial carcinomas. J. Urol. 173, 243–246 (2005). 20. A. Mendez-Bermudez, et al., Selective pericentromeric heterochromatin dismantling caused by TP53 activation during senescence. Nucleic Acids Res. 50, 7493–7510 (2022). 21. H. Kim, et al., Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 52, 891–897 (2020). 22. R. G. W. Verhaak, V. Bafna, P. S. Mischel, Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat. Rev. Cancer 19, 283–288 (2019). 23. C. López-Ginés, C. Carda-Batalla, L. López-Terrada, A. Llombart-Bosch, Presence of double minutes and monosomy 17p in xenografted human osteosarcomas. Cancer Genet. Cytogenet. 90, 57–62 (1996). 24. J. Szymanska, et al., Ring chromosomes in parosteal osteosarcoma contain sequences from 12q13-15: a combined cytogenetic and comparative genomic hybridization study. Genes Chromosomes Cancer 16, 31–34 (1996). 25. F. Dujardin, et al., MDM2 and CDK4 immunohistochemistry is a valuable tool in the differential diagnosis of low-grade osteosarcomas and other primary fibro-osseous lesions of the bone. Mod. Pathol. 24, 624–637 (2011). 26. E. Yi, R. Chamorro González, A. G. Henssen, R. G. W. Verhaak, Extrachromosomal DNA amplifications in cancer. Nat. Rev. Genet. 23, 760–771 (2022). 27. H. L. Gardner, et al., Canine osteosarcoma genome sequencing identifies recurrent mutations in DMD and the histone methyltransferase gene SETD2. Commun. Biol. 2, 266 (2019). 28. I. Martincorena, et al., Universal patterns of selection in cancer and somatic tissues. Cell 173, 1823 (2018). 29. D. Weghorn, S. Sunyaev, Bayesian inference of negative and positive selection in human .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 27 cancers. Nat. Genet. 49, 1785–1788 (2017). 30. F. Alfieri, G. Caravagna, M. H. Schaefer, Cancer genomes tolerate deleterious coding mutations through somatic copy number amplifications of wild-type regions. Nat. Commun. 14, 3594 (2023). 31. S. Sakthikumar, et al., SETD2 is recurrently mutated in whole-exome sequenced canine osteosarcoma. Cancer Res. 78, 3421–3431 (2018). 32. L. B. Alexandrov, et al., Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013). 33. A. Viel, et al., A specific mutational signature associated with DNA 8-oxoguanine persistence in MUTYH-defective colorectal cancer. EBioMedicine 20, 39–49 (2017). 34. M. Kovac, et al., Exome sequencing of osteosarcoma reveals mutation signatures reminiscent of BRCA deficiency. Nat. Commun. 6, 8940 (2015). 35. Y. Jiang, et al., Multi-omics analysis identifies osteosarcoma subtypes with distinct prognosis indicating stratified treatment. Nat. Commun. 13, 7207 (2022). 36. Y. Yang, et al., Genomic and transcriptomic characterization of pre-operative chemotherapy response in patients with osteosarcoma. Research Square (2023). 37. L. Xie, et al., The clinical implications of tumor mutational burden in osteosarcoma. Front. Oncol. 10, 595527 (2020). 38. X. Chen, et al., Recurrent somatic structural variations contribute to tumorigenesis in pediatric osteosarcoma. Cell Rep. 7, 104–112 (2014). 39. Z. Ji, J. Shen, Y. Lan, Q. Yi, H. Liu, Targeting signaling pathways in osteosarcoma: Mechanisms and clinical studies. MedComm 4, e308 (2023). 40. B. K. Nirala, T. Yamamichi, J. T. Yustein, Deciphering the signaling mechanisms of osteosarcoma tumorigenesis. Int. J. Mol. Sci. 24, 11367 (2023). 41. S. Sakthikumar, et al., Genomic analysis across 53 canine cancer types reveals novel mutations and high clinical actionability potential. Vet. Comp. Oncol. 22, 30–41 (2024). 42. C. Wang, et al., A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun. Biol. 4, 185 (2021). 43. R. Poplin, et al., A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018). 44. J. M. Holt, et al., HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 40 (2024). 45. M. Martin, et al., WhatsHap: fast and accurate read-based phasing. bioRxiv 085050 (2016). .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 28 46. Z. Zheng, et al., ClairS: a deep-learning method for long-read somatic small variant calling. bioRxiv (2023). 47. J. Park, et al., Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic. Nat. Biotechnol. (2025). https://doi.org/10.1038/s41587- 025-02839-x. 48. H. Elrick, et al., SAVANA: reliable analysis of somatic structural variants and copy number aberrations using long-read sequencing. Nat. Methods 22, 1436–1446 (2025). 49. A. G. Keskus, et al., Severus detects somatic structural variation and complex rearrangements in cancer genomes using long-read sequencing. Nat. Biotechnol. (2025). https://doi.org/10.1038/s41587-025-02618-8. 50. V. Wanchai, et al., CReSIL: accurate identification of extrachromosomal circular DNA from long-read sequences. Brief. Bioinform. 23 (2022). Figure Captions Fig. 1. Somatic SNV and indel landscapes in four canine osteosarcomas. (A) Total number of somatic single nucleotide variants (SNVs) per tumor. (B) Counts of somatic SNVs stratified by base substitution class (C>A, C>G, C>T, T>A, T>C, T>G) for each tumor. Transver sions (C>A, C>G, T>A, T>G) are more frequent than transitions (C>T, T>C) in all four tumors (binomial test, p 10 bp). Single -base deletions (DEL_1bp) are the most common class overall (ranging from 31 to 399), followed by single-base insertions (INS_1bp). Larger indels (>10bp) are rare. Sample GP1899 exhibits substantially higher indel counts than the other tumors. (D) Recurrently mutated genes showing coding variants stacked by variant type. Bar segments are subdivided by variant type (SNV in orange, insertions in blue hatched, deletions in teal hatched). Fig. 2. Structural variant architecture and inferred formation mechanisms in four canine osteosarcomas. (A) Circos plot summarizing somatic structural variants (SVs) across all autosomes and the X chromosome for tumors Ro1759, GP1899, GR2023, and GD2044. Concentric rings (outer to inner) show: chromosome ideograms, total SV density, breakend (BND) density, and per class densities for deletions (DEL), insertions (INS), duplications (DUP), inversions (INV), and copy number variant SVs (CNV). Colored arcs repr esent BND links connecting rearranged loci. Each tumor is plotted in a distinct color (legend, top right). (B) Distribution of complex SV classes in each tumor. Bars show counts of BND-derived architectures including templated insertions (Templated_ins), t emplated insertions with inversions (Templated_ins_inv), complex inversions (complex_inv), duplicated inverted segments (dup_inv_segment), foldback inversions, inversion -translocation events (inv_tra), reciprocal inversions with associated deletions (recip rocal_inv_del), tandem duplications, and reciprocal translocations (reciprocal_tra). (C) Proportional contribution of inferred SV formation mechanisms by tumor. Each cell shows the fraction of SVs assigned to a given mechanism class: Fork Stalling and Template Switching/Microhomology-Mediated Break-Induced Replication-like (FoSTeS_MMBIR_like), Microhomology-Mediated End Joining-like (MMEJ_like), Non-Allelic Homologous Recombination with transposable elements direct (NAHR_TE_direct), NAHR non - .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint 29 TE direct (NAHR_nonTE_direct), Non-Homologous End Joining-like (NHEJ_like), TE pair other, and Unclassified. (D) Focal view of a chromothripsis region on CFA19 in tumor GP1899. The Circos plot displays CFA19 (outer ring) and its rearrangement partners (CFA 1, CFA4, CFA6, CFA10, CFA13, CFA17, CFA27, CFA28, CFA38, and CFAX). Genomic coordinates are marked every 5 Mb. From outer to inner, tracks depict copy number profile along CFA19 (heatmap; copy number 0 -10), SV classes on CFA19 [deletions (DEL, orange), dup lications (DUP, blue), inversions (INV, green), insertions (INS, red), and breakends/translocations (BND/TRA, purple)]. Arcs are connecting clustered breakpoints and are colored by cluster assignment where present. Partner chromosomes are color -coded with ideograms shown around the periphery. This pattern of dense, localized rearrangements and oscillating copy number is consistent with chromothripsis centered on CFA19. Fig. 3. Enrichment of different variant types in repetitive genomic elements across dog samples. (A) Heatmap showing log2 fold enrichment of SNVs (single nucleotide variants), indels (insertions/deletions), and SVs (structural variants) across different repeat classes (simple repeats, SINE, SAT1_CF, and LINE) in four samples (Ro1759, GP1899, GR2023, GD2044). Asterisks indicate statistical significance (*p < 0.05, **p < 0.01, ***p < 0.001). (B) Fold enrichment of SNVs (blue) and SVs (orange) in SAT1_CF satellite repeats across the four samples. SNVs show consistently higher enrichment (19.3-46.3×) compared to SVs (2.4-44.1×), with GP1899 showing the highest SNV enrichment. (C) Fold enrichment of SNVs in L1_Canis1 (LINE -1) elements across samples, showing modest but consistent enrichment (1.7-1.8×) with statistical significance (**p < 0.01, *p < 0.05) in all samples. (D) Enrichment of different microsatellite motifs at indel sites. (TTTA)n and (AATA)n repeats show the highest enrichment (up to 170× and 79×, respectively), while (AT)n and (TA)n motifs show lower enrichment levels (10 -20×) across samples. Fig. 4. Extrachromosomal DNA (ecDNA) burden, oncogene content, and architecture in osteosarcoma. (A) Relationship between ecDNA size (kb) and copy number. Each point represents a circular amplicon detected across the three ecDNA positive tumors (Ro1759, GP1899, GD2044). The x-axis shows ecDNA size and the y-axis shows estimated copy number. Tier 1 ecDNA (high- confidence circular amplicons) are shown in red and Tier 2 probable ecDNA are shown in blue. The dot size is proportional to the size of ecDNAs. (B) Per -tumor ecDNA burden. Stacked bars indicate the number of ecDNAs detected in each tumor (Ro1759, GP1899, and GD2044), stratified by Tier 1 (red) and Tier 2 probable (blue) elements. (C) Architecture of a representative Tier 1 ecDNA (ec63) from tumor Ro1759. Circ ular plots depict a 3.07 Mb ecDNA comprising 13 genomic segments (legend, bottom) with a median copy number of ~4 (maximum 4.8). The outer track shows the reference chromosomal coordinates for the contributing segments. Inner annotation tracks mark gene fe atures overlapped by the ecDNA (CDS, 5' UTR, 3' UTR, exons, introns, gene-only regions). Gene labels highlight notable loci carried on this ecDNA, including CTNNB1, TGFBR2, and neighboring genes. Inward colored vertical lines denote segment boundaries and junctions, illustrating the rearranged structure of the circular amplicon. .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint .CC-BY-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted December 31, 2025. ; https://doi.org/10.64898/2025.12.31.697157doi: bioRxiv preprint

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: oa-pdf ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-24T02:00:01.246996+00:00

License: CC-BY-ND-4.0