REPLAY: A reproducible and user-friendly application for DNA replication timing analysis from Repli-seq data


Abstract

Background

DNA replication timing (RT) is a fundamental feature of genome organization that is regulated in a cell-type-specific manner and frequently altered in disease. Repli-seq is the standard approach for genome-wide RT profiling; however, its analysis typically requires multiple independent tools and custom scripts, limiting reproducibility, portability, and accessibility, particularly for users without computational expertise. In addition, existing workflows often lack standardization and require substantial user intervention.

Results

We developed REPLAY, a fully automated, reproducible, and user-friendly application for replication timing analysis. REPLAY is distributed as a standalone executable that enables end-to-end processing from compressed FASTQ files to genome-wide RT profiles without requiring software installation or programming experience. Through an intuitive graphical interface, users can configure analysis parameters, including input and output directories, reference genome, normalization strategy (quantile, median, or interquartile range), and smoothing. The application integrates all processing steps (quality control, trimming, alignment, binning, RT log2 calculation, normalization, smoothing, and visualization) within a single automated workflow. Application of REPLAY to publicly available datasets demonstrates accurate reconstruction of RT profiles and high reproducibility across samples.

Conclusions

REPLAY offers a portable, reproducible, and accessible solution for the analysis of RT data. By eliminating the need for command-line tools and complex installations, it lowers the entry barrier and enables standardized analysis across diverse research settings.

Keywords

DNA replication timing, Repli-seq, genomics, epigenome, automated workflow, Apptainer, reproducibility

Background

DNA replication in eukaryotic cells is a highly coordinated process that follows a defined temporal program known as replication timing (RT), where distinct genomic regions replicate at specific stages of S-phase [1–4]. This program is fundamentally linked to the functional organization of the nucleus, strongly correlated with 3D genome architecture, chromatin epigenetic states, and transcriptional activity [3, 5–8]. RT is highly cell-type specific, with approximately half of the genome undergoing dynamic reorganization during development to coordinate with cell fate specification [9–11]. Furthermore, aberrations in the RT program are linked to genome instability [12–16] and are a hallmark of disease, such as cancer [17–21], highlighting the importance of RT for genomic function. The current gold standard for genome-wide RT profiling is Repli-seq [21–24]. In the standard “Early/Late” (E/L) Repli-seq, cells are pulse-labeled with a nucleotide analog (typically BrdU), sorted into early and late S-phase fractions via fluorescence-activated cell sorting (FACS), and processed for high-throughput sequencing [21, 22, 25, 26]. The resulting data is analyzed to generate a log2 ratio of early-to-late enrichment, providing a comprehensive map of replication domains across the genome [22, 27]. Despite the widespread utility of Repli-seq, bioinformatic analysis remains a significant bottleneck. Existing workflows are often fragmented, requiring researchers to manually combine independent tools for quality control, adapter trimming, alignment, filtering, and normalization [28–30]. This process demands substantial expertise in Unix and R environments, creating a steep entry barrier for non-computational researchers. Moreover, the lack of standardized software environments often leads to limited portability and reproducibility across different computational platforms, as variations in tool versions and normalization strategies can introduce inconsistencies between studies. 
To address these limitations, we developed REPLAY, a fully automated, reproducible, and portable application designed to streamline Repli-seq data analysis. REPLAY integrates the entire processing workflow, from raw FASTQ files to normalized RT profiles, in an executable application built on the Snakemake [31] workflow management system. To ensure reproducibility across diverse computational environments, including local workstations and High-Performance Computing (HPC) clusters, the application is distributed as an integrated PyInstaller executable together with an Apptainer (formerly Singularity) container [32]. Furthermore, REPLAY features a user-friendly graphical user interface, allowing researchers to execute complex bioinformatic tasks without programming expertise. By providing a unified and accessible application, REPLAY facilitates the standardization of RT analysis and its broader adoption across the research community.

Implementation

Software architecture and application design

REPLAY is distributed as a standalone executable application, allowing users to perform complete RT analysis without installing dependencies or interacting with command-line tools. The application integrates all workflow components within a unified environment, simplifying deployment across different operating systems. We implemented REPLAY using Snakemake, a Python-based workflow management system that ensures task execution order, parallelization, and re-entrancy [31]. The complete execution workflow is illustrated in Supplementary Figure 1. To overcome dependency conflicts and ensure reproducibility across distinct platforms, the entire Snakemake workflow is encapsulated within an Apptainer (formerly Singularity) container [32, 33]. This containerization approach isolates the software environment, ensuring that all necessary dependencies are included and that the workflow operates consistently irrespective of the underlying operating system.
This approach is particularly advantageous for execution on High-Performance Computing (HPC) clusters as well as on local workstations. REPLAY implements a complete workflow that fully automates Repli-seq data processing, including sequencing read trimming and filtering, alignment to the reference genome, data binning, calculation of RT values (Log2[E/L]), normalization, and smoothing (Figure 1A). Users can run the application without administrative privileges and without manually installing bioinformatics tools, which simplifies deployment. Moreover, the portable application includes a PyInstaller executable that implements a graphical user interface (GUI), developed with the PySide6 module (Qt for Python) for cross-platform GUI applications. The GUI is designed for users without programming knowledge and simplifies the processing of Repli-seq data (Figure 1B). This lowers the barrier to adoption and broadens the application’s usability in diverse research settings.

The REPLAY automated workflow generates several key outputs: the raw genome-wide RT profiles, expressed as RT Log2 ratios, and the normalized and smoothed RT profiles. In addition to these profiles, the application produces quality and reproducibility metrics, which are essential for evaluating the reliability and accuracy of the generated data. Together, the RT profiles and quality metrics provide a comprehensive overview of the analysis results (Figure 1C).

Graphical user interface

REPLAY offers a user-friendly interface designed to simplify the entire process. The graphical user interface (GUI) was developed using PySide6 (Qt for Python) and lets users configure and execute the entire workflow without complex command-line interactions.
REPLAY-GUI allows users to select working and output directories, choose between single-end and paired-end sequencing, and select the reference genome and their preferred normalization and smoothing methods (Figure 2). The platform accepts compressed FASTQ files (.fastq.gz extension) as input. These files are automatically extracted and processed by the software. The REPLAY interface provides real-time progress monitoring and output visualization, allowing users to track the status of their tasks as they are processed (Figure 2). Additionally, the application automatically scans the files in the input directory, evaluates whether their names are valid for processing (including proper pairing of Early and Late S-phase libraries per sample), and renames the libraries if needed. Users can also process samples from distinct organisms by configuring the corresponding reference genomes, select the desired resolution (bin size), and filter the specific contig sequences to process (Supplementary Figure 2). The smoothing level is customizable through the advanced settings (Supplementary Figure 3). Finally, the GUI includes a “Results” section that compiles quality metrics, including read mapping statistics, bin coverage, autocorrelation function, raw RT profile visualization, and raw and normalized data distributions (Supplementary Figure 4).

Data preprocessing and alignment

The REPLAY application executes a series of pre-established steps to ensure accurate and efficient data processing:

Preprocessing: Adapter sequences are identified and removed from raw *.fastq.gz files with cutadapt [34]. The cutadapt tool was selected because it supports distinct sequencing technologies, can remove sequences from both 5’ and 3’ ends, incorporates quality trimming, and supports paired-end reads.
Moreover, it is commonly used for processing Repli-seq data [21, 22, 28, 29, 35]. REPLAY automatically populates the standard Illumina adapter sequences for single-end or paired-end sequencing data. However, users can input any custom adapter sequences.

Alignment: Following preprocessing, sequencing reads are mapped to the reference genome using the Burrows-Wheeler Aligner (BWA-MEM) [36]. This tool was chosen for its efficiency and accuracy in aligning sequencing reads from Repli-seq data [21, 22, 35].

Filtering: Post-alignment processing removes PCR duplicates and low-quality mappings, specifically those with a mapping quality score (MAPQ) below 30. This step is crucial for maintaining high-fidelity data. REPLAY uses the SAMtools duplicate marking (markdup) feature [37], which was selected for its speed and efficient memory usage.

Binning: The genome is partitioned into non-overlapping windows whose size is set by the user-defined resolution, such as 5 kb, 10 kb, 20 kb, or 100 kb. REPLAY relies on BEDtools for genomic interval manipulation [38].

Quantification: In the final step, read counts are calculated per window and normalized as Reads Per Kilobase per Million mapped reads (RPKM). This normalization makes the data comparable across samples and conditions. The intermediate outputs are bedGraph files containing the reads per bin for each S-phase fraction, which are then processed in the subsequent steps for RT profile calculation.

Calculation of RT log2 ratios

RT profiles are determined by analyzing the relative abundance of DNA in the early and late S-phase fractions. To quantify this, REPLAY computes the RPKM per genomic bin in both the early and late S-phase fractions. Next, it calculates the ratio of RPKM values between these two phases.
Finally, the software transforms these ratio values into a Log2 scale to facilitate comparison and interpretation of the data:

$$\mathrm{RT} = \log_2\left(\frac{\mathrm{RPKM}_{\mathrm{Early}}}{\mathrm{RPKM}_{\mathrm{Late}}}\right)$$

The resulting RT profiles are output as bedGraph files that contain the raw RT data across the genome, allowing for direct visualization and downstream analysis.

Quality control and diagnostics

A key feature of REPLAY is the automated generation of comprehensive diagnostic metrics (Figure 3), which is critical for evaluating the integrity and reliability of the data being analyzed. These metrics evaluate data quality according to established field standards through several key analyses:

Read mapping and filtering statistics: These statistics report how efficiently and accurately sequence reads align to the reference genome [21]. They show how well the reads are mapped and which reads are filtered out, ensuring that only high-quality data is retained for further analysis (Figure 3A).

Genome coverage analysis: This analysis calculates the fraction of genomic windows that have non-zero coverage. For high-quality Repli-seq datasets, coverage of ≥ 80% is expected [21], which ensures that a substantial portion of the genome is adequately sequenced for accurate RT analyses (Figure 3B). Values lower than 80% indicate insufficient sequencing depth or poor sorting of S-phase fractions.

Autocorrelation (ACF): The workflow computes the ACF value to assess the spatial consistency between adjacent replication domains [21, 26, 27]. Datasets that achieve an ACF value of ≥ 0.8 are flagged as high-quality (Figure 3C). This metric measures the degree of correlation in the RT data, indicating how consistently neighboring replication domains replicate during S-phase.

RT data visualization: REPLAY facilitates rapid visual inspection of the data.
These visualizations include raw RT signals that are expected to clearly show enrichment towards early (positive values) and late (negative values) replication (Figure 3D). Raw RT signal distributions are also plotted to inspect potential bias in the data (Figure 3E). High-quality Repli-seq datasets should show a bimodal distribution with peaks at early and late replication. REPLAY compiles these quality metrics into a “Results” section for rapid user diagnostics (Supplementary Figure 4). These metrics collectively provide valuable insights into the quality of sequencing and the reliability of RT profiles, enabling researchers to make informed decisions regarding the data’s suitability for downstream analyses.

Data normalization and smoothing

REPLAY provides distinct normalization strategies to address systematic biases, and users can select the normalization method. These strategies include:

None: No normalization is performed, but data can still be smoothed to reduce noise in raw RT signals (see below).

Quantile normalization: This method adjusts the distribution of data points across samples to achieve equivalent quantiles, thereby facilitating comparability. It is the recommended normalization technique commonly used for Repli-seq analyses [21, 22, 28, 29]. Users can use the target dataset provided (generated by averaging multiple high-quality datasets) or select their own target dataset for normalization.

Median normalization: This approach normalizes data by adjusting each sample’s median to a common value, thereby mitigating the impact of outliers.

Interquartile range (IQR)-based normalization: This technique scales the data by the interquartile range, which is the difference between the 75th and 25th percentiles. This approach reduces the influence of extreme values on the data.

Additionally, REPLAY applies smoothing to further enhance data quality and reduce noise.
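The normalization options listed above can be illustrated with a minimal Python sketch. This is not REPLAY's implementation: the function names are hypothetical, median normalization is assumed to shift a profile to a common median, IQR normalization is assumed to center on the median and divide by the IQR, and quantile normalization is assumed to map each bin to the equally ranked value of a target profile with the same number of bins.

```python
import statistics


def median_normalize(values, target_median=0.0):
    """Shift a profile so that its median equals target_median (assumed behavior)."""
    med = statistics.median(values)
    return [v - med + target_median for v in values]


def iqr_normalize(values):
    """Center on the median and scale by the interquartile range (assumed behavior)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the profile cannot be scaled")
    return [(v - med) / iqr for v in values]


def quantile_normalize(values, target):
    """Replace each value with the target value of the same rank.

    Ties in `values` are broken by position, so tied bins receive
    neighboring (not identical) target values in this simplified version.
    """
    if len(values) != len(target):
        raise ValueError("profile and target must have the same number of bins")
    sorted_target = sorted(target)
    order = sorted(range(len(values)), key=values.__getitem__)
    normalized = [0.0] * len(values)
    for rank, index in enumerate(order):
        normalized[index] = sorted_target[rank]
    return normalized
```

For example, `quantile_normalize([1.0, -2.0, 0.5, 3.0, -1.0], [10, 20, 30, 40, 50])` returns `[40, 10, 30, 50, 20]`: the ranking of the bins is preserved while the distribution is taken from the target.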
One option is Locally Estimated Scatterplot Smoothing (LOESS), a non-parametric approach that fits multiple regression models within localized subsets of the data [40]. This smoothing is performed separately for each chromosome, allowing the method to account for chromosome-specific variations. LOESS smoothing interpolates the data through a low-degree polynomial fitting function to remove stochastic outliers. REPLAY uses a recommended span of 500 kilobases (kb). This window size is optimized for mammalian genomes to provide a flexible and robust way to smooth the signal while preserving a clear signal of early and late replication domains [21]. However, users can specify the desired window size to adjust the level of smoothing (Supplementary Figure 3). Users may instead select Gaussian smoothing, which applies a 1-D Gaussian filter across the dataset, or no smoothing. By applying these normalization and smoothing techniques, REPLAY provides RT profiles ready for further analysis and interpretation (Figure 4). Normalized and smoothed RT profiles are saved as bedGraph tracks, which can be readily visualized using standard tools or directly loaded in the Integrative Genomics Viewer (IGV) [41] or UCSC [42] browsers. This output format allows researchers to inspect and interpret the RT landscape across the genome, providing an accessible and intuitive way to verify data quality and identify biological patterns.

Reproducibility and software portability

Although distributed as an executable application, REPLAY internally leverages Snakemake [31] workflow management and a containerized environment to ensure reproducibility and consistent performance across platforms. This ensures that the workflow can be adapted to different hardware and software configurations.
All dependencies required for the workflow are encapsulated within an Apptainer container [32], providing a secure and portable way to package software and its dependencies, ensuring consistent behavior across different platforms and operating systems. This design choice allows REPLAY to be executed on both high-performance computing systems and local workstations, producing identical results from the same inputs (Figure 5). To facilitate widespread adoption, REPLAY has been designed to run on all major operating systems (Table 1). The application is natively compiled for the x86-64 architecture, ensuring optimal compatibility with the server-grade hardware standard in genomic research. Windows users can launch REPLAY via the Windows Subsystem for Linux (WSL2), which provides a Linux-native execution experience through the graphical user interface. Similarly, REPLAY supports macOS on x86-64 (Intel) systems through virtualization (Table 1). Thus, REPLAY simplifies complex command-line-based bioinformatics tools by employing virtualization and containerization techniques.

Table 1.

| Operating System | Implementation | User Experience |
|---|---|---|
| Linux | Native Executable | Native: Double-click to launch; utilizes local Apptainer/Singularity. |
| Windows | Windows Subsystem for Linux (WSL2) | Runs within the WSL2 environment with full GUI support. |
| macOS (x86-64 architecture) | Linux Virtual Machine (VM) | Operates via VM (e.g., VirtualBox, Lima or Parallels) to bridge Linux-specific dependencies. |
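To make the quantification and QC calculations described in this section concrete, the following Python sketch computes RPKM per bin, the Log2(E/L) ratio, the fraction of covered bins, and an autocorrelation value. It is an illustration under stated assumptions rather than REPLAY's code: the function names are hypothetical, the total number of mapped reads is approximated by the sum of the bin counts, bins with zero signal in either fraction are left undefined (`None`), and the ACF is evaluated at lag 1, which the text does not specify.

```python
import math


def rpkm(counts, bin_size_bp):
    """Reads Per Kilobase per Million mapped reads for each bin.

    Assumption: the library size is the sum of the bin counts.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    scale = 1e9 / (total * bin_size_bp)  # 1e3 (per kb) * 1e6 (per million reads)
    return [c * scale for c in counts]


def log2_ratio(early_rpkm, late_rpkm):
    """Log2(E/L) per bin; None where either fraction has no signal (assumption)."""
    return [
        math.log2(e / l) if e > 0 and l > 0 else None
        for e, l in zip(early_rpkm, late_rpkm)
    ]


def coverage_fraction(counts):
    """Fraction of bins with non-zero coverage (compare against the 0.8 threshold)."""
    return sum(1 for c in counts if c > 0) / len(counts)


def autocorrelation(values, lag=1):
    """Sample autocorrelation of consecutive bins at the given lag (assumed lag 1)."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    if denominator == 0:
        raise ValueError("constant profile; autocorrelation is undefined")
    numerator = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator
```

With early counts `[8, 6, 4, 2]` and late counts `[2, 4, 6, 8]` in 50 kb bins, the first bin has a Log2(E/L) of 2 (early replicating) and the last bin a value of -2 (late replicating).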

Results

Generation of RT profiles and validation

To assess the performance of REPLAY, we analyzed publicly available Repli-seq datasets generated across multiple human cell types. Specifically, we used datasets produced by the 4D Nucleome Project [39, 43], which provide high-quality measurements of early and late S-phase fractions. REPLAY successfully generated genome-wide RT profiles from raw sequencing data. To evaluate accuracy, we compared REPLAY-derived RT profiles to reference profiles computed using the 4D Nucleome processing pipeline [28]. This comparison demonstrated strong concordance across genomic regions (Figure 6A–B). In addition, analysis of independent biological replicates revealed high reproducibility, with RT profiles showing strong agreement across samples within the same cell type (Figure 6C). Distinct RT patterns observed between different cell types further confirm the ability of REPLAY to capture biologically meaningful variation in RT programs. Together, these results demonstrate that REPLAY accurately reconstructs genome-wide RT profiles and produces robust, reproducible outputs across datasets.

Comparison with existing tools and workflow

Current approaches for Repli-seq data analysis rely on a combination of independent tools and partially automated workflows, often requiring substantial user intervention and computational expertise. Pipelines developed by large-scale efforts, including those from the ENCODE and the 4D Nucleome Project [28, 29], implement standardized processing steps such as read trimming, alignment, filtering, and signal generation. However, these workflows typically produce intermediate outputs (e.g., coverage or RPKM tracks) and do not include downstream steps such as normalization, smoothing, or final RT profile generation. As a result, additional processing and user intervention are required to obtain fully interpretable RT profiles.
Other implementations from individual laboratories [44–50] are primarily command-line based and consist of multi-step workflows requiring user coordination across processing steps. Available tools vary greatly in scope and functionality. For example, START-R [30] provides an interactive web interface but is primarily designed for visualization, requires preprocessed genomic signal files, and does not support complete end-to-end processing of Repli-seq data. Workflow-based implementations such as Repli-seq-nf [50] improve automation but require manual configuration of parameters, command-line execution, and familiarity with workflow management systems. Because these tools vary in scope, level of automation, and required user intervention, direct benchmarking across methods is not straightforward. Thus, we compared their capabilities and level of integration (Table 2). REPLAY distinguishes itself by integrating all steps of replication timing analysis within a single standalone application, enabling fully automated processing from raw FASTQ files to normalized RT profiles. Unlike traditional pipelines that require the manual setup of Conda environments, R libraries, and workflow managers, REPLAY is distributed as a standalone executable that includes all necessary dependencies. Through its graphical interface, users can configure parameters and execute analyses without command-line interaction or manual intervention. As summarized in Table 2, REPLAY is the only solution that combines end-to-end processing, full automation, and a graphical user interface within a unified framework.

Discussion

REPLAY addresses key limitations in current approaches to RT analysis by providing a unified, reproducible, and user-friendly application. Existing workflows typically rely on combinations of independent tools for quality control, alignment, and downstream processing, often requiring command-line execution and complex software environments [21, 22, 28, 29]. In contrast, REPLAY is distributed as a standalone executable that enables end-to-end analysis from raw FASTQ files to finalized RT profiles without requiring installation or programming expertise. By integrating all processing steps within a graphical interface, REPLAY reduces variability introduced by custom scripting and promotes reproducibility across laboratories. A key feature of REPLAY is its level of integration and automation. While pipelines developed by large-scale efforts such as the ENCODE and the 4D Nucleome Project implement standardized preprocessing steps, they typically generate intermediate outputs (e.g., coverage or RPKM tracks) and do not include downstream normalization, smoothing, or final RT profile generation. As a result, additional processing and user intervention are required to obtain biologically interpretable RT profiles. Other tools address specific aspects of the workflow but differ in scope and level of automation, making direct benchmarking challenging. By contrast, REPLAY integrates all steps within a single framework, enabling fully automated and standardized analysis. In addition to usability, REPLAY provides substantial improvements in computational efficiency and portability. End-to-end analyses are completed in less than one hour for a human sample, compared to substantially longer runtimes reported for multi-step workflows. These gains are achieved without compromising accuracy, as RT profiles generated by REPLAY show high concordance with independently derived profiles (Pearson’s r ≥ 0.98). 
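As a hedged illustration of how a concordance value such as the Pearson’s r reported above could be computed between two RT profiles, the sketch below correlates the bins shared by two profiles. The dictionary layout keyed by `(chromosome, start)` and the function names are assumptions made for this example, not part of REPLAY.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile; correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


def concordance(profile_a, profile_b):
    """Pearson's r over bins present in both profiles.

    Each profile is a dict mapping (chromosome, bin_start) to an RT value
    (hypothetical layout chosen for this example).
    """
    shared = sorted(profile_a.keys() & profile_b.keys())
    return pearson_r(
        [profile_a[key] for key in shared],
        [profile_b[key] for key in shared],
    )
```

Restricting the comparison to shared bins avoids penalizing one profile for regions that the other pipeline filtered out.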
Furthermore, containerization using Apptainer (Singularity) ensures a consistent computational environment, enabling reproducible results across platforms ranging from local workstations to high-performance computing (HPC) systems. The inclusion of a graphical user interface (GUI) implemented with PySide6 further expands accessibility. By allowing users to configure parameters such as bin size, normalization method, and smoothing without command-line interaction, REPLAY lowers the entry barrier for researchers without computational expertise and facilitates broader adoption of RT profiling. The current version of REPLAY is optimized for standard early/late (E/L) Repli-seq data. Future developments will focus on extending this framework to support multi-fraction Repli-seq, enabling higher temporal resolution and improved detection of replication initiation and termination zones [35, 51]. In addition, extending REPLAY to single-cell Repli-seq data analysis will provide a standardized framework for investigating cell-to-cell variability in RT. This is particularly relevant given recent conflicting reports on the establishment of RT programs during early mammalian development [15, 52–54], highlighting the need for robust computational approaches that incorporate stringent quality control, minimize analytical biases, and identify technical artifacts that can confound the interpretation of single-cell RT data [55]. Finally, the modular and containerized design of REPLAY provides a foundation for integration with emerging multi-omic approaches, such as PARTAGE, which enables the joint profiling of copy number variation, replication timing, and gene expression from the same sample [35]. Together, these features position REPLAY as a comprehensive and accessible framework for standardized RT analysis across diverse experimental and computational settings.

Conclusions

REPLAY provides an integrated and user-friendly solution for Repli-seq data analysis, bridging the gap between complex bioinformatics workflows and experimental research needs. By combining automated workflow management, containerized and portable software environments, and an intuitive graphical interface within a standalone application, REPLAY enables end-to-end generation of high-quality replication timing profiles with minimal user intervention. This framework facilitates reproducible and standardized analyses, supporting broader adoption of replication timing studies and improving consistency across datasets and laboratories.

Supplementary Material

Table 2. Comparison of REPLAY with representative tools and workflows for Repli-seq data analysis.

| Feature | REPLAY | ENCODE Repli-seq workflow | 4D Nucleome Repli-seq workflow | START-R | Repliseq-nf |
|---|---|---|---|---|---|
| End-to-end processing | Complete (FASTQ → RT profiles) | No | No | No | Partial |
| User interface | Graphical user interface (GUI) | Command line | Command line | Web-based interface | Command line |
| Automation level | Fully automated (Snakemake) | Manual | Manual | Partial | Partial (user-configured) |
| Workflow | Fully automated (Snakemake) | Multi-step workflow (script-based) | Multi-step workflow (script-based) | R Shiny app | Nextflow |
| Programming skills required | No | Yes | Yes | No | Yes |
| Containerization | Apptainer | None | Docker | Docker | Docker/Apptainer |
| Input data | Compressed FASTQ | FASTQ | FASTQ | Preprocessed bedGraph | FASTQ |
| Output data | Normalized and smoothed RT, QC metrics | Intermediate signal tracks | Coverage / RPKM signal | Normalized RT | Normalized RT |
| Normalization | User-configurable | Not included (requires downstream processing) | Not included (requires downstream processing) | User-configurable | Quantile |
| Smoothing | User-configurable | Not included (requires downstream processing) | Not included (requires downstream processing) | User-configurable | User-configurable |
| Integrated QC metrics | Mapping, coverage, ACF, RT distributions | Standard FastQC | Not provided | Limited | Standard FastQC |
| Primary use | End-to-end analysis | Data generation | Data generation | Visualization | Partial pipeline |

Acknowledgements

The authors acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing resources that contributed to the research results reported within this paper. During the preparation of this work the authors used GPT-5.3 in order to assist with the development, refinement and debugging of the included code. The authors reviewed and edited the code as needed and take full responsibility for the content of the publication.

Funding sources

This work was supported by NIH/NIGMS grant R35GM137950 to J.C.R.M.; and institutional support from the University of Minnesota to J.C.R.M.

Availability and requirements

Project name: REPLAY
Project home page: https://github.com/Rivera-Mulia-Lab/REPLAY
Operating system(s): Linux, Windows (via WSL2), and macOS (via virtualization)
Programming language: Python, Bash.
Other requirements:
- Processor (CPU): x86-64 architecture. Minimum 2 cores. 6 cores or higher recommended. The Snakemake backend automatically parallelizes tasks based on available threads.
- Memory (RAM): 10 GB minimum. 32 GB or higher is recommended for processing large mammalian genomes (e.g., human or mouse) at high resolutions (< 10 kb bins).
- Storage:
  - Containers: ~1 GB for the executable and Apptainer/Singularity image.
  - Genome + Index: ~8.3 GB for hg38
  - Input Data: Variable (depends on FASTQ size).
  - Output Data: Assume at least ~13x input for intermediate files, e.g. 1.21 GB fastq.gz -> 14.8 GB trimmed reads/bam/bedgraphs
License: GPL-3.0.

References

- 1. Rivera-Mulia JC, Gilbert DM. Replicating Large Genomes: Divide and Conquer. Mol Cell. 2016;62:756–65. 10.1016/j.molcel.2016.05.007.
- 2. Rhind N. DNA replication timing: Biochemical mechanisms and biological significance. Bioessays. 2022;:e2200097. 10.1002/bies.202200097.
- 3. Vouzas AE, Gilbert DM. Replication timing and transcriptional control: beyond cause and effect - part IV. Curr Opin Genet Dev. 2023;79:102031. 10.1016/j.gde.2023.102031.
- 4. Hyrien O, Guilbaud G, Krude T. The double life of mammalian DNA replication origins. Genes Dev. 2025;39:304–24. 10.1101/gad.352227.124.
- 5. Rivera-Mulia JC, Gilbert DM. Replication timing and transcriptional control: beyond cause and effect-part III. Curr Opin Cell Biol. 2016;40:168–78. 10.1016/j.ceb.2016.03.022.
- 6. Poulet A, Li B, Dubos T, Rivera-Mulia JC, Gilbert DM, Qin ZS. RT States: systematic annotation of the human genome using cell type-specific replication timing programs. Bioinformatics. 2018;35:2167–76. 10.1093/bioinformatics/bty957.
- 7. Liu Y, Zhangding Z, Liu X, Hu J. Chromatin-centric insights into DNA replication. Trends Genet. 2025;41:412–24. 10.1016/j.tig.2024.12.003.
- 8. Vouzas AE, Sasaki T, Rivera-Mulia JC, Turner JL, Brown AN, Alexander KE, et al. Transcription elongation can be sufficient, but is not necessary, to advance replication timing. EMBO Rep. 2026;:1–36. 10.1038/s44319-026-00735-2.
- 9. Rivera-Mulia JC, Buckley Q, Sasaki T, Zimmerman J, Didier RA, Nazor K, et al. Dynamic changes in replication timing and gene expression during lineage specification of human pluripotent stem cells. Genome Res. 2015;25:1091–103. 10.1101/gr.187989.114.
- 10. Rivera-Mulia JC, Kim S, Gabr H, Chakraborty A, Ay F, Kahveci T, et al. Replication timing networks reveal a link between transcription regulatory circuits and replication timing control. Genome Res. 2019;29:1415–28. 10.1101/gr.247049.118.
- 11. ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature. 2020;583:699–710. 10.1038/s41586-020-2493-4.
- 12. Kodali S, Meyer-Nava S, Landry S, Chakraborty A, Rivera-Mulia JC, Feng W. Epigenomic signatures associated with spontaneous and replication stress-induced DNA double strand breaks. Front Genet. 2022;13:907547. 10.3389/fgene.2022.907547.
- 13. Sarni D, Sasaki T, Irony Tur-Sinai M, Miron K, Rivera-Mulia JC, Magnuson B, et al. 3D genome organization contributes to genome instability at fragile sites. Nat Commun. 2020;11:3613. 10.1038/s41467-020-17448-2.
- 14. Sakamoto M, Hori S, Yamamoto A, Yoneda T, Kuriya K, Takebayashi S-I. scRepli-Seq: A Powerful Tool to Study Replication Timing and Genome Instability. Cytogenet Genome Res. 2022;:1–10. 10.1159/000527168.
- 15. Takahashi S, Kyogoku H, Hayakawa T, Miura H, Oji A, Kondo Y, et al. Embryonic genome instability upon DNA replication timing program emergence. Nature. 2024;633:686–94. 10.1038/s41586-024-07841-y.
- 16. Berkemeier F, Cook PR, Boemo MA. DNA replication timing reveals genome-wide features of transcription and fragility. Nat Commun. 2025;16:4658. 10.1038/s41467-025-59991-w.
- 17. Rivera-Mulia JC, Desprat R, Trevilla-Garcia C, Cornacchia D, Schwerer H, Sasaki T, et al. DNA replication timing alterations identify common markers between distinct progeroid diseases. Proc Natl Acad Sci U S A. 2017;114:E10972–80. 10.1073/pnas.1711613114.
- 18. Sasaki T, Rivera-Mulia JC, Vera D, Zimmerman J, Das S, Padget M, et al. Stability of patient-specific features of altered DNA replication timing in xenografts of primary human acute lymphoblastic leukemia. Exp Hematol. 2017;51:71–82.e3. 10.1016/j.exphem.2017.04.004.
- 19. Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet. 2018;50:1388–98. 10.1038/s41588-018-0195-8.
- 20. Rivera-Mulia JC, Sasaki T, Trevilla-Garcia C, Nakamichi N, Knapp DJHF, Hammond CA, et al. Replication timing alterations in leukemia affect clinically relevant chromosome domains. Blood Advances. 2019;3:3201–13. 10.1182/bloodadvances.2019000641.
- 21. Rivera-Mulia JC, Trevilla-Garcia C, Martinez-Cifuentes S. Optimized Repli-seq: improved DNA replication timing analysis by next-generation sequencing. Chromosome Res. 2022. 10.1007/s10577-022-09703-7.
- 22. Marchal C, Sasaki T, Vera D, Wilson K, Sima J, Rivera-Mulia JC, et al. Genome-wide analysis of replication timing by next-generation sequencing with E/L Repli-seq. Nat Protoc. 2018;13:819–39. 10.1038/nprot.2017.148.
- 23. Hulke ML, Massey DJ, Koren A. Genomic methods for measuring DNA replication dynamics. Chromosome Res. 2019;9:1–19. 10.1007/s10577-019-09624-y.
- 24. Wheeler E, Mickelson-Young L, Wear EE, Burroughs M, Bass HW, Concia L, et al. A comparison of genomic methods to assess DNA replication timing. Sci Rep. 2025;15:17761. 10.1038/s41598-025-02699-0.
[DOI] [PMC free article] [PubMed] [Google Scholar] - 25.Meyer-Nava S, Shetty AV, and Rivera-Mulia JC. Repli-seq Sample Preparation using Cell Sorting with Cell-Permeant Dyes. Curr Protoc. 2023;3. 10.1002/cpz1.945. [DOI] [Google Scholar] - 26.4DN Standard 2-stage Repli-seq protocol. https://data.4dnucleome.org/protocols/5e017160-ad8b-49c1-a129-ccbd29966b6d/. Accessed 13 Apr 2026. - 27.Ryba T, Battaglia D, Pope BD, Hiratani I, Gilbert DM. Genome-scale analysis of replication timing: from bench to bioinformatics. Nat Protoc. 2011;6:870–95. 10.1038/nprot.2011.328. [DOI] [PMC free article] [PubMed] [Google Scholar] - 28.4D Nucleome Repli-seq Processing Pipeline. https://data.4dnucleome.org/resources/data-analysis/repli-seq-processing-pipeline. Accessed 13 Apr 2026. - 29.ENCODE Repli-seq Processing Pipeline. https://www.encodeproject.org/pipelines/ENCPL734EDH/. Accessed 13 Apr 2026. - 30.Hadjadj D, Denecker T, Guérin E, Kim S-J, Fauchereau F, Baldacci G, et al. Efficient, quick and easy-to-use DNA replication timing analysis with START-R suite. NAR Genom Bioinform. 2020;2. 10.1093/nargab/lqaa045. [DOI] [Google Scholar] - 31.Mölder F, Jablonski KP, Letcher B, Hall MB, van Dyken PC, Tomkins-Tinch CH, et al. Sustainable data analysis with Snakemake. F1000Res. 2021;10:33. 10.12688/f1000research.29032.3. [DOI] [PMC free article] [PubMed] [Google Scholar] - 32.Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017;12:e0177459. 10.1371/journal.pone.0177459. [DOI] [PMC free article] [PubMed] [Google Scholar] - 33.Kurtzer GM, cclerget, Bauer M, Kaneshiro I, Trudgian D, Godlove D. hpcng/singularity: Singularity 3.7.3. Zenodo; 2021. 10.5281/ZENODO.1310023. [DOI] [Google Scholar] - 34.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–2. 10.14806/ej.17.1.200. [DOI] [Google Scholar] - 35.Sadu Murari LS, Dickinson Q, Meyer-Nava S, Rivera-Mulia JC. 
Parallel analysis of replication timing, gene expression, and copy number with PARTAGE. Genome Res. 2026;:gr.281532.125. 10.1101/gr.281532.125. [DOI] [Google Scholar] - 36.Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–95. 10.1093/bioinformatics/btp698. [DOI] [PMC free article] [PubMed] [Google Scholar] - 37.Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10. 10.1093/gigascience/giab008. [DOI] [Google Scholar] - 38.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2. 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar] - 39.Reiff SB, Schroeder AJ, Kırlı K, Cosolo A, Bakker C, Lee S, et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat Commun. 2022;13:2365. 10.1038/s41467-022-29697-4. [DOI] [PMC free article] [PubMed] [Google Scholar] - 40.Cleveland WS, Devlin SJ. Locally weighted regression: An approach to regression analysis by local fitting. J Am Stat Assoc. 1988;83:596. 10.2307/2289282. [DOI] [Google Scholar] - 41.Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high- performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92. 10.1093/bib/bbs017. [DOI] [PMC free article] [PubMed] [Google Scholar] - 42.Casper J, Speir ML, Raney BJ, Perez G, Nassar LR, Lee CM, et al. The UCSC Genome Browser database: 2026 update. Nucleic Acids Res. 2026;54:D1331–5. 10.1093/nar/gkaf1250. [DOI] [PMC free article] [PubMed] [Google Scholar] - 43.Dekker J, Oksuz BA, Zhang Y, Wang Y, Minsk MK, Kuang S, et al. An integrated view of the structure and function of the human 4D nucleome. Nature. 2026;649:759–76. 10.1038/s41586-025-09890-3. [DOI] [PMC free article] [PubMed] [Google Scholar] - 44.Marchal C. Repli-seq. Github. 
https://github.com/ClaireMarchal/repli-seq [Google Scholar] - 45.Vera D. shart: Scripts for the High-throughput Analysis of Replication Timing. Github. https://github.com/dvera/shart [Google Scholar] - 46.Chen team at I. Curie. RepliSeq: R package for the analysis of Repli-Seq data to study DNA replication timing program: From count matrices, data normalization to replication timing calculation. Github. https://github.com/CL-CHEN-Lab/RepliSeq [Google Scholar] - 47.Rausch T. repliseq: Repli-Seq analysis pipeline. Github. https://github.com/tobiasrausch/repliseq [Google Scholar] - 48.repliseq_pipeline: Bioinformatic pipeline for the analysis of Repli-seq data. Github. https://github.com/CSOgroup/repliseq_pipeline [Google Scholar] - 49.RepliTimer: Step-by-step instructions and Snakemake pipeline for processing Replication Timing Data. Github. https://github.com/SansamLab-Pipelines-Genomics/RepliTimer [Google Scholar] - 50.Repliseq-nf at v0.1.0. Github. https://github.com/PavriLab/repliseq-nf/tree/v0.1.0 [Google Scholar] - 51.Zhao PA, Sasaki T, Gilbert DM. High-resolution Repli-Seq defines the temporal choreography of initiation, elongation and termination of replication in mammalian cells. Genome Biol. 2020;21:76. 10.1186/s13059-020-01983-8. [DOI] [PMC free article] [PubMed] [Google Scholar] - 52.Halliwell JA, Martin-Gonzalez J, Hashim A, Dahl JA, Hoffmann ER, Lerdrup M. Sex-specific DNA-replication in the early mammalian embryo. Nat Commun. 2024;15:6323. 10.1038/s41467-024-50727-w. [DOI] [PMC free article] [PubMed] [Google Scholar] - 53.Nakatani T, Schauer T, Altamirano-Pacheco L, Klein KN, Ettinger A, Pal M, et al. Emergence of replication timing during early mammalian development. Nature. 2023;:1–9. 10.1038/s41586-023-06872-1. [DOI] [Google Scholar] - 54.Xu S, Wang N, Zuccaro MV, Gerhardt J, Iyyappan R, Scatolin GN, et al. DNA replication in early mammalian embryos is patterned, predisposing lamina-associated regions to fragility. Nat Commun. 2024;15:5247. 
10.1038/s41467-024-49565-7. [DOI] [PMC free article] [PubMed] [Google Scholar] - 55.Rivera-Mulia JC. The long-standing relationship between replication timing, gene expression, and chromatin accessibility is maintained in early mouse embryogenesis. bioRxiv. 2025;:2025.11.10.687227. 10.1101/2025.11.10.687227. [DOI] [Google Scholar] Associated Data This section collects any data citations, data availability statements, or supplementary materials included in this article. Supplementary Materials Data Availability Statement Project name: REPLAY Project home page: https://github.com/Rivera-Mulia-Lab/REPLAY Operating system(s): Linux, Windows (via WSL2), and macOS (via virtualization) Programming language: Python, Bash. Other requirements: Processor (CPU): x86-64 architecture. Minimum 2 cores. 6 cores or higher recommended. The Snakemake backend automatically parallelizes tasks based on available threads. Memory (RAM): 10 GB minimum. 32 GB or higher is recommended for processing large mammalian genomes (e.g., human or mouse) at high resolutions (< 10 kb bins). - Storage: - Containers: ~1 GB for the executable and Apptainer/Singularity image. - Genome + Index: ~8.3 GB for hg38 - Input Data: Variable (depends on FASTQ size). - Output Data: Assume at least ~13x input for intermediate files. e.g. 1.21GB fastq.gz -> 14.8GB trimmed reads/bam/bedgraphs License: GPL-3.0.
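The storage figures listed above can be turned into a quick pre-run check. The following is a minimal sketch, not part of REPLAY: the function name `estimate_storage_gb` and the constants are hypothetical, and it assumes that the ~13x output factor applies on top of the input files themselves and that the hg38 reference is used.

```python
# Rough disk-space estimate based on the storage figures quoted above.
# All names here are hypothetical; none of them come from REPLAY itself.

CONTAINER_GB = 1.0      # executable + Apptainer/Singularity image (~1 GB)
GENOME_INDEX_GB = 8.3   # hg38 genome + index (~8.3 GB)
OUTPUT_FACTOR = 13.0    # "at least ~13x input" for intermediate files


def estimate_storage_gb(fastq_sizes_gb, include_reference=True):
    """Return an approximate lower bound on required disk space, in GB.

    fastq_sizes_gb: sizes of the compressed FASTQ inputs, in GB.
    include_reference: add the container and hg38 genome/index footprint.
    """
    input_gb = sum(fastq_sizes_gb)
    # Assumption: outputs are stored in addition to the inputs.
    output_gb = OUTPUT_FACTOR * input_gb
    fixed_gb = CONTAINER_GB + GENOME_INDEX_GB if include_reference else 0.0
    return input_gb + output_gb + fixed_gb


if __name__ == "__main__":
    # Two hypothetical compressed FASTQ files (e.g., early and late fractions).
    total = estimate_storage_gb([1.21, 1.21])
    print(f"Plan for at least {total:.1f} GB of free disk space")
```

Because the quoted factor is a stated minimum, the result is best read as a floor; leaving extra headroom is prudent for deeper sequencing runs or other reference genomes.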


