ExoShorkie: Predicting RNA-seq coverage of exogenous genomes in yeast by transfer learning

Preprint (2026)

Abstract

Motivation

Predicting the RNA-seq coverage of native and exogenous sequences is central to many endogenous and synthetic-biology applications. Substantial progress has been made in developing methods to predict the RNA-seq coverage of native genomic sequences, with the recently developed Shorkie achieving state-of-the-art performance in yeast. However, how well these methods perform on challenging, out-of-distribution exogenous DNA is still unknown. Recent studies have measured the RNA-seq coverage of large exogenous genomes in yeast. These data provide a unique opportunity to train machine-learning models on a large exogenous sequence space and to improve both prediction performance and our understanding of regulatory mechanisms.

Results

Here, we introduce ExoShorkie, which we developed by extending Shorkie through transfer learning across multiple exogenous RNA-seq datasets. ExoShorkie significantly improves prediction performance on held-out exogenous genomes. In both cross-validation and leave-one-genome-out evaluations, it outperforms a native-genome-trained Shorkie baseline and Yorzoi, the only competing method for predicting exogenous RNA-seq coverage in yeast. Furthermore, interpretability analyses reveal biologically meaningful regulatory motifs and distinct regulatory rules in exogenous genomes in yeast, providing new insights into transcriptional regulation beyond native genomic contexts.

Availability and implementation

ExoShorkie is available at https://github.com/OrensteinLab/ExoShorkie.

Competing Interest Statement

The authors have declared no competing interest.
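The leave-one-genome-out evaluation mentioned in the Results holds out every sequence from one exogenous genome at a time. Ordinary cross-validation can place windows from the same genome in both training and test sets. The snippet below is a minimal sketch of the splitting step only. It assumes that training examples are already grouped under a genome identifier. The function name `leave_one_genome_out`, the dictionary layout, and the genome labels are hypothetical and are not taken from the ExoShorkie repository.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_genome_out(
    examples_by_genome: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_genome, train_examples, test_examples), one fold per genome.

    Hypothetical helper: every example from the held-out genome goes to the
    test set, so no sequence from that genome is seen during training.
    """
    for held_out in sorted(examples_by_genome):
        test = list(examples_by_genome[held_out])
        # Training set: all examples from every other genome, in sorted genome order.
        train = [
            ex
            for genome, exs in sorted(examples_by_genome.items())
            if genome != held_out
            for ex in exs
        ]
        yield held_out, train, test


# Usage with made-up genome labels and sequence-window IDs.
data = {"genome_B": ["b1", "b2"], "genome_A": ["a1"], "genome_C": ["c1"]}
for genome, train, test in leave_one_genome_out(data):
    print(genome, train, test)
```

Splitting by genome rather than by random window means each fold tests generalization to an entirely unseen exogenous genome. That is the out-of-distribution setting the abstract emphasizes.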
