Decoding Gene Enhancers with DNA Language Models: A Survey of Methods, Limitations, and Prospects

Authorea · 23 February 2026 · V1 (latest version)
This is a preprint and has not been peer reviewed. Data may be preliminary.

Authors: Reyhaneh T Tabesh and Gholamreza Rafiee (ORCID 0000-0002-1268-031X)
https://doi.org/10.22541/au.177188156.63526126/v1
220 views, 163 downloads

Abstract

Gene enhancers are DNA elements located away from the genes they regulate, yet they control when and where those genes are expressed and are crucial for development, cell identity, and disease. Because their sequence patterns are not always strong, their effects can span long genomic distances, and their activity is highly tissue-specific, they are difficult to identify, especially in cancer, where enhancer reprogramming can activate oncogenic programs.
This survey traces enhancer prediction methods from convolutional neural networks to foundation-scale genomic language models, focusing on sequence-first pipelines. We examine how design choices in tokenization (fixed k-mers versus subword and adaptive schemes), model architecture, and pretraining objectives influence performance, generalization, and interpretability, and we highlight evaluation challenges related to data leakage, domain shift, calibration, and the biological validity of benchmark tasks. Finally, we outline a roadmap for future enhancer modelling that emphasizes hybrid architectures, multi-omic integration, cancer-specific enhancer reprogramming, and standardized, biologically grounded evaluation frameworks. Together, these directions position genomic language models as a promising, though still maturing, foundation for sequence-based regulatory genomics.

Supplementary Material

File: gene_enhancer_dna_language_model_survey_rt_rr.pdf (388.01 KB)

Information & Authors

Version history: Version 1 (V1), 23 February 2026

Copyright: This work is licensed under a Non Exclusive No Reuse License.

Keywords: deep learning; enhancer reprogramming; gene enhancement; gene enhancers; genomic language models; multi-omics integration; regulatory genomics; tokenization strategies

Affiliations

Reyhaneh T Tabesh: independent researcher; Amirkabir University of Technology
Gholamreza Rafiee: ORCID 0000-0002-1268-031X

Citation

Reyhaneh T Tabesh, Gholamreza Rafiee. Decoding Gene Enhancers with DNA Language Models: A Survey of Methods, Limitations, and Prospects. Authorea. 23 February 2026. DOI: https://doi.org/10.22541/au.177188156.63526126/v1