A Reproducible and Unified Benchmark of Deep Learning Feature Selection Across Simulations and Multi-Omics datasets

Yalu Wen, Qingyu Meng, Xiaoyan Sun, Ning Li, Long Liu, Deqiang Zheng

Department of Statistics, University of Auckland (Yalu Wen, corresponding author; Qingyu Meng; Xiaoyan Sun)
Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University (Ning Li)
Department of Health Statistics, School of Public Health, Binzhou Medical University (Long Liu)
Department of Epidemiology and Health Statistics, Capital Medical University School of Public Health (Deqiang Zheng)

Preprint on Research Square (ISSN 2693-5015), Version 1, posted January 21, 2026. This is a preprint; it has not been peer reviewed by a journal.
Status: Under review at Nature Communications
DOI: https://doi.org/10.21203/rs.3.rs-8390237/v1
License: CC BY 4.0

Abstract

Reliable feature selection is critical for extracting small, interpretable biomarker panels from high-dimensional omics data, yet deep learning–based feature selection methods have rarely been compared systematically. We present a unified and reproducible benchmark of post-hoc explainers, comprising attribution methods (GradientSHAP and DeepLIFT) and perturbation methods (LIME, Feature Ablation, and Occlusion), and of embedded selectors (CancelOut, EAR-FS, and GRACES). Using standardized preprocessing, a shared neural network, and Optuna hyperparameter tuning, we evaluate feature selection performance and computational efficiency in simulations, and downstream predictive accuracy on diverse real datasets: gene expression datasets with binary outcomes; five TCGA projects spanning mRNA, methylation, and SNP modalities with multi-class outcomes; and ADNI gene expression data with continuous neuroimaging phenotypes. EAR-FS and GRACES consistently perform best: GRACES is the most robust but computationally intensive, whereas EAR-FS achieves similar accuracy at much lower computational cost. Classical methods remain competitive when signals are sparse and near-linear. Post-hoc explainers contribute most as interpretability tools and model auditors rather than as primary subset selectors. To enable reproducibility and broad adoption, we provide a software hub and website implementing all of these methods with standardized pipelines and evaluation routines, facilitating efficient feature selection under typical constraints of budget, sample size, and turnaround time.

Subject areas (Biological sciences / Computational biology and bioinformatics): Machine learning; Computational models; Statistical methods; Standards; Software

Additional Declarations
Competing interests: The authors declare no competing interests.

Supplementary Files
supplementaryinformation.pdf (Supplementary file)
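
The abstract distinguishes two routes to a feature subset: post-hoc explainers that probe an already trained network, and selectors embedded in training. The sketch below illustrates only the post-hoc perturbation route, on a toy simulation where the signal features are known so that selection quality can be scored directly. It is a dependency-free illustration under stated assumptions, not the authors' software hub or their exact protocol. A gradient-descent linear model stands in for the shared neural network. Importance is the mean absolute change in prediction when one feature is replaced by its column mean, which follows the spirit of the Feature Ablation explainer. Precision and recall against the true signal set are one plausible selection metric. All function names (`simulate`, `fit_linear`, `ablation_importance`, `top_k`, `selection_metrics`) and all parameter values are hypothetical.

```python
import random

def simulate(n=200, p=10, signal=(0, 1, 2), coefs=(2.0, -1.5, 1.0), noise_sd=0.5, seed=0):
    """Toy data (hypothetical): only the `signal` features drive the continuous outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(c * row[j] for j, c in zip(signal, coefs)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y

def fit_linear(X, y, lr=0.1, epochs=200):
    """Full-batch gradient descent on mean squared error.
    Hypothetical stand-in for the benchmark's shared neural network."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                     for row, yi in zip(X, y)]
        grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                  for j in range(p)]
        grad_b = 2.0 / n * sum(residuals)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    # The returned closure predicts with the final weights and intercept.
    return lambda row: sum(wj * xj for wj, xj in zip(w, row)) + b

def ablation_importance(model, X, baselines):
    """Mean |change in prediction| when feature j is replaced by baselines[j]."""
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for row in X:
            ablated = list(row)
            ablated[j] = baselines[j]  # perturb one feature, keep the rest fixed
            total += abs(model(row) - model(ablated))
        scores.append(total / len(X))
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def selection_metrics(selected, truth):
    """Precision and recall of the selected set against the known signal features."""
    hits = len(set(selected) & set(truth))
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

X, y = simulate()
model = fit_linear(X, y)
baselines = [sum(col) / len(col) for col in zip(*X)]  # column means as the ablation baseline
scores = ablation_importance(model, X, baselines)
selected = top_k(scores, k=3)
precision, recall = selection_metrics(selected, truth=(0, 1, 2))
print(sorted(selected), precision, recall)
```

Swapping the linear stand-in for a trained network changes only `model`. The ablation loop, the top-k cut, and the scoring stay the same. The baseline value and the aggregation (mean absolute change) are design choices, and the resulting rankings can shift with them.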
