Benchmarking Machine Learning Methods for Synthetic Lethality Prediction in Cancer

Jie Zheng, YiMiao Feng, Yahui Long, He Wang, Yang Ouyang, Quan Li, and Min Wu

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3671637/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 20 Oct, 2024 in Nature Communications (https://doi.org/10.1038/s41467-024-52900-7). You are reading Version 1, the latest preprint version.

Abstract

Synthetic lethality (SL) is a type of genetic interaction that occurs when defects in two genes cause cell death, while a defect in a single gene does not. Targeting an SL partner of a gene mutated in cancer can selectively kill tumor cells. Traditional wet-lab experiments for SL screening are resource-intensive. Hence, many computational methods have been developed for virtual screening of SL gene pairs.
This study benchmarks recent machine learning methods for SL prediction, including three matrix factorization and eight deep learning models. We scrutinize model performance using various data splitting scenarios, negative sample ratios, and negative sampling methods on both classification and ranking tasks to assess the models’ generalizability and robustness. Our benchmark analyzed performance differences among the models and emphasized the importance of data and real-world scenarios. Finally, we suggest future directions to improve machine learning methods for SL discovery in terms of predictive power and interpretability.

Biological sciences/Computational biology and bioinformatics/Machine learning
Biological sciences/Computational biology and bioinformatics/Data mining
Biological sciences/Drug discovery/Target identification
Biological sciences/Cancer/Cancer genetics
Biological sciences/Genetics/Genetic interaction

Additional Declarations
There is NO Competing Interest.

Supplementary Files
Dataset 1: matricstable.xlsx
Supplementary Materials: supplement.pdf