Supervised Learning for Predicting Unknown Modifying Variables in Pliable Lasso: Applications to High-Dimensional Datasets

Zainab Subhi Mahmood Hawrami, Mehmet Ali Cengiz, Emre Dünder

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7495915/v1
This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 23 Feb, 2026; the published version appears in Scientific Reports. You are reading the latest preprint version (Version 1).

Abstract

Accurate outcome prediction often requires modeling complex interactions between input features and context-specific modifiers. The pliable lasso is a flexible regression framework that integrates such modifiers into the prediction process. In many real-world applications, however, these modifiers are unobserved at test time and must be estimated.
This study investigates the performance of eight supervised machine learning algorithms for estimating the modifier matrix Z in a pliable lasso model under a known-to-unknown scenario, in which the modifiers are observed during training but must be predicted at test time. The analysis considers both classification accuracy for modifier estimation and regression accuracy for the final response prediction, using simulated data and two real-world datasets: the Superconductivity dataset and the Mice Protein Expression dataset. Results indicate that tree-based models (e.g., XGBoost, Random Forest, Decision Tree) deliver superior modifier classification (AUC > 0.99), while regularized models such as Lasso and Elastic Net achieve the best regression performance. The findings support a hybrid modeling approach in which tree-based classifiers estimate the modifying variables and a regularized regression then produces accurate and interpretable predictions. This strategy holds promise for data-driven modeling in high-dimensional engineering systems where partial contextual information is available.

Subject areas: Physical sciences/Engineering; Physical sciences/Mathematics and computing

Additional Declarations: No competing interests reported.

Journal timeline:
Reviews received at journal: 10 Sep, 2025
Reviewers agreed at journal: 10 Sep, 2025
Reviewers agreed at journal: 08 Sep, 2025
Reviewers agreed at journal: 05 Sep, 2025 (three reviewers)
Reviewers invited by journal: 05 Sep, 2025
Editor assigned by journal: 05 Sep, 2025
Editor invited by journal: 03 Sep, 2025
Submission checks completed at journal: 02 Sep, 2025
First submitted to journal: 02 Sep, 2025
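The hybrid strategy described in the abstract (a classifier estimates the modifier, then a regularized regression predicts the response) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, there is a single binary modifier `z`, and a plain scikit-learn `Lasso` fitted on main effects plus explicit feature-by-modifier interaction columns stands in for the pliable lasso, which is not implemented here. The helper `design` and all variable names are hypothetical and do not come from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): n samples, p features, one binary modifier z.
n, p = 600, 20
X = rng.normal(size=(n, p))
z = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # modifier is predictable from X
# Response: a main effect of feature 2 plus an effect of feature 3 that only
# acts when z == 1, i.e. an effect modified by z.
y = 2.0 * X[:, 2] + 3.0 * z * X[:, 3] + rng.normal(scale=0.5, size=n)

train, test = np.arange(0, 400), np.arange(400, n)


def design(X, z):
    """Stack main effects, the modifier, and feature-by-modifier interactions."""
    return np.hstack([X, z[:, None], X * z[:, None]])


# Stage 1: learn z from X on the training set, where z is observed.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[train], z[train])
z_hat = clf.predict(X[test])  # estimated modifier for the test set
auc = roc_auc_score(z[test], clf.predict_proba(X[test])[:, 1])

# Stage 2: fit a sparse regression with the observed z on the training set,
# then predict on the test set using the estimated z_hat.
reg = Lasso(alpha=0.05)
reg.fit(design(X[train], z[train]), y[train])
mse_est = mean_squared_error(y[test], reg.predict(design(X[test], z_hat)))
# Reference value: test error if the true modifier were available.
mse_oracle = mean_squared_error(y[test], reg.predict(design(X[test], z[test])))

print(f"modifier AUC: {auc:.3f}")
print(f"test MSE with estimated z: {mse_est:.3f}")
print(f"test MSE with true z: {mse_oracle:.3f}")
```

Comparing the two error values gives a rough sense of how much predictive accuracy is lost when the modifier has to be estimated rather than observed. The numbers from this toy setup say nothing about the results reported in the paper.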
Author affiliations:
Zainab Subhi Mahmood Hawrami: Ministry of Higher Education and Scientific Research
Mehmet Ali Cengiz (corresponding author): Imam Mohammad ibn Saud Islamic University
Emre Dünder: Ondokuz Mayıs University

Version 1 posted: September 12th, 2025
Published version (Scientific Reports, February 23rd, 2026): https://doi.org/10.1038/s41598-026-36854-y
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)