Distribution-Informed Machine Learning for Flash Flood Susceptibility: Integrating Weibull Extreme Value Theory with Interpretable Models

Farrukh A. Chishtie, Rana U. Ali, Abdolreza Bahremand, Mujtaba Hassan, and John J. Clague

Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8273683/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted December 16th, 2025.

Subject areas: Earth and environmental sciences/Climate sciences; Earth and environmental sciences/Hydrology; Physical sciences/Mathematics and computing; Earth and environmental sciences/Natural hazards

Keywords: Flash floods, Machine learning, Weibull distribution, Extreme Value Theory, Distribution-informed modeling, Class imbalance, SHAP interpretability, Rare event prediction

Additional Declarations: No competing interests reported.

Abstract

Flash floods represent one of the deadliest weather-related hazards globally, yet their prediction remains fundamentally challenged by extreme class imbalance in observational data. This study addresses a critical methodological crisis: traditional evaluation metrics, both overall accuracy and Area Under the ROC Curve (AUC), are profoundly misleading for rare event prediction. We demonstrate empirically how models achieving 94% accuracy and AUC exceeding 0.95 can simultaneously fail to detect 40% of flood events. Moving beyond conventional approaches, we introduce a paradigm shift from ad hoc feature engineering to distribution theory-informed feature generation. By integrating Extreme Value Theory through Weibull distribution analysis, we derive 24 features from rigorous statistical characterization of precipitation extremes rather than heuristic transformations. Evaluating seven model configurations for flash flood susceptibility in Nova Scotia, Canada, using Environment and Climate Change Canada operational warning thresholds, we find that an Artificial Neural Network with selected features achieves 90% recall and 90.6% balanced accuracy. SHAP analysis reveals that the intensity-duration product, a distribution-informed physical process feature, dominates predictions with mean |SHAP| = 0.127, validating both hydrological understanding and the distribution-informed framework. These findings provide essential guidance for practitioners: comprehensive reporting of balanced accuracy, precision, and recall is mandatory for imbalanced datasets where traditional metrics mask operational failure.
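The abstract's point that a model can reach 94% accuracy while missing 40% of flood events can be illustrated with a small worked example. The sketch below uses a hypothetical confusion matrix (1,000 samples, 50 of them floods); these counts are invented for illustration and are not the study's data, and the helper name `imbalance_metrics` is likewise an assumption.

```python
def imbalance_metrics(tp, fn, fp, tn):
    """Compute standard classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)          # share of true floods that were detected
    specificity = tn / (tn + fp)     # share of non-floods correctly rejected
    precision = tp / (tp + fp)       # share of flood alerts that were correct
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        # Balanced accuracy averages the per-class recalls, so the
        # majority (non-flood) class cannot dominate the score.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts (NOT from the paper): 50 floods among 1,000 samples,
# of which the model detects 30 and misses 20, with 40 false alarms.
metrics = imbalance_metrics(tp=30, fn=20, fp=40, tn=910)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, overall accuracy is 0.94 even though recall is only 0.60, that is, 40% of the floods are missed. Balanced accuracy and precision are both much lower than accuracy here, which is why reporting them alongside recall exposes the failure that accuracy alone hides.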
Editorial timeline (submitted to Scientific Reports)
Reviewers invited by journal: 10 Dec, 2025
Editor invited by journal: 08 Dec, 2025
Editor assigned by journal: 06 Dec, 2025
Submission checks completed at journal: 06 Dec, 2025
First submitted to journal: 03 Dec, 2025

Author affiliations
Farrukh A. Chishtie (corresponding author), University of British Columbia
Rana U. Ali, Peaceful Society, Science and Innovation Foundation
Abdolreza Bahremand, Gorgan University of Agricultural Sciences and Natural Resources
Mujtaba Hassan, Institute of Space Technology
John J. Clague, Simon Fraser University