Cognitive Load Prediction Driven by Mobile Application Interface Complexity Analysis: A Framework Integrating Visual Features and Multimodal Modeling

doi:10.21203/rs.3.rs-9063864/v1

2026 · doi:10.21203/rs.3.rs-9063864/v1

preprint OA: closed

Article

Zihui Ni, Fan Zhang, ZainudinBin Siran, CheeOnn Wong, Yuan Zeng

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9063864/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 13

Abstract

Current mobile application interface designs neglect cognitive differences among users such as the elderly and those with color blindness, and struggle to adapt to dynamic usage scenarios, leading to inaccurate complexity assessments. This paper integrates visual feature analysis and cognitive load estimation to construct a multimodal fusion modeling framework for the complexity of mobile application interface design.
The study encodes three types of inputs in parallel at the temporal-window level. Visual features are extracted and quantified using CIE Lab-based color contrast statistics, U2-Net (U Square Net) saliency maps, and ResNet50 (Residual Network, 50 layers) intermediate-layer semantic embeddings. Markov entropy rate and navigation path features characterize the interaction load. For implicit cognitive load, physiological indicators such as real-time heart rate and gaze/pupil features are extracted. Modal embeddings are then obtained through modality-specific projection and nonlinear mapping. A Gated Multi-modal Unit (GMU) and multi-head cross-modal self-attention weight and model inter-modal information, and a Transformer encoder captures long-term and short-term temporal dependencies. The model jointly outputs a continuous cognitive load estimate (regression) and a high/low load label (binary classification), and a multi-task loss constrains learning. An interpretability layer uses SHAP to generate global and local contributions and to map driving factors to actions. Lightweight closed-loop bias correction is applied online to eliminate systematic biases for elderly and colorblind users. LinUCB, a contextual bandit algorithm, learns online which action to select in each context. Distillation and INT8 quantization are applied on the inference side to meet the real-time and energy consumption constraints of mobile devices. The experiment achieves an RMSE (root mean square error) of 0.180 ± 0.020 in cognitive load regression and an F1 score of 0.92 ± 0.03 in high-load segment detection. Under dynamic context switching from EnvLight to FastMove, the RMSE remains between 0.15 and 0.20, and it rises only to 0.22 for the elderly group and 0.23 for the colorblind group. The core of this study lies in distinguishing between objective interface complexity (OCI) and subjective cognitive load.
OCI quantifies design attributes, while subjective cognitive load reflects the user's psychological response to a specific task. The multimodal framework takes OCI as input, combines it with behavioral and physiological signals to predict continuous cognitive load, and supports interpretable analysis and online interface optimization. This improves the personalized adaptability and dynamic optimization capabilities of interface design, as well as the accuracy of cognitive load assessment. It provides a reproducible technical path and quantitative benchmark for quantifying and adapting interface cognitive load across heterogeneous user groups and dynamic usage scenarios.

Subject areas: Physical sciences/Engineering; Physical sciences/Mathematics and computing; Biological sciences/Neuroscience
Keywords: Mobile Application Interface, Design Complexity, Visual Features, Cognitive Load, Dynamic Context
Additional Declarations: No competing interests reported.

Reviews received at journal: 18 May, 2026
Reviews received at journal: 17 May, 2026
Reviews received at journal: 13 May, 2026
Reviewers agreed at journal: 21 Apr, 2026
Reviewers agreed at journal: 19 Apr, 2026
Reviewers agreed at journal: 17 Apr, 2026
Reviewers agreed at journal: 16 Apr, 2026
Reviewers agreed at journal: 16 Apr, 2026
Reviewers invited by journal: 16 Apr, 2026
Editor assigned by journal: 08 Apr, 2026
Editor invited by journal: 18 Mar, 2026
Submission checks completed at journal: 13 Mar, 2026
First submitted to journal: 12 Mar, 2026
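
The abstract characterizes interaction load with a Markov entropy rate over navigation behavior but does not give the estimator. The following is a minimal sketch under stated assumptions, not the authors' implementation: it treats a navigation log as a first-order Markov chain, estimates transition probabilities from bigram counts, and weights each state's conditional entropy by its empirical share of outgoing transitions (standing in for the stationary distribution). The function name `markov_entropy_rate` and the screen names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def markov_entropy_rate(sequence):
    """Estimate the entropy rate (bits per step) of a first-order Markov chain.

    `sequence` is a list of hashable states, e.g. screen names in visit order.
    P(j | i) is estimated from bigram counts. The weight pi(i) is the empirical
    share of transitions leaving state i, an assumption used here in place of
    the true stationary distribution.
    """
    if len(sequence) < 2:
        return 0.0

    # Count observed transitions i -> j.
    transitions = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        transitions[current][nxt] += 1

    total = len(sequence) - 1
    rate = 0.0
    for next_counts in transitions.values():
        outgoing = sum(next_counts.values())
        weight = outgoing / total  # empirical pi(i)
        # Conditional entropy of the next state given the current one.
        row_entropy = -sum(
            (c / outgoing) * math.log2(c / outgoing) for c in next_counts.values()
        )
        rate += weight * row_entropy
    return rate


# Hypothetical navigation logs (screen names are made up for illustration).
predictable = ["home", "list", "detail"] * 4
erratic = ["home", "list", "home", "detail", "home", "list", "home", "detail", "home"]

print(markov_entropy_rate(predictable))
print(markov_entropy_rate(erratic))
```

A fully deterministic loop yields a rate of zero, while the second log, in which half of all transitions leave "home" for one of two equally likely screens, yields 0.5 bits per step. Higher values indicate less predictable navigation, which is the property the abstract associates with interaction load.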
Author affiliations: Zihui Ni, ZainudinBin Siran, CheeOnn Wong, and Yuan Zeng (Multimedia University); Fan Zhang (University of Malaya, corresponding author)
Journal: Scientific Reports
Posted: April 23rd, 2026
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
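
The abstract states that the model jointly outputs a continuous load estimate and a high/low label under a multi-task loss, without giving the loss itself. One common form, shown here purely as an assumption, adds a weighted binary cross-entropy term to the mean squared error. The weight `lam`, the labeling cut-off `threshold`, and the function name `multitask_loss` are hypothetical and do not come from the paper.

```python
import math


def multitask_loss(pred_load, high_logits, true_load, threshold=0.6, lam=0.5):
    """Return (total, mse, bce) for a joint regression + classification head.

    pred_load   : predicted continuous load per time window
    high_logits : raw (pre-sigmoid) scores for the "high load" class
    true_load   : reference load per window, assumed to lie in [0, 1]
    threshold   : hypothetical cut-off turning true_load into a high/low label
    lam         : hypothetical weight on the classification term
    """
    n = len(true_load)
    if n == 0 or len(pred_load) != n or len(high_logits) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    # Regression term: mean squared error on the continuous load.
    mse = sum((p - t) ** 2 for p, t in zip(pred_load, true_load)) / n

    # Classification term: binary cross-entropy from logits in the
    # numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    bce = 0.0
    for z, t in zip(high_logits, true_load):
        y = 1.0 if t >= threshold else 0.0
        bce += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    bce /= n

    return mse + lam * bce, mse, bce


# Made-up values for three time windows.
true_load = [0.2, 0.7, 0.9]
pred_load = [0.25, 0.6, 0.85]
confident = [-4.0, 4.0, 4.0]  # logits that agree with the derived labels
undecided = [0.0, 0.0, 0.0]   # logits that carry no information

print(multitask_loss(pred_load, confident, true_load))
print(multitask_loss(pred_load, undecided, true_load))
```

With identical regression outputs, the confident logits give a lower total than the undecided ones, whose classification term equals ln 2 per window. The two terms share one backbone in the described framework, so in practice `lam` would be tuned to keep either task from dominating.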

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00