SCAT: The Self-Correcting Aesthetic Transformer for Explainable Facial Beauty Prediction

Research Square | Short Report

Djamel Eddine Boukhari, Ali Chemsa

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-7003463/v1

This work is licensed under a CC BY 4.0 License

Status: Posted. Version 1 posted.

Abstract

Modeling human aesthetic perception is a fundamental challenge in computer vision. While deep learning has significantly advanced Facial Beauty Prediction (FBP), state-of-the-art models suffer from two critical, interlinked limitations: a performance plateau with Pearson Correlation (PC) coefficients seldom exceeding 0.90, and a "black box" nature that offers no insight into their reasoning. We posit that these limitations stem from a failure to emulate the hierarchical, part-based reasoning inherent to human aesthetic judgment. In this work, we propose the Self-Correcting Aesthetic Transformer (SCAT), a novel, explainable-by-design framework that overcomes these challenges. SCAT introduces a two-stage architecture featuring a Semantic Parser to disentangle the face into explicit part embeddings (e.g., eyes, mouth) and a Corrector Aggregator to reason about their harmonious interplay. The model is trained with a novel self-correcting loss that enforces internal consistency between its part-based and holistic evaluations. To facilitate this, we present FBP5500-Subscores, a large-scale dataset with granular part-level aesthetic annotations. Extensive experiments demonstrate that SCAT achieves a new state-of-the-art Pearson Correlation of 0.935, thereby breaking the long-standing performance barrier, while simultaneously providing transparent, human-intelligible predictions. Our work bridges the critical gap between predictive power and interpretability in FBP and suggests a structured reasoning paradigm for other subjective visual assessment tasks.

Keywords: Facial Beauty Prediction, Explainable AI, Vision Transformer, Attention Mechanism, Part-based Reasoning, Self-Correcting Models, Computational Aesthetics

Additional Declarations

No competing interests reported.
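The abstract names two quantities without giving formulas: the Pearson Correlation used to score predictions, and a self-correcting loss that ties the part-based evaluation to the holistic one. The following is a minimal sketch of both, not the authors' implementation. The function names, the `weight` parameter, the use of a plain mean to aggregate part scores, and the squared-error form of the consistency penalty are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from statistics import fmean


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def self_correcting_loss(holistic, part_scores, target, weight=0.5):
    """Hypothetical per-sample loss: regression error plus a consistency penalty.

    Assumption: the part-based evaluation is the plain mean of the part
    scores; the abstract does not say how SCAT aggregates them.
    """
    part_based = fmean(part_scores)
    regression = (holistic - target) ** 2        # holistic score vs. human rating
    consistency = (holistic - part_based) ** 2   # holistic vs. part-based view
    return regression + weight * consistency


if __name__ == "__main__":
    predicted = [2.1, 3.4, 4.0, 2.9]
    rated = [2.0, 3.5, 4.2, 3.0]
    print(pearson_correlation(predicted, rated))
    print(self_correcting_loss(3.4, [3.0, 3.6, 3.3], 3.5))
```

In this sketch the consistency term is zero only when the holistic score agrees with the aggregated part scores, which is one simple way to read "internal consistency between part-based and holistic evaluations".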
Author affiliations: Djamel Eddine Boukhari (corresponding author) and Ali Chemsa, University of El Oued.

Posted July 8th, 2025. Research Square, ISSN 2693-5015 (online).