Graph Neural Networks for Multi-modal Skin Lesion Classification Using Metadata and Visual Features

Research Article

Thi Trang Nguyen, Vu Tien Sinh, Van Hieu Vu, Viet Anh Nguyen
Vietnam Academy of Science and Technology

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7109516/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1, posted August 1st, 2025 on Research Square.

Abstract

Accurate classification of skin lesions is crucial for early detection of melanoma and other skin cancers. However, challenges such as inter-class similarity, intra-class variation, and limited labeled data persist.
We propose a novel multi-modal graph neural network (GNN) that integrates three complementary modalities: deep visual features from pre-trained convolutional neural networks (CNNs), handcrafted descriptors (e.g., HSV histograms, fractals), and structured clinical metadata (e.g., age, sex, lesion location). Each lesion sample is represented as a node in a \(k\)-nearest neighbor graph constructed in the fused multi-modal feature space. A two-layer graph convolutional network is used to perform relational learning and classification. The model is trained with class-weighted cross-entropy loss and evaluated using repeated stratified 5-fold cross-validation. Experiments on the HAM10000 and ISIC2020 datasets show that our method consistently outperforms strong baselines. On HAM10000, the proposed GNN achieves an accuracy of 0.960 \(\pm\) 0.003, an F1 score of 0.959 \(\pm\) 0.004, and an AUC of 0.999 \(\pm\) 0.000. On ISIC2020, it reaches 0.999 \(\pm\) 0.001 in both F1 and accuracy, with an AUC of 1.000. These results validate the effectiveness of multi-modal fusion and graph-based reasoning in skin lesion diagnosis and suggest the method's potential for real-world clinical deployment.

Keywords: Skin lesion classification, Graph Neural Network, Multi-modal learning, Deep features, Handcrafted features, Clinical metadata

Additional Declarations: No competing interests reported.
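The pipeline described in the abstract (fused features, a \(k\)-nearest neighbor graph, a two-layer graph convolutional network, and a class-weighted cross-entropy loss) can be illustrated with a minimal, untrained sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the toy feature dimensions, Euclidean distance for neighbor search, symmetrizing the graph, the standard normalized propagation rule with self-loops, inverse-frequency class weights, and all function names. It uses only the Python standard library and shows a single forward pass with random weights; training and the repeated stratified 5-fold cross-validation are not covered.

```python
import math
import random


def knn_graph(features, k):
    """Symmetric adjacency matrix linking each node to its k nearest neighbors."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0  # keep an edge if either endpoint chose it
    return adj


def normalize_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}: a common GCN propagation matrix (assumed here)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # never zero because of the self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_forward(a_norm, x, w1, w2):
    """Two layers: row-wise softmax(A_norm * ReLU(A_norm * X * W1) * W2)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(a_norm, matmul(x, w1))]
    logits = matmul(a_norm, matmul(hidden, w2))
    probs = []
    for row in logits:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def class_weights(labels, num_classes):
    """Inverse-frequency weights n / (num_classes * count_c); an assumed scheme."""
    n = len(labels)
    counts = [labels.count(c) for c in range(num_classes)]
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted negative log-likelihood, normalized by the total weight."""
    total = sum(weights[y] * -math.log(p[y] + 1e-12) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


rng = random.Random(0)
num_nodes = 8
# Toy stand-ins for the three modalities, concatenated per lesion (early fusion).
deep = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(num_nodes)]
handcrafted = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(num_nodes)]
metadata = [[rng.random() for _ in range(2)] for _ in range(num_nodes)]
fused = [d + h + m for d, h, m in zip(deep, handcrafted, metadata)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]  # imbalanced toy labels

adjacency = knn_graph(fused, k=3)
a_norm = normalize_adjacency(adjacency)
w1 = [[rng.gauss(0, 0.5) for _ in range(5)] for _ in range(8)]  # 8 inputs, 5 hidden
w2 = [[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(5)]  # 5 hidden, 2 classes
probs = gcn_forward(a_norm, fused, w1, w2)
weights = class_weights(labels, num_classes=2)
loss = weighted_cross_entropy(probs, labels, weights)
```

In this sketch the minority class receives the larger weight, so errors on rare lesion types contribute more to the loss. A practical implementation would more likely rely on a graph learning library and gradient-based training rather than hand-written matrix products.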