A Thangka cultural element classification model based on self-supervised contrastive learning and MS-Triplet Attention

doi:10.21203/rs.3.rs-3828910/v1

2024 · doi:10.21203/rs.3.rs-3828910/v1

preprint OA: closed

Research Article

Wenjing Tang, Qing Xie

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-3828910/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 24 Apr, 2024. Read the published version in The Visual Computer.

Abstract

As a significant repository of Buddhist imagery, Thangka images are valuable historical materials for Tibetan studies, covering many domains such as Tibetan history, politics, culture, social life, and even traditional medicine and astronomy. Thangka cultural element images are the essence of Thangka images. Hence, classifying Thangka cultural element images is one of the most important tasks in knowledge representation and mining in the field of Thangka, and is the foundation of the digital protection of Thangka images. However, due to the limited quantity, high complexity, and intricate textures of Thangka images, the classification of Thangka images has so far been limited to a small number of categories and coarse granularity. This paper therefore proposes a novel dual-branch Thangka cultural element classification model that fuses texture features and is based on the attention mechanism and self-supervised contrastive learning. Specifically, to address the shortage of labeled samples and improve classification performance, the method uses a large amount of unlabeled, irrelevant data to pre-train the feature extractor through self-supervised learning. During the fine-tuning stage of the downstream task, a dual-branch feature extraction structure incorporating texture features is designed, and our proposed MS-Triplet Attention is used to integrate important features. Additionally, to address sample imbalance and the large number of difficult samples in the Thangka cultural element dataset, the Gradient Harmonizing Mechanism Loss is adopted and improved by introducing a self-designed adaptive mechanism. Experimental results on the Thangka cultural elements dataset show that the proposed method outperforms state-of-the-art methods. The source code of our proposed algorithm and the related datasets are available at https://github.com/WiniTang/MS-BiCLR.

Keywords: Tibetan Thangka classification, sample imbalance problem, self-supervised contrastive learning, gradient harmonizing mechanism loss, attention mechanism

Additional Declarations

No competing interests reported.
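As a rough illustration of the Gradient Harmonizing Mechanism Loss named in the abstract, the sketch below shows the standard classification form of the idea (often called GHM-C) for binary targets, written with NumPy. It is not the authors' code and does not include their self-designed adaptive mechanism, whose details are not given on this page. The function name `ghm_c_loss`, the `num_bins` parameter, and the normalisation by the number of non-empty bins are assumptions chosen for illustration.

```python
import numpy as np


def ghm_c_loss(logits, targets, num_bins=10):
    """Illustrative GHM-C style weighted binary cross-entropy.

    Assumption: standard gradient-density reweighting with equal-width
    bins; this is NOT the adaptive variant described in the paper.
    Returns (loss, per-sample weights).
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)

    # Sigmoid probabilities.
    probs = 1.0 / (1.0 + np.exp(-logits))

    # Gradient norm of sigmoid cross-entropy w.r.t. the logit: |p - y|.
    # Values near 0 are easy samples, values near 1 are very hard ones.
    g = np.abs(probs - targets)

    # Put each sample into one of num_bins equal-width bins over [0, 1].
    bin_idx = np.minimum((g * num_bins).astype(int), num_bins - 1)
    counts = np.bincount(bin_idx, minlength=num_bins)

    n = g.size
    nonempty = np.count_nonzero(counts)

    # Samples in crowded bins (high gradient density) get small weights;
    # samples in sparse bins get large weights. The weights sum to n.
    weights = n / counts[bin_idx] / nonempty

    # Per-sample binary cross-entropy, with a small epsilon for stability.
    eps = 1e-12
    ce = -(targets * np.log(probs + eps)
           + (1.0 - targets) * np.log(1.0 - probs + eps))

    return float(np.sum(weights * ce) / n), weights
```

Calling `ghm_c_loss([4.0, 4.0, 4.0, 4.0, -4.0], [1, 1, 1, 1, 1])` places the four easy samples in one bin and the single hard sample in another, so the crowded easy samples are down-weighted relative to the lone hard one. This is the balancing effect the abstract relies on for imbalanced data with many difficult samples.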
Version 1 history

Journal publication published: 24 Apr, 2024 (The Visual Computer)
Editorial decision, revision requested: 08 Feb, 2024
Reviews received at journal: 19 Jan, 2024
Reviewers agreed at journal: 05 Jan, 2024
Reviewers agreed at journal: 03 Jan, 2024
Reviewers invited by journal: 03 Jan, 2024
Editor assigned by journal: 02 Jan, 2024
Submission checks completed at journal: 02 Jan, 2024
First submitted to journal: 02 Jan, 2024
Author affiliations: Wenjing Tang (Wuhan University of Technology); Qing Xie (Wuhan University of Technology, corresponding author)

Preprint version 1 posted: January 4th, 2024

Version of record: https://doi.org/10.1007/s00371-024-03397-0 (The Visual Computer)

