Cross-Domain Tomato Disease Classification via Flexible Contrastive Clustering in Vision-Language Models

Research Article, Research Square

Muhammad Shafay, Divya Velayudhan, Taimur Hassan, Muhammad Owais, Irfan Hussain, and Naoufel Werghi

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9198236/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Plant disease detection systems face significant challenges in cross-domain generalization, particularly when moving from controlled laboratory settings to diverse field conditions. Traditional deep learning approaches degrade severely across imaging environments, which limits practical deployment in real-world agricultural scenarios. This paper introduces a novel Flexible Contrastive Clustering (FCC) framework for zero-shot tomato disease classification that addresses these generalization limitations through vision-language learning. Unlike standard CLIP's one-to-one image-text pairing, our method uses one-to-many relationships in which each disease image is associated with multiple diverse textual descriptions, enabling representation learning that is robust to linguistic variation. The FCC framework optimizes class-based clustering in the joint embedding space through a specialized loss function that treats all same-class descriptions as positives, which allows both seen and unseen disease categories to be handled during zero-shot evaluation. We train on PlantDoc (740 images) and test on four diverse tomato disease datasets totaling 17,313 images, spanning laboratory and field conditions. Experimental results demonstrate substantial improvements over state-of-the-art vision-language models, with an average accuracy of 30.15% and an average weighted F1-score of 28.05% across all test datasets. Our method performs particularly well on field datasets, achieving 59.70% accuracy on FieldPlant and 26.52% on Tomato Village, significantly outperforming existing approaches. Attention visualization analysis reveals effective disease localization for both seen and unseen categories, supporting the practical applicability of our approach to real-world agricultural monitoring systems.

Subject area: Agricultural Engineering

Keywords: Vision-language models, cross-domain generalization, plant disease classification, flexible contrastive clustering, agricultural computer vision

Additional Declarations: The authors declare no competing interests.
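The abstract describes a loss that treats every same-class description as a positive for an image, and a zero-shot evaluation over class descriptions. The abstract does not give the loss formula, so the block below is only a minimal sketch under an assumed supervised-contrastive form: for each image, the log-softmax over all descriptions is averaged over the descriptions of its own class. The function names, the temperature value of 0.07, the mean-similarity prediction rule, and the toy class names and embeddings are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_positive_loss(image_embs, image_labels, text_embs, text_labels,
                        temperature=0.07):
    """Image-to-text contrastive loss with all same-class descriptions as positives.

    Assumed supervised-contrastive form, not the paper's exact loss.
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    total = 0.0
    for img, label in zip(images, image_labels):
        logits = [dot(img, t) / temperature for t in texts]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        positives = [z for z, tl in zip(logits, text_labels) if tl == label]
        if not positives:
            raise ValueError(f"no description for class {label!r}")
        # Average negative log-probability over every positive description.
        total += -sum(z - log_denom for z in positives) / len(positives)
    return total / len(images)

def zero_shot_predict(image_emb, text_embs, text_labels):
    """Pick the class whose descriptions have the highest mean cosine similarity."""
    img = normalize(image_emb)
    sums, counts = {}, {}
    for t, label in zip(text_embs, text_labels):
        sums[label] = sums.get(label, 0.0) + dot(img, normalize(t))
        counts[label] = counts.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

# Toy 2-D embeddings (hypothetical): two descriptions per class.
text_embs = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
text_labels = ["early_blight", "early_blight", "healthy", "healthy"]
image_embs = [[1.0, 0.0], [0.0, 1.0]]
image_labels = ["early_blight", "healthy"]

aligned = multi_positive_loss(image_embs, image_labels, text_embs, text_labels)
swapped = multi_positive_loss(image_embs, image_labels[::-1], text_embs, text_labels)
prediction = zero_shot_predict([1.0, 0.05], text_embs, text_labels)
```

In this toy setup the loss is lower when each image sits near the descriptions of its own class (`aligned`) than when the labels are swapped (`swapped`), which is the clustering behavior the abstract attributes to the one-to-many formulation. A real implementation would operate on batched encoder outputs in a tensor library rather than Python lists.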
Author affiliations
Muhammad Shafay (corresponding author): Khalifa University, Abu Dhabi, UAE
Divya Velayudhan: Khalifa University, Abu Dhabi, UAE
Taimur Hassan: Abu Dhabi University, Abu Dhabi, UAE
Muhammad Owais: Khalifa University, Abu Dhabi, UAE
Irfan Hussain: Khalifa University, Abu Dhabi, UAE
Naoufel Werghi: Khalifa University, Abu Dhabi, UAE

Posted: March 24th, 2026 (Version 1)
Publisher: Research Square, ISSN 2693-5015 (online)
Manuscript PDF: https://assets-eu.researchsquare.com/files/rs-9198236/v1_covered_8fba223c-7d7e-458d-b2f9-a56e366413c6.pdf

