GMS-JIGNet: Guided Multi-Scale Jigsaw Puzzles for Self-Supervised Artificial Spot Segmentation in Fundus Photography

Research Square preprint

Jaehan Joo, Hunyoul Lee, Suk Chan Kim

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-6441851/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 16 Jul, 2025. Read the published version in Scientific Reports.

Version 1 posted 16 May, 2025. You are reading this latest preprint version.

Abstract

Dust and sensor noise often create artificial spots in fundus photography, and clinicians may occasionally misinterpret them as pathological signs such as microaneurysms. Reliable computer-aided diagnosis therefore depends on accurately identifying and segmenting such artifacts. However, producing pixel-level annotations for these subtle structures remains labor-intensive and difficult to scale.
To address this issue, we propose GMS-JIGNet, a self-supervised segmentation framework based on guided multi-scale jigsaw puzzles and contrastive learning. The method learns spatially aware representations from unlabeled data by solving jigsaw puzzles at multiple resolutions while selectively injecting positional hints for uninformative regions. The downstream segmentation model reuses the ViT encoders from the pretext task as fixed feature extractors and pairs them with a lightweight FPN decoder. Experiments on a large-scale fundus dataset show that the proposed model achieves state-of-the-art performance on several metrics, including IoU, Dice, and SSIM, even when trained with only a few labeled images. We also conducted ablation studies to assess how the architecture performs under different training hyperparameter settings. The results support the effectiveness of guided self-supervised learning for medical image segmentation and suggest strong potential for clinical use, especially in settings with limited labeled data.

Subject areas: Health sciences/Diseases/Eye diseases; Health sciences/Health care/Medical imaging; Health sciences/Biomarkers/Diagnostic markers

Additional Declarations: No competing interests reported.
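The abstract describes solving jigsaw puzzles at several resolutions while injecting positional hints for uninformative regions. The sketch below shows one plausible reading of that guided permutation. It is not the authors' implementation. It assumes that a patch is "uninformative" when its pixel variance falls below a threshold (for example, the dark border around the fundus), and that the hint consists of leaving such patches in their original slots. The function names, the variance threshold, and the toy image are hypothetical. Only the grid sizes 4, 8, and 16 come from the paper's figures.

```python
import random
import statistics


def split_into_patches(image, n):
    """Split a square 2-D image (list of equal-length rows) into an n x n grid of patches, row-major."""
    size = len(image)
    if size % n != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with a side divisible by n")
    p = size // n  # patch side length in pixels
    return [
        [row[gx * p:(gx + 1) * p] for row in image[gy * p:(gy + 1) * p]]
        for gy in range(n)
        for gx in range(n)
    ]


def guided_permutation(image, n, var_threshold=1e-3, seed=None):
    """Return perm, where perm[slot] is the original index of the patch placed at that slot.

    Hypothetical guidance rule: a patch whose pixel variance is below
    var_threshold is treated as uninformative and keeps its original slot
    (a positional hint). Only informative patches are shuffled, and only
    among informative slots.
    """
    patches = split_into_patches(image, n)
    informative = [
        idx for idx, patch in enumerate(patches)
        if statistics.pvariance([v for row in patch for v in row]) >= var_threshold
    ]
    shuffled = informative[:]
    random.Random(seed).shuffle(shuffled)
    perm = list(range(n * n))  # start from the identity layout
    for slot, src in zip(informative, shuffled):
        perm[slot] = src       # informative slots receive shuffled informative patches
    return perm


# Toy 32x32 "fundus": a flat dark left half and a textured right half.
toy = [[0.0 if x < 16 else ((7 * x + 13 * y) % 10) / 10 for x in range(32)] for y in range(32)]
# One puzzle per scale, mirroring the 4x4, 8x8, and 16x16 grids.
perms = {n: guided_permutation(toy, n, seed=0) for n in (4, 8, 16)}
```

Keeping uninformative patches fixed reduces each puzzle to the patches that carry retinal texture, which is the motivation the abstract gives for guidance. A real pretext model would predict `perm`, or a permutation class, from the shuffled patches.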
Editorial timeline (Scientific Reports)
Editorial decision: Revision requested, 02 Jun, 2025
Reviews received at journal: 29 May, 2025
Reviewers agreed at journal: 17 May, 2025
Reviews received at journal: 16 May, 2025
Reviewers agreed at journal: 16 May, 2025
Reviewers invited by journal: 13 May, 2025
Editor assigned by journal: 13 May, 2025
Editor invited by journal: 22 Apr, 2025
Submission checks completed at journal: 21 Apr, 2025
First submitted to journal: 13 Apr, 2025
Author affiliations: Jaehan Joo, Hunyoul Lee, and Suk Chan Kim, Pusan National University. Corresponding author: Suk Chan Kim.

Figures

Figure 1. Overview of the proposed pretext task model. The input image is split into 4×4, 8×8, and 16×16 patches, each processed through a GP block before being passed to a shared ViT encoder. PE denotes patch embedding, and GP denotes the Guided Permutation block.

Figure 2. Comparison between traditional jigsaw permutation and the proposed guided jigsaw permutation.

Figure 3. Overview of the proposed downstream segmentation model for artificial spot detection. The same fundus image is passed to three ViT encoders pretrained on different jigsaw scales (4×4, 8×8, and 16×16). Each encoder outputs intermediate token features from transformer blocks 5, 7, and 11. Features from the same layer level across encoders are concatenated channel-wise and fused via a 1×1 convolution to form an aggregated representation at each depth. These aggregated features are passed to a shared FPN-style decoder composed of hierarchical decoder blocks with skip connections. Each decoder block consists of two 3×3 convolutional layers, each followed by BatchNorm and ReLU activation. Decoder outputs are progressively upsampled, and the final prediction is produced as a 512×512 artificial spot mask.

Figure 4. Qualitative results from the pretext task. The original image (left) is split into jigsaw patches at three scales: 4×4, 8×8, and 16×16. Ground-truth layouts (a, c, e) and the corresponding model predictions (b, d, f) are shown.

Figure 5. Qualitative results of the proposed model on artificial spot segmentation. Each row shows the original fundus image, the ground-truth mask, and the predicted mask. The visual results show that the proposed model can accurately segment artificial spot regions.

Version of record: Scientific Reports, published July 16th, 2025. https://doi.org/10.1038/s41598-025-07077-4
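Figure 3 describes how same-depth features from the three scale-specific encoders are combined: they are concatenated channel-wise and then fused with a 1×1 convolution. The pure-Python sketch below illustrates only that fusion step for a single depth. It is not the authors' code. The helper names (`tokens_to_map`, `concat_channels`, `conv1x1`, `fuse_level`) are hypothetical. So is the assumption that any class token has already been removed, leaving patch tokens that form a square grid. The toy weights are chosen only to make the arithmetic easy to check.

```python
import math


def tokens_to_map(tokens):
    """Reshape N patch tokens (each a length-C list) into a [C][H][W] map.

    Assumes N is a perfect square and any class token was already removed.
    """
    side = math.isqrt(len(tokens))
    if side * side != len(tokens):
        raise ValueError("token count must be a perfect square")
    channels = len(tokens[0])
    return [
        [[tokens[i * side + j][c] for j in range(side)] for i in range(side)]
        for c in range(channels)
    ]


def concat_channels(*maps):
    """Channel-wise concatenation of [C][H][W] maps that share the same H and W."""
    h, w = len(maps[0][0]), len(maps[0][0][0])
    if any(len(m[0]) != h or len(m[0][0]) != w for m in maps):
        raise ValueError("spatial sizes differ")
    return [channel for m in maps for channel in m]


def conv1x1(x, weight, bias):
    """1x1 convolution: out[o][i][j] = bias[o] + sum_c weight[o][c] * x[c][i][j]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(row) != c_in for row in weight):
        raise ValueError("weight needs one column per input channel")
    return [
        [[bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


def fuse_level(tokens_by_scale, weight, bias):
    """Fuse same-depth tokens from the 4x4, 8x8, and 16x16 encoders into one map."""
    maps = [tokens_to_map(tokens_by_scale[s]) for s in (4, 8, 16)]
    return conv1x1(concat_channels(*maps), weight, bias)


# Toy example: each encoder emits 4 tokens (a 2x2 grid) with C = 2 at one depth.
tokens_by_scale = {4: [[1.0, 2.0]] * 4, 8: [[3.0, 4.0]] * 4, 16: [[5.0, 6.0]] * 4}
weight = [[1, 0, 1, 0, 1, 0],  # output channel 0 sums channel 0 of every encoder
          [0, 1, 0, 1, 0, 1]]  # output channel 1 sums channel 1 of every encoder
fused = fuse_level(tokens_by_scale, weight, bias=[0.0, 0.5])
# fused[0] is 9.0 everywhere and fused[1] is 12.5 everywhere
```

In the setup the caption describes, this fusion would be repeated for the features from blocks 5, 7, and 11. Each fused map then feeds the FPN-style decoder. A 1×1 convolution is simply a per-pixel linear map across channels, which is why it can mix the three encoders' features without changing spatial resolution.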
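The abstract reports IoU and Dice among its segmentation metrics. For binary masks such as the artificial spot masks shown in Figure 5, both can be computed from pixel counts, as in the generic sketch below. This sketch follows the standard definitions and is not the authors' evaluation code. Treating two empty masks as a perfect match is an assumption, and SSIM is omitted.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as equal-shaped nested lists."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            inter += int(p and t)   # pixels marked in both masks
            union += int(p or t)    # pixels marked in either mask
            pred_sum += int(p)
            target_sum += int(t)
    if union == 0:  # both masks empty: treated as a perfect match (an assumption)
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_sum + target_sum)
    return iou, dice


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)  # iou == 0.5, dice == 2/3
```

For a single image pair, Dice equals 2·IoU / (1 + IoU). When scores are averaged over a dataset, however, the mean Dice and mean IoU need not satisfy that relation exactly.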