Strongly Connected Components Are All You Need: Graph-Theoretic Interpretability and Optimization for Vision Transformers

Devansh Garg

Research Article, Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7877883/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Vision Transformers have revolutionized computer vision with unprecedented performance, yet their attention mechanisms remain largely opaque, hindering both trust and optimization in critical applications [1, 2]. Despite significant advances in transformer interpretability, existing approaches provide limited structural insights into how attention patterns encode visual information, creating a fundamental barrier to understanding these powerful models [3, 4].
We address this challenge by introducing a novel, comprehensive graph-theoretic framework for analyzing Vision Transformer attention mechanisms. Our approach treats attention matrices as directed graphs and applies Tarjan’s Strongly Connected Components algorithm to reveal hidden structural patterns within attention flows [5]. Through systematic evaluation of 22 pre-trained models across ViT, DeiT, and CLIP architectures, we discover that Strongly Connected Components with meaningful connectivity directly correspond to semantically coherent visual features, providing a mathematical foundation for understanding attention-based feature learning [6]. This insight enables a novel optimization strategy: layers containing minimal meaningful connectivity can be selectively ablated through attention linearization, achieving 16-30% inference speedup with only 0-7% accuracy loss for large models. Our findings demonstrate that ViT models exhibit superior structural organization (92.6% effectiveness) compared to DeiT (51.1%) and CLIP (57.2%) variants, while patch32 configurations achieve optimal performance tradeoffs [7, 8]. Beyond interpretability, this work establishes graph theory as a powerful lens for transformer analysis and provides practical tools for deploying efficient vision models without sacrificing semantic understanding [9].

Subject areas: Artificial Intelligence and Machine Learning; Graphical Systems

Keywords: Vision Transformers, Graph Theory, Attention Mechanisms, Interpretability, Model Optimization

Additional Declarations

The authors declare no competing interests.
Author information: Devansh Garg (corresponding author), Indian Institute of Technology, Mandi. ORCID: https://orcid.org/0009-0000-8210-2526

Posted: October 20th, 2025. Manuscript file: Preprintgraphtheory.pdf

Research Square, ISSN 2693-5015 (online)
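The abstract's core idea, treating an attention matrix as a directed graph and running Tarjan's Strongly Connected Components algorithm on it, can be illustrated with a minimal sketch. The abstract does not say how edges are derived from attention weights or what counts as "meaningful connectivity", so the fixed `threshold`, the `min_size` cutoff, and the function names `attention_to_graph`, `tarjan_scc`, and `meaningful_components` are all assumptions made here for illustration, not the paper's method.

```python
from typing import List, Sequence


def attention_to_graph(attn: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Build adjacency lists: edge i -> j when token i attends to token j above threshold.

    The thresholding rule is an assumption for illustration only.
    """
    n = len(attn)
    return [[j for j in range(n) if j != i and attn[i][j] > threshold] for i in range(n)]


def tarjan_scc(adj: List[List[int]]) -> List[List[int]]:
    """Tarjan's algorithm in iterative form; returns SCCs as sorted lists of node ids."""
    n = len(adj)
    index = [-1] * n          # discovery order; -1 means unvisited
    lowlink = [0] * n
    on_stack = [False] * n
    stack: List[int] = []
    sccs: List[List[int]] = []
    counter = 0

    for root in range(n):
        if index[root] != -1:
            continue
        # Each work item is (node, position of the next neighbour to examine).
        work = [(root, 0)]
        while work:
            v, i = work.pop()
            if i == 0:  # first visit to v
                index[v] = lowlink[v] = counter
                counter += 1
                stack.append(v)
                on_stack[v] = True
            descended = False
            while i < len(adj[v]):
                w = adj[v][i]
                i += 1
                if index[w] == -1:
                    work.append((v, i))   # resume v after w is finished
                    work.append((w, 0))
                    descended = True
                    break
                if on_stack[w]:
                    lowlink[v] = min(lowlink[v], index[w])
            if descended:
                continue
            if lowlink[v] == index[v]:    # v is the root of a component
                comp = []
                while True:
                    w = stack.pop()
                    on_stack[w] = False
                    comp.append(w)
                    if w == v:
                        break
                sccs.append(sorted(comp))
            if work:                      # propagate lowlink to the parent
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[v])
    return sccs


def meaningful_components(sccs: List[List[int]], min_size: int = 2) -> List[List[int]]:
    """Keep components with at least min_size tokens (a hypothetical criterion)."""
    return [c for c in sccs if len(c) >= min_size]


# Toy 4-token attention matrix: tokens 0, 1, 2 attend to each other in a cycle;
# token 3 attends to token 0 but nothing attends back to it.
toy_attn = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]
toy_sccs = tarjan_scc(attention_to_graph(toy_attn, threshold=0.5))
toy_meaningful = meaningful_components(toy_sccs)
```

In this toy case the cycle among tokens 0, 1, and 2 forms one component and token 3 is a singleton. Under the assumed size criterion, a layer whose attention heads yield only singletons would be a candidate for the ablation the abstract describes, though the actual selection rule is not given on this page.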