Improving Technical Diagram Analysis using Deep Feature Extraction, OCR Integration, and Graph-Based Structural Reasoning with Multilingual Question Answering and Diagram Summarization

Research Article (preprint), Research Square

Veerababu Reddy, Ashok Katari, Uma Maheswara Reddy Syamala, Yugandhar Viyyapu, and Siva Ganesh Challa

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-9483134/v1
License: CC BY 4.0
Status: Posted (Version 1)

Abstract

Technical diagrams combine complex visual structures with embedded text, which makes their accurate interpretation a challenging image-processing task. Conventional approaches typically rely on either visual feature extraction or text detection in isolation. This limits their ability to capture structural relationships, contextual dependencies, and interactions between the visual and textual components of a diagram. This paper presents an improved deep learning framework for technical diagram analysis that integrates visual, textual, and structural information. The approach uses convolutional neural networks (CNNs) for visual feature extraction and Optical Character Recognition (OCR) to detect and recognize the text within diagrams. A graph-based structural model then represents the relationships among diagram components, which supports richer contextual understanding and more accurate interpretation of complex layouts. On this foundation, the framework adds two capabilities: multilingual question answering, which lets users interact with diagrams in multiple languages, and automated summarization, which generates concise textual descriptions of diagram content. The proposed method achieves an accuracy of 85.6%, a precision of 83.2%, a recall of 82.7%, and an F1-score of 82.9%. The project dataset is available at https://doi.org/10.5281/zenodo.19487461, and the source code at https://github.com/mahesh-1415/multi_ligual_techVQA.

Keywords: Technical Diagram Analysis, Image Processing, Deep Feature Extraction, Optical Character Recognition (OCR), Graph-Based Structural Modeling, Multilingual Question Answering, Diagram Summarization

Additional Declarations: No competing interests reported.
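The abstract does not say how the graph over diagram components is built. As a minimal sketch, assume each component (an OCR text box or a detected shape) is reduced to an axis-aligned bounding box. One simple option is then to connect two components whenever their boxes lie within a distance threshold. The `Component` schema, the `max_gap` threshold, and the proximity rule are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Component:
    """A detected diagram element (hypothetical schema): an OCR text box or a shape."""
    name: str
    kind: str  # e.g. "text" for OCR output, "shape" for a visual detection
    x0: float
    y0: float
    x1: float
    y1: float


def box_gap(a: Component, b: Component) -> float:
    """Euclidean gap between two axis-aligned boxes; 0.0 if they touch or overlap."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return (dx * dx + dy * dy) ** 0.5


def build_graph(components: list[Component], max_gap: float = 20.0) -> dict[str, set[str]]:
    """Undirected adjacency list: link components whose boxes lie within max_gap."""
    graph: dict[str, set[str]] = {c.name: set() for c in components}
    for a, b in combinations(components, 2):
        if box_gap(a, b) <= max_gap:
            graph[a.name].add(b.name)
            graph[b.name].add(a.name)
    return graph


if __name__ == "__main__":
    parts = [
        Component("label_pump", "text", 0, 0, 40, 10),
        Component("pump", "shape", 0, 15, 40, 55),
        Component("valve", "shape", 120, 15, 160, 55),
        Component("label_valve", "text", 120, 60, 160, 70),
    ]
    for node, neighbours in build_graph(parts).items():
        print(node, sorted(neighbours))
```

With the default 20-pixel threshold, each label links only to the shape beside it, and the two distant shapes stay unconnected. A fuller system could also treat detected arrows or connector lines as explicit edges instead of relying on proximity alone.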
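The abstract credits the graph-based structural model with improving contextual understanding, but it does not describe the reasoning step. A common way a graph adds context is neighbour aggregation (message passing), the basic operation behind many graph neural networks. The sketch below performs one round of it over an adjacency list. The toy vectors stand in for CNN or OCR embeddings. The function name and the mixing weight `alpha` are hypothetical, and this is not the paper's architecture.

```python
def aggregate(
    features: dict[str, list[float]],
    graph: dict[str, set[str]],
    alpha: float = 0.5,
) -> dict[str, list[float]]:
    """One round of mean-neighbour message passing over a component graph.

    Each node's new vector mixes its own vector (weight 1 - alpha) with the
    mean of its neighbours' vectors (weight alpha). Isolated nodes are unchanged.
    """
    updated: dict[str, list[float]] = {}
    for node, vec in features.items():
        neighbours = graph.get(node, set())
        if not neighbours:
            updated[node] = list(vec)
            continue
        # Element-wise mean over neighbour vectors, read from the old features,
        # so the update is synchronous and independent of iteration order.
        mean = [
            sum(features[n][i] for n in neighbours) / len(neighbours)
            for i in range(len(vec))
        ]
        updated[node] = [(1 - alpha) * v + alpha * m for v, m in zip(vec, mean)]
    return updated


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for CNN (shape) and OCR (text) features.
    feats = {"label": [1.0, 0.0], "pump": [0.0, 1.0], "valve": [1.0, 1.0]}
    edges = {"label": {"pump"}, "pump": {"label"}, "valve": set()}
    print(aggregate(feats, edges))  # {'label': [0.5, 0.5], 'pump': [0.5, 0.5], 'valve': [1.0, 1.0]}
```

After one round, the text label and the shape it annotates share information, while the isolated shape keeps its original vector. Stacking several rounds, usually with learned weight matrices, lets information travel further along the diagram's structure.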
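The four reported figures follow standard definitions. The sketch below computes them from binary confusion-matrix counts. It also checks that the reported F1-score agrees with the reported precision and recall when F1 is taken as their harmonic mean, F1 = 2PR / (P + R). The helper names are illustrative. The abstract does not state how the metrics were averaged across classes or questions. Under macro-averaging, for example, F1 need not equal the harmonic mean of the averaged precision and recall, so the check shows consistency only.

```python
def safe_div(num: float, den: float) -> float:
    """num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


if __name__ == "__main__":
    # Consistency check on the abstract's figures: P = 83.2 %, R = 82.7 %.
    print(round(100 * f1_from(0.832, 0.827), 1))  # 82.9
```

The harmonic mean of 83.2% and 82.7% is about 82.95%, which rounds to the reported 82.9%.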
Author information: Veerababu Reddy (corresponding author), Ashok Katari, Uma Maheswara Reddy Syamala, Yugandhar Viyyapu, and Siva Ganesh Challa, all of Vignan's Lara Institute of Technology and Science.
Version 1 posted on Research Square (ISSN 2693-5015, online) on April 27th, 2026.
