Malware Reverse Engineering with Large Language Model for Superior Code Comprehensibility and IoC Recommendations | Research Square

Research Article

Ashley Q. Williamson, Michael Beauparlant

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-4471373/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1 posted.

Abstract

Malware reverse engineering, the process of dissecting malicious software to understand its functionality and behavior, faces significant challenges due to the complexity and obfuscation techniques employed by modern malware. The application of Gemini Pro for interpreting reverse-engineered malware code introduces a novel and significant approach to enhancing the understanding of complex malware behaviors.
By leveraging advanced natural language processing capabilities, the model provides detailed and accurate explanations of malware's functional components, offering substantial improvements over traditional analysis methods. The study demonstrates the model's proficiency in identifying key operational mechanisms and recommending relevant indicators of compromise, which are crucial for effective threat detection and mitigation. A comprehensive comparative analysis reveals that Gemini Pro outperforms conventional static and dynamic analysis tools in terms of clarity, coherence, and time efficiency. Detailed case studies of various malware samples, including Ramnit, Kelihos, and Lollipop, illustrate the model's ability to generate clear and actionable insights, thereby facilitating better decision-making in cybersecurity contexts. The findings underscore the potential of integrating advanced natural language processing models into cybersecurity workflows to significantly enhance the efficiency and effectiveness of malware analysis and mitigation efforts.

Subject area: Artificial Intelligence and Machine Learning

Keywords: Cybersecurity, Malware Analysis, Natural Language Processing, Reverse Engineering, Indicators of Compromise

Additional Declarations

The authors declare no competing interests.
Posted: May 27th, 2024
Affiliation (both authors): Lumina Cognition Lab
Corresponding author: Ashley Q. Williamson (ORCID: https://orcid.org/0009-0007-1102-2837)
Co-author: Michael Beauparlant (ORCID: https://orcid.org/0009-0007-0395-4650)
Research Square, ISSN 2693-5015 (online)