A Two-Stream Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos

Research Article

Hemel Sharker Akash, Md Abdur Rahim, Abu Saleh Musa Miah, Hyoun-Sup Lee, Si-Woong Jang, and Jungpil Shin

Author affiliations
Hemel Sharker Akash, Md Abdur Rahim: Department of Computer Science and Engineering, Pabna University of Science and Technology
Abu Saleh Musa Miah (corresponding author), Jungpil Shin: School of Computer Science and Engineering, The University of Aizu
Hyoun-Sup Lee: Department of Applied Software Engineering, Dongeui University
Si-Woong Jang: Department of Computer Engineering, Dongeui University

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-5103346/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Status: Posted. Version 1, posted September 19th, 2024 on Research Square.
Subject area: Artificial Intelligence and Machine Learning

Abstract

Human Interaction Recognition (HIR) between two people in videos is a critical field in computer vision and pattern recognition. It aims to identify and understand human interactions and actions for applications such as healthcare, surveillance, and human-computer interaction. Despite its significance, video-based HIR struggles to achieve satisfactory performance because of the complexity of human actions, variations in motion, differing viewpoints, and environmental factors. In this study, we propose a two-stream deep learning-based HIR system to address these challenges and improve the accuracy and reliability of HIR. The two streams extract hierarchical features from skeleton and RGB information, respectively. In the first stream, we use YOLOv8-Pose for human pose extraction, extract features with three stacked LSTM modules, and enhance them with a dense layer whose output is the final feature of the first stream. In the second stream, we apply the Segment Anything Model (SAM) to the input videos; after filtering the SAM features, we employ an integrated LSTM and GRU to extract long-range dependency features and enhance them with a dense layer whose output is the final feature of the second stream. Here, SAM is used for segmented mesh generation, and an ImageNet-pretrained model is used for feature extraction from the images or meshes, focusing on relevant features in the sequential image data. In addition, we created a custom filter function that improves computational efficiency by eliminating irrelevant key points and mesh components from the dataset. We concatenate the features of the two streams to produce the final feature, which is fed into the classification module. In extensive experiments on a benchmark dataset, the proposed model achieved 96.07% accuracy, which demonstrates its superiority.

Keywords: Keypoint Mesh Model, HIR, SRGB-Model, MobileNetv2, deep bidirectional LSTM

Additional Declarations

The authors declare no competing interests.
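The two-stream design described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. The class name `TwoStreamHIR`, every layer size, the number of classes, the choice to chain the LSTM into the GRU, and the use of the last time step as the sequence summary are all assumptions made for illustration. The sketch also assumes that per-frame pose keypoints (for example, 2 people x 17 keypoints x 2 coordinates = 68 values) and per-frame image features (for example, 1280-dimensional vectors) have already been computed upstream, so YOLOv8-Pose, SAM, and the custom filter function are not shown.

```python
import torch
from torch import nn


class TwoStreamHIR(nn.Module):
    """Illustrative two-stream classifier; all sizes are assumptions."""

    def __init__(self, pose_dim=68, rgb_dim=1280, hidden=128, feat=64, num_classes=8):
        super().__init__()
        # Stream 1: three stacked LSTM layers over per-frame pose keypoints,
        # followed by a dense layer that yields the stream's final feature.
        self.pose_lstm = nn.LSTM(pose_dim, hidden, num_layers=3, batch_first=True)
        self.pose_dense = nn.Sequential(nn.Linear(hidden, feat), nn.ReLU())

        # Stream 2: an LSTM chained into a GRU over per-frame image features
        # (one possible reading of "integrated LSTM and GRU"), then a dense layer.
        self.rgb_lstm = nn.LSTM(rgb_dim, hidden, batch_first=True)
        self.rgb_gru = nn.GRU(hidden, hidden, batch_first=True)
        self.rgb_dense = nn.Sequential(nn.Linear(hidden, feat), nn.ReLU())

        # Classification module applied to the concatenated stream features.
        self.classifier = nn.Linear(2 * feat, num_classes)

    def forward(self, pose_seq, rgb_seq):
        # pose_seq: (batch, frames, pose_dim); rgb_seq: (batch, frames, rgb_dim)
        pose_out, _ = self.pose_lstm(pose_seq)
        pose_feat = self.pose_dense(pose_out[:, -1])  # summary from last frame

        rgb_out, _ = self.rgb_lstm(rgb_seq)
        rgb_out, _ = self.rgb_gru(rgb_out)
        rgb_feat = self.rgb_dense(rgb_out[:, -1])

        fused = torch.cat([pose_feat, rgb_feat], dim=1)  # feature concatenation
        return self.classifier(fused)  # unnormalized class scores


if __name__ == "__main__":
    model = TwoStreamHIR()
    pose = torch.randn(4, 30, 68)    # 4 clips, 30 frames each
    rgb = torch.randn(4, 30, 1280)
    print(model(pose, rgb).shape)    # torch.Size([4, 8])
```

Concatenating the two dense outputs keeps each stream independent until the final layer, so either stream could be swapped or ablated without touching the other. Whether the original model fuses features at this point in exactly this way is not stated in the abstract.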

