Spectral Pyramid Pooling and Fused Keypoint Generation in ResNet-50 for Robust 3D Object Detection

Research Article | Research Square

R. Ramana, V. Vasudevan, B. S. Murugan

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4179876/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Accurate and robust 3D object detection across varied environments is a challenging task, and it relies mainly on the direction, position and size of the objects. Traditional object detection approaches are degraded by issues such as background clutter, camera viewpoint variations and occlusions. To address these issues, a novel Faster Region-based Convolutional Neural ResNet-50 (FRCNResNet-50) model is proposed that detects and classifies 3D objects in images.
First, the images are collected from three data sources: the KITTI dataset, the nuScenes dataset and the MIT Indoor Scene dataset. These images are then preprocessed to enhance image quality and the generalization ability of the proposed model. The ResNet-50 model is designed to extract features using a Spectral Pyramid Pooling (SPP) layer and a Fused Keypoint Generation (FKG) layer, which enhance detection efficiency and reduce computational cost. The FRCNN model is implemented to detect 3D objects; it includes an ROI pooling layer for multi-class classification and for producing the corresponding regression bounding boxes. Experimental validation with quantitative analyses shows that the proposed model achieves improved performance, with an accuracy of 98.58% and a computational time of 98 ms.

Keywords: 3D object detection, regression bounding box, ResNet-50, convolutional neural network, faster region model, keypoint generation and spectral pyramid pooling
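
The abstract names an ROI pooling layer but gives no implementation details. As a rough illustration of the general idea only, the sketch below shows standard ROI max pooling as used in Fast/Faster R-CNN style detectors: a region of arbitrary size on a feature map is reduced to a fixed-size grid by taking the maximum in each bin. The function name `roi_max_pool`, the box convention and the toy feature map are assumptions made for this example; this is not the authors' code and it does not attempt to reproduce the SPP or FKG layers.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (list of numbers).
    roi: (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive
         (a convention assumed for this sketch).
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h < 1 or roi_w < 1:
        raise ValueError("ROI must cover at least one cell")

    pooled = []
    for i in range(out_h):
        # Integer floor/ceil bin edges guarantee every bin holds >= 1 cell.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(
                feature_map[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map (hypothetical values, for illustration only).
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Two regions of different sizes are both reduced to a 2x2 output.
full_region = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
sub_region = roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2)
```

Because the output size is fixed regardless of the region size, the pooled features can be passed to shared classification and box-regression heads, which is the role the abstract assigns to the ROI pooling layer.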
Author affiliation (all authors): Kalasalingam Academy of Research and Education (Deemed to be University)
Corresponding author: R. Ramana
Posted: August 21st, 2025 (version 1)
Platform: Research Square, ISSN 2693-5015 (online)