Feature Extraction with Refinement and Rebuilding Module for Visual Tracking | Research Square

Research Article

Wenshuang Zhang (corresponding author, Qingdao Preschool Education College), Lu Li (Qingdao Preschool Education College), Pengcheng Sha (Nanchang Institute of Technology), Dezheng Zhang (Qingdao Agricultural University), Jun Wang (Nanchang Institute of Technology)

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-6521857/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Status: Published. The journal version appeared in Signal, Image and Video Processing on 05 Sep, 2025 (https://doi.org/10.1007/s11760-025-04487-9). This page describes Version 1, the latest preprint version.

Abstract
Convolutional neural network (CNN) based trackers have achieved excellent accuracy and speed. The feature extraction network is an essential component of these trackers, yet existing feature extraction sub-networks do not remove redundant spatial and channel information. Deep neural networks contain significant redundancy, not only in their parameters but also in the spatial and channel dimensions of their feature maps.
However, existing methods reduce redundancy in only one of these dimensions, channel or spatial, so the redundancy problem remains unresolved. In this work, we design a feature extraction sub-network with a refinement and rebuilding module. It fully exploits spatial and channel feature information to obtain more accurate target location information for both the target template and the search region, highlighting foreground information while suppressing background information. The template and search branches use weight separation to remove redundant features and reconstruct the remaining ones, which suppresses spatial redundancy and enhances feature representation. A split-transform-fuse strategy reduces channel redundancy as well as computational cost and storage. Together, the Spatial Feature Refinement Module and the Channel Feature Rebuilding Module form the basis of the proposed tracking framework. Evaluated on the LaSOT, TrackingNet, NFS, UAV123, GOT-10k and TNL2K benchmarks, the proposed tracker achieves leading performance at a tracking speed of 105 FPS.

Keywords: Convolutional neural network, Visual tracking, Feature extraction, Refinement and Rebuilding Module

Additional Declarations: No competing interests reported.

Editorial history (Signal, Image and Video Processing):
- First submitted to journal: 24 Apr, 2025
- Submission checks completed at journal: 24 Apr, 2025
- Editor assigned by journal: 24 Apr, 2025
- Reviewers invited by journal: 04 May, 2025
- Editorial decision: Revision requested, 04 May, 2025
- Version 1 posted: 7 May, 2025
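The abstract names two mechanisms, weight separation with reconstruction (spatial) and a split-transform-fuse strategy (channel), without giving their equations. The pure-Python sketch below illustrates one plausible realization, modeled on the Spatial and Channel Reconstruction Units of SCConv (Li et al., CVPR 2023). It is an assumption, not the authors' implementation: the function names, the group-normalization importance weights, the 0.5 gate threshold, the split ratio `alpha`, and the hand-set weight matrices are all hypothetical. Feature maps are stored as lists of C channels, each holding H×W values.

```python
import math
from typing import List

# Hypothetical sketch, not the authors' code: SCConv-style spatial
# separate-and-reconstruct and channel split-transform-fuse.
# A feature map is [channel][position], i.e. C rows of H*W values.
Feature = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def group_norm(x: Feature, groups: int, gamma: List[float],
               beta: List[float], eps: float = 1e-5) -> Feature:
    """Group normalization with per-channel scale gamma and shift beta."""
    per = len(x) // groups
    out: Feature = []
    for g in range(groups):
        block = x[g * per:(g + 1) * per]
        vals = [v for ch in block for v in ch]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals) + eps)
        for i, ch in enumerate(block):
            k = g * per + i
            out.append([gamma[k] * (v - mean) / std + beta[k] for v in ch])
    return out


def spatial_refine(x: Feature, gamma: List[float], beta: List[float],
                   groups: int = 1, threshold: float = 0.5) -> Feature:
    """Separate informative from redundant activations, then cross-reconstruct."""
    assert len(x) % 2 == 0 and len(x) % groups == 0
    gn = group_norm(x, groups, gamma, beta)
    total = sum(gamma)  # assumes positive scales
    info: Feature = []
    redundant: Feature = []
    for k, ch in enumerate(gn):
        w = gamma[k] / total  # normalized importance of channel k
        keep = [sigmoid(w * v) >= threshold for v in ch]
        info.append([v if m else 0.0 for v, m in zip(x[k], keep)])
        redundant.append([0.0 if m else v for v, m in zip(x[k], keep)])
    half = len(x) // 2
    # Cross reconstruction: each half's informative part plus the other half's redundant part.
    top = [[a + b for a, b in zip(info[k], redundant[half + k])] for k in range(half)]
    bottom = [[a + b for a, b in zip(info[half + k], redundant[k])] for k in range(half)]
    return top + bottom


def pointwise(x: Feature, w: List[List[float]]) -> Feature:
    """1x1 convolution: out[o][p] = sum_i w[o][i] * x[i][p]."""
    return [[sum(w[o][i] * x[i][p] for i in range(len(x))) for p in range(len(x[0]))]
            for o in range(len(w))]


def channel_rebuild(x: Feature, alpha: float, w_rich: List[List[float]],
                    w_cheap: List[List[float]]) -> Feature:
    """Split the channels, transform each part at a different cost, then fuse."""
    split = int(alpha * len(x))
    rich, cheap = x[:split], x[split:]
    y1 = pointwise(rich, w_rich)            # full transform, len(x) output channels
    y2 = pointwise(cheap, w_cheap) + cheap  # cheap transform plus the reused input
    assert len(y1) == len(y2) == len(x)
    out: Feature = []
    for a, b in zip(y1, y2):
        # Soft per-channel selection between the two branches from pooled responses.
        s1, s2 = sum(a) / len(a), sum(b) / len(b)
        m = max(s1, s2)
        e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
        out.append([(e1 * u + e2 * v) / (e1 + e2) for u, v in zip(a, b)])
    return out
```

For instance, `spatial_refine([[1.0, -1.0], [2.0, -2.0]], [1.0, 1.0], [0.0, 0.0])` returns `[[1.0, -2.0], [2.0, -1.0]]`: with these inputs the positive activations pass the gate as informative, the negative ones are treated as redundant, and the two channel halves exchange their redundant parts. The plain weight matrices in `channel_rebuild` stand in for learned convolutions.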