DARE: A large-scale handwritten DAte REcognition system

preprint OA: gold CC-BY-4.0
Research Article

Christian Møller Dahl (University of Southern Denmark), Torben Skov Dyg Johansen (University of Southern Denmark, corresponding author), Emil Nørmark Sørensen (University of Bristol), Christian Emil Westermann (Rooftop Analytics), and Simon Friis Wittrock

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6412887/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review, Version 1
Journal: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Handwritten text recognition for historical documents is an important task, but it remains challenging due to insufficient training data combined with wide variability in writing styles and degradation of historical documents. In the context of recognizing handwritten dates, we propose a model based on the EfficientNetV2 architecture. The model is characterized by its fast training speed, robustness to parameter choices, and accurate transcription of handwritten dates from various sources.

For our training process, we build and introduce a database containing nearly 10 million tokens derived from over 2.2 million images of handwritten dates, extracted and segmented from diverse historical documents. Considering that dates are among the most prevalent pieces of information in historical documents, and given the existence of millions of such documents in historical archives, achieving efficient and automated transcription of dates holds the potential for substantial cost savings compared to manual transcription efforts. We demonstrate that training on handwritten text that exhibits substantial variability in writing styles yields robust models for recognizing general handwritten text and that transfer learning from the DARE system increases transcription accuracy substantially, allowing one to obtain high accuracy even when using relatively small training samples on entirely new types of documents.

The DARE database is freely available at https://www.kaggle.com/datasets/sdusimonwittrock/dare-database. Code to be made available at https://github.com/TorbenSDJohansen/DARE.

Keywords: Handwritten text recognition, Dataset, Transfer learning, Handwritten dates, Foundation model

Additional Declarations

No competing interests reported.

Editorial timeline

Editorial decision: Revision requested, 06 Jul, 2025
Reviews received at journal, 30 May, 2025
Reviews received at journal, 27 May, 2025
Reviewers agreed at journal, 06 May, 2025
Reviewers agreed at journal, 05 May, 2025
Reviewers invited by journal, 04 May, 2025
Editor assigned by journal, 21 Apr, 2025
Submission checks completed at journal, 16 Apr, 2025
First submitted to journal, 09 Apr, 2025

