From Text to Sectors: Classifying 140 Years of Swiss Firm Registrations

Research Article

Danyl Denysenko, Filippo Pasquali, Jesper Findahl, Andrea Mocci, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9280077/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1

Abstract

Disentangling structural economic shifts from persistent factors requires firm-level data of sufficient granularity and historical depth. In this study, we address this challenge by employing large language models (LLMs) to systematically classify business purpose descriptions from the multilingual Swiss Commercial Registry into standardized sectoral categories. Drawing on historical data spanning over 140 years, we classify more than two million firm registrations, providing granular coverage of the entire Swiss economy. We report three principal findings.
First, zero-shot LLMs exhibit strong classification performance across sectors and languages, and demonstrate temporal robustness in predictive accuracy. Second, we trace the economic transformation of Switzerland, consistent with broader European trends, but documented here at the unusually fine-grained level of the individual firm. Third, we identify persistent cultural differences in sectoral entrepreneurship preferences along the Swiss language border. Ultimately, this paper demonstrates that LLMs can unlock previously untapped administrative data, offering new perspectives for historical economic analysis.

Keywords: Text as data, Large Language Models, text classification, economic development, historical timeseries, firm-level data, Switzerland.

Additional Declarations

Competing interest reported. We wish to disclose that Prof. Elliott Ash, an Editor of the "Textual Analysis in Economics and Finance" special issue, is a Principal Investigator on the Sinergia-funded project that made our research possible. He is therefore familiar with the authors and with the underlying data. He did not, however, participate in the development of this particular classification effort, nor was he involved in the preparation of the manuscript.

Reviews received at journal: 17 May, 2026
Reviewers agreed at journal: 10 Apr, 2026
Reviewers agreed at journal: 10 Apr, 2026
Reviewers invited by journal: 10 Apr, 2026
Editor assigned by journal: 06 Apr, 2026
Submission checks completed at journal: 06 Apr, 2026
First submitted to journal: 31 Mar, 2026
Authors and affiliations
Danyl Denysenko (corresponding author), University of St. Gallen
Filippo Pasquali, University of St. Gallen
Jesper Findahl, University of Lugano
Andrea Mocci, University of Lugano
Gianmarco Torchetti, ETH Zurich

Journal: Swiss Journal of Economics and Statistics (https://sjes.springeropen.com/)
Posted: April 17th, 2026
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Manuscript file: SubmissionSJES31March2026.pdf (https://assets-eu.researchsquare.com/files/rs-9280077/v1_covered_69258b94-5a69-4a2f-808b-243720198019.pdf)

© Research Square 2026 | ISSN 2693-5015 (online)