The Representation of SDG-Related Research in Bibliometric Databases: Persisting Imbalances and Varying Perspectives

Research Article

Matteo Ottaviani, Stephan Stahlschmidt
German Centre for Higher Education Research and Science Studies

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8147329/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1, posted December 22nd, 2025.

Abstract

Large bibliometric databases, such as Web of Science, Scopus, and OpenAlex, play a crucial role for decision-makers in science and science policy, as they are used as sources for informing decisions at both national and international levels, in the public and private sectors. Although these databases facilitate bibliometric analyses, they are performative, affecting the visibility of scientific outputs and the measurement of participating entities. Recently, they have also incorporated the UN's Sustainable Development Goals (SDGs) into their respective classifications, which have been criticized for their diverging nature. In addition, their infrastructural information processing is susceptible to emerging technologies. AI-supported and AI-powered tools have recently entered research practice and society at large, and Large Language Models (LLMs), the branch of generative AI specifically focused on text, underlie the operation of these tools.

By leveraging their features (in particular, mirroring what is thoroughly embedded in their training data under certain conditions), LLMs act as data magnifiers on SDG-classified publications to detect the data biases that affect bibliometric databases. From a broader perspective, our general setup serves as a conceptual exercise that characterizes the expected macro-level effects on the representation of SDG-related research in bibliometric databases, originating from the introduction of a generic LLM-based tool. Our analysis shows that deploying LLMs in the information processing of bibliometric databases reveals that the data (i.e., scientific publications classified by SDGs) systematically overlook the most disadvantaged categories of individuals, the poorest countries, and the underrepresented topics that SDG targets explicitly focus on. Conversely, an unsolicited hegemonic role played by economic superpowers and the Global North is identified.

Keywords: Sustainable Development Goals, SDG Classification, Bibliometric Databases, Large Language Models, OpenAlex, Web of Science, Scopus, Information Processing

Additional Declarations: No competing interests reported.
