The Fools are Certain; the Wise are Doubtful: Exploring LLM Confidence in Code Completion

doi:10.21203/rs.3.rs-7302683/v1

2025 · doi:10.21203/rs.3.rs-7302683/v1

preprint OA: closed

Research Article · Research Square

Authors: Zoe Kotti (corresponding author), Konstantina Dritsa, Diomidis Spinellis, Panos Louridas
Affiliation: Athens University of Economics and Business

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7302683/v2
This work is licensed under a CC BY 4.0 License (https://creativecommons.org/licenses/by/4.0/).
Version 2, posted May 7th, 2026 (latest version). Version 1 (doi:10.21203/rs.3.rs-7302683/v1) is dated 2025-09-22.
Submitted to Automated Software Engineering on 2025-08-05; a revision was requested on 2026-02-11; status: under review.

Abstract

Code completion entails the task of providing missing tokens given a surrounding context. It can boost developer productivity while providing a powerful code discovery tool. Following the Large Language Model (LLM) wave, code completion has been approached with diverse LLMs fine-tuned on code (code LLMs). The performance of code LLMs can be assessed with downstream and intrinsic metrics. Downstream metrics are usually employed to evaluate the practical utility of a model, but can be unreliable and require complex calculations and domain-specific knowledge. In contrast, intrinsic metrics such as perplexity, entropy, and mutual information, which measure model confidence or uncertainty, are simple, versatile, and universal across LLMs and tasks, and can serve as proxies for functional correctness and hallucination risk in LLM-generated code. Motivated by this, we evaluate the confidence of LLMs when generating code by measuring code perplexity across programming languages, models, and datasets using various LLMs, and a sample of 2254 files from 881 GitHub projects. We find that strongly-typed languages exhibit lower perplexity than dynamically typed languages. Scripting languages also demonstrate higher perplexity. Shell appears universally high in perplexity, whereas Java appears low. Code perplexity depends on the employed LLM; under a fixed model, relative language-level rankings are moderately stable across evaluation corpora. Although code comments often increase perplexity, the language ranking based on perplexity is barely affected by their presence. LLM researchers, developers, and users can employ our findings to assess the benefits and suitability of LLM-based code completion in specific software projects based on how language, model choice, and code characteristics impact model confidence.

Additional Declarations

No competing interests reported.
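
The abstract treats perplexity as a measure of model confidence. As a minimal sketch, assuming the standard definition (the exponential of the mean negative log-likelihood per token), the snippet below shows how perplexity could be derived from per-token log-probabilities. The paper's own measurement pipeline is not shown on this page; the function name `perplexity` and the sample probabilities are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities.

    Hypothetical helper: in practice the log-probabilities would come
    from an LLM scoring each token of a source file.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Made-up probabilities: a model that assigns 0.9 to every true token
# versus one that assigns only 0.25 to every true token.
confident = [math.log(0.9)] * 4
uncertain = [math.log(0.25)] * 4

print(f"confident: {perplexity(confident):.2f}")
print(f"uncertain: {perplexity(uncertain):.2f}")
```

Lower perplexity means the model assigned higher probability to the tokens that actually occur, which is why the abstract reads low perplexity as high confidence. A model that is uniformly unsure among four choices at every step has a perplexity of about 4.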

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00