Structured Grading for Advanced University Mathematics Tests Using LLM Agentic Workflow with Accuracy and Methodology Metrics

Research Article | Research Square

Kwame Atta Gyamfi, Gabriel Obed Fosu, Joseph Abeiku Ackora-Prah, Joseph Agyapong Mensah, Winner Mawuanam Komla Adufutse, and Frimpong Caleb Ampong

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-9607535/v1

This work is licensed under a CC BY 4.0 License.

Abstract

Assessment of open-ended work in university mathematics (e.g., proofs and multi-line calculations) is labour-intensive and subject to rater variability. Recent advances in natural-language processing (NLP) and large language models (LLMs) have enabled automated scoring of open responses with increasing agreement with human judgement, but validity, transparency, and rubric alignment remain open challenges in higher education contexts.
This paper proposes two structured grading evaluation frameworks for university-level mathematics, suitable for both standalone and agent-based LLM grading, and designed to accommodate both proof-based and analytical responses. The resulting frameworks are applied to student responses collected in an academically rigorous setting and compared against human-assigned grades (expert grading) as ground truth. Initial results show that standalone LLMs align reasonably well with human graders on reasoning-centred tasks but struggle on analytically demanding problems, while the agent-based framework performs more consistently across both domains and better recognizes partial correctness. These findings highlight the limitations of prompt-only grading and demonstrate that structured, multi-agent workflows offer a viable path toward transparent, reliable, and pedagogically aligned automated assessment in advanced mathematics education.

Subject areas: Artificial Intelligence and Machine Learning; Educational Philosophy and Theory

Keywords: Agentic AI, Large Language Models, Natural language processing, partial differential equation, discrete mathematics, essay-based assessments, Automated grading, Multi-agent systems, Mathematics assessment, Open-ended responses

Additional Declarations

The authors declare no competing interests.
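The abstract compares automated scores against expert grades but does not name the agreement statistics behind phrases such as "align reasonably well". As a hedged illustration only, assuming each response receives an integer score on a fixed scale (the 0 to 10 scale and all scores below are hypothetical), the sketch computes three common measures of grader agreement: exact agreement, mean absolute error, and quadratic weighted kappa. These are not necessarily the metrics the authors used, and the function names are illustrative.

```python
"""Agreement between automated and expert (human) grades.

Hypothetical sketch: the abstract does not state which agreement statistics
are used. Scores are assumed to be integers on a fixed, declared scale.
"""


def _check_pair(human, model):
    # Both graders must have scored the same, non-empty set of responses.
    if not human or len(human) != len(model):
        raise ValueError("score lists must be non-empty and of equal length")


def exact_agreement(human, model):
    """Fraction of responses where the model score equals the human score."""
    _check_pair(human, model)
    return sum(h == m for h, m in zip(human, model)) / len(human)


def mean_absolute_error(human, model):
    """Average absolute gap between model and human scores, in score points."""
    _check_pair(human, model)
    return sum(abs(h - m) for h, m in zip(human, model)) / len(human)


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Cohen's kappa with quadratic disagreement weights on an integer scale."""
    _check_pair(human, model)
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    if any(not min_score <= s <= max_score for s in list(human) + list(model)):
        raise ValueError("score outside the declared scale")
    k = max_score - min_score + 1
    n = len(human)
    # Observed confusion matrix: rows = human score, columns = model score.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1
    # Marginal score histograms for each grader.
    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_dis = 0.0
    expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed_dis += weight * observed[i][j]
            # Expected count if the two graders scored independently.
            expected_dis += weight * hist_h[i] * hist_m[j] / n
    # Expected disagreement is zero only when both graders gave one identical
    # score to every response; treat that degenerate case as perfect agreement.
    if expected_dis == 0:
        return 1.0
    return 1.0 - observed_dis / expected_dis


if __name__ == "__main__":
    # Hypothetical scores on a 0-10 scale for six responses.
    human = [10, 7, 4, 8, 0, 9]
    model = [10, 6, 4, 8, 2, 9]
    print(exact_agreement(human, model))      # 4 of 6 match, about 0.667
    print(mean_absolute_error(human, model))  # 0.5
    print(round(quadratic_weighted_kappa(human, model, 0, 10), 3))  # 0.957
```

Quadratic weighting penalizes a 2-point miss four times as heavily as a 1-point miss, which suits rubric scores where near-misses on partial credit matter less than gross disagreements; exact agreement alone would ignore that distinction.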
Author affiliations: Kwame Nkrumah University of Science and Technology (Kwame Atta Gyamfi, corresponding author, ORCID https://orcid.org/0000-0003-2647-0097; Gabriel Obed Fosu; Joseph Abeiku Ackora-Prah; Winner Mawuanam Komla Adufutse; Frimpong Caleb Ampong); Ashesi University (Joseph Agyapong Mensah).

Version 1, posted May 5th, 2026, on Research Square.

© Research Square 2026 | ISSN 2693-5015 (online)