Data2Paper: An AI-Assisted Pipeline for Statistical Analysis and Research Paper Drafting from Survey and Clinical Data

This is a preprint and has not been peer reviewed. Data may be preliminary.

10 April 2026, V1 (latest version)

Author: Davie Chen (ORCID 0009-0001-4819-2828)
https://doi.org/10.22541/au.177585112.21633454/v1
54 views, 52 downloads

Abstract

The process of transforming raw research data into a coherent manuscript draft remains a fragmented, labor-intensive workflow that typically requires proficiency across multiple specialized tools: statistical software (SPSS, R, Python), spreadsheet editors, word processors, and typesetting systems (LaTeX). This fragmentation creates significant barriers for researchers, particularly those with limited statistical training or those working across language boundaries.
In this paper, we present Data2Paper, an AI-assisted system that integrates the pipeline from raw survey or clinical data to a formatted research paper draft. Our system implements four stages: (1) Intelligent Data Cleaning, which detects and handles survey-specific structures such as Likert scales, skip logic, and coded responses; (2) Research Framing, which proposes research questions and hypotheses from the observed data characteristics; (3) Automated Statistical Analysis, in which statistical methods are selected and executed using deterministic Python libraries with LLM support for planning and code generation; and (4) Multilingual Paper Drafting, which assembles results, tables, figures, and citations into a complete manuscript draft in seven languages. We report an internal evaluation on 50 survey and clinical datasets, comparing computed statistics and method choices against expert analyses and collecting quality assessments from 15 researchers. On this benchmark, Data2Paper achieves high agreement with expert-computed statistics and typically completes end-to-end processing within 30 minutes for the evaluated workloads. These results suggest that the system can substantially accelerate early-stage quantitative reporting, although human review remains necessary for domain framing, causal interpretation, and submission readiness. System access and project information are available at https://datatopaper.com.

Supplementary Material: main.pdf (314.98 KB)

Version history: V1, 10 April 2026

Copyright: This work is licensed under a Non Exclusive No Reuse License.
Keywords: AI-assisted manuscript drafting; clinical data analysis; large language models; multilingual scientific writing; statistical analysis automation; survey data analysis

Author affiliation: Davie Chen (ORCID 0009-0001-4819-2828), University of Arts

Citation: Davie Chen. Data2Paper: An AI-Assisted Pipeline for Statistical Analysis and Research Paper Drafting from Survey and Clinical Data. Authorea. 10 April 2026. DOI: https://doi.org/10.22541/au.177585112.21633454/v1