Data Analysis in Excel and R: A Comparative Evaluation
Research Article
Rohan Magar

This is a preprint posted on Research Square; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8299990/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract
A comparative analysis of Excel and R as data analysis tools is timely. The growth of large and complex data across many domains has increased the demand for tools that produce consistent, reliable, and reproducible results. Although Excel remains popular because it is accessible and easy to learn, the literature raises concerns about the statistical accuracy of manual data processing and the high probability of user error in spreadsheet-based analysis.
Research has also shown that spreadsheets frequently contain undetected errors and that their users tend to be overconfident about the output, which casts doubt on Excel's suitability for more advanced analysis. R, by contrast, is a programming environment that supports more rigorous data analysis through more complex statistical modeling and enhanced reproducibility. This study therefore evaluates how effective Excel and R are for general data analysis tasks. The dataset used in the investigation comprises 5901 records, and the same procedures were applied in both tools: data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization. The findings indicate that Excel is efficient and error-free only for basic analyses and simple outputs; for more complex tasks it is less efficient and errors are more likely. R therefore offers greater analytical capacity and reliability, whereas Excel is best suited to introductory and simple data analysis.

Keywords: Data analysis; Microsoft Excel; R programming; regression; correlation; statistical modeling

Introduction
The growing presence of massive and intricate datasets in business, healthcare, government services, and other areas has increased the demand for analytical tools that can turn raw data into meaningful and credible information. Because organizations and researchers rely heavily on data analysis to inform decisions, the choice of analytical software has become a significant methodological issue in evaluating outcomes. Microsoft Excel has long been one of the most popular data analysis tools, owing to its familiarity and low barrier to entry.
Its built-in spreadsheet format and charting features allow users with little statistical background to perform simple analyses and produce graphical results. Nevertheless, a substantial body of literature has questioned the reliability and robustness of spreadsheet-based analysis. Studies report inaccuracies in Excel's statistical functions, inconsistencies in its computational procedures, and limited support for advanced modeling. Moreover, spreadsheet analysis is highly susceptible to human error: research has shown that even well-constructed spreadsheets often harbor hidden faults and that spreadsheet users are frequently overconfident about the correctness of their work. These problems compromise the reproducibility and validity of analytical outcomes, especially in high-risk or complex procedures. Programming-oriented environments such as R, by contrast, offer more advanced statistical modeling functions and improve reproducibility because analyses are executed as scripts. R is also open-source and supports more complex exploratory and inferential analysis. Despite these benefits, relatively little empirical research directly compares Excel and R on the same analytical tasks, leaving a gap in knowledge about how the two tools differ in practical use and in the reliability of their results. This paper addresses that gap through a systematic comparative analysis of Excel and R using a dataset of 5901 records. The same analytical steps (data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization) were performed in both environments to assess their effectiveness, efficiency, and possible sources of error.
The results are intended to provide evidence that helps users choose between the tools according to the complexity of their analytical requirements.

Literature Review
The increasing availability of large and complex data in business, healthcare, and government programs has increased the need for analytical applications that can transform raw data into usable knowledge. The big data literature emphasizes organizations' growing reliance on analytics to improve prediction and support decision-making using structured, semi-structured, and unstructured data [1], [2]. In finance, for example, big data analytics and machine learning have become central to detecting anomalous behavior and preventing fraud in high-volume transactional settings [3]. In areas such as process mining and event-log analysis, visual analytics and algorithmic methods are combined to understand the behavior of complex workflows and systems [19]. These developments are accompanied by advances in real-time data analysis, which underline the need for tools that can handle high-velocity, continuous data streams. Within this broader context, Microsoft Excel and the R programming language are two of the most widely used environments for teaching and applying data analysis methods.

Excel as a Platform for Applied Data Analysis
Excel remains pervasive in education and practice because it is familiar, widely available, and relatively easy to learn.
Studies in applied fields such as strength and conditioning have shown that coaches can use Excel to compute reliability indices (e.g., the coefficient of variation), the smallest worthwhile change, effect sizes, and correlations, linking testing data to training decisions without specialized software [4]. In engineering and reliability analysis, Excel-based linear regression and distribution-fitting workflows have been proposed for identifying appropriate lifetime or failure-time distributions using correlation-based criteria, with the advantage that the entire procedure can be implemented with standard spreadsheet functions and charts [5]. Introductory statistics tutorials also demonstrate how to compute descriptive measures (means, medians, and measures of dispersion) and graphical summaries in Excel in line with general statistical theory [7]. Pedagogical research suggests that many learners initially have limited experience with Excel for data analysis, but short, focused training improves both their competence and their intention to use Excel in future analytical work [6]. Other methodological contributions show how Excel can be used for more advanced tasks. For example, Neyeloff et al. [9] present step-by-step spreadsheets for meta-analysis, including fixed- and random-effects models and custom forest plots, making evidence synthesis accessible to users who only have Excel. In time-series applications, Excel's FORECAST.ETS function has been compared with several SAS forecasting tools; the comparison shows that while Excel implements exponential smoothing conveniently, more advanced platforms better support complex time-series families such as ARIMA and state-space models [8]. However, several authors identify substantive limitations in Excel's statistical capabilities.
An extensive evaluation of Excel 2010 concludes that although Microsoft improved a number of statistical functions and the random number generator relative to earlier versions, inaccuracies and inconsistencies remain when compared with reference implementations and dedicated statistical software [10]. Similar concerns emerge in the time-series comparison, which finds Excel appropriate for simpler forms of exponential smoothing but less suitable for the more complex model families available in special-purpose software [8]. These results indicate that, while Excel can be a useful teaching and prototyping environment, it should be used with caution for advanced or high-stakes statistical analysis.

Spreadsheet Risk: Human Error & Overconfidence
Beyond algorithmic accuracy, spreadsheet-based analysis is prone to human error. Panko's synthesis of evidence from spreadsheet research and broader human-error research indicates that even with low per-cell error rates, the likelihood that a large spreadsheet contains at least one significant error is very high [11]. Empirical studies also indicate that such errors are hard to detect and that developers generally overestimate the correctness of their own models [11]. Building on this, Raković et al. [12] explore end-user overconfidence in spreadsheet development and show that teaching error taxonomies illustrated with real "spreadsheet horror stories", together with best-practice guidelines, can reduce both error rates and misplaced confidence among student developers. Together, these studies emphasize that using Excel for data analysis is not only a technical question but also a human-factors problem, in which testing, auditing, and user education are important.
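Panko's point that low per-cell error rates still make errors likely in large spreadsheets can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every formula cell independently has the same error probability p; real spreadsheet errors are not independent, and the 1% rate in the example is a hypothetical value, not a figure reported in [11]. Under that assumption, the probability of at least one error in n cells is 1 − (1 − p)^n.

```python
def prob_at_least_one_error(cell_error_rate: float, n_cells: int) -> float:
    """Probability that at least one of n_cells formula cells is wrong.

    Assumes (hypothetically) that each cell fails independently with the
    same probability cell_error_rate, so P(no errors) = (1 - p) ** n.
    """
    if not 0.0 <= cell_error_rate <= 1.0:
        raise ValueError("cell_error_rate must lie in [0, 1]")
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return 1.0 - (1.0 - cell_error_rate) ** n_cells


if __name__ == "__main__":
    # Hypothetical 1% per-cell error rate applied to spreadsheets of growing size.
    for n in (10, 100, 500, 1000):
        p = prob_at_least_one_error(0.01, n)
        print(f"{n:>5} cells: P(at least one error) = {p:.3f}")
```

With these hypothetical inputs, a 100-cell sheet already has roughly a 63% chance of containing at least one error, which illustrates why a low per-cell rate does not imply an error-free spreadsheet.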
R and Other Modern Software Tools for Data Analysis
Alongside the continued use of Excel, R and other programming environments have become central to modern statistics and data science. Comparative work on software for teaching analytics points out that Python and R offer rich ecosystems for mathematics, statistics, and data analysis, with abundant open-source libraries and interactive workflows [14]. A broader comparison of analysis software for instruction uses a Task-Technology Fit framework to evaluate Excel, Python, and R on common data tasks, including reading data, preprocessing, descriptive statistics, probability distributions, hypothesis testing, and regression [15]. That work concludes that although Excel covers many basic tasks through menu-driven functions, R and Python offer greater flexibility, better support for complex modeling, and richer capabilities for reproducible workflows. The strength of R is also demonstrated at the package level. The SmartEDA package automates a large part of exploratory data analysis: it classifies variables, automatically generates descriptive statistics and visualizations, and computes measures such as information value and weight of evidence, reducing manual coding effort and the risk of ad hoc mistakes [16]. In health economics, Incerti et al. [17] discuss the limitations of Excel and specialist GUI software (such as TreeAge) in handling model complexity, uncertainty, and reproducibility in health technology assessment, and advocate modern programming languages such as R and Python for building clinically realistic models, quantifying decision uncertainty, and producing transparent, reproducible analyses.

R vs. Excel: Comparative Evidence
Head-to-head comparisons of Excel and R on specific tasks reveal both overlap and differences between the two tools. Deb Roy et al.
use simulation from the Poisson distribution in Excel and R to compare the mean squared error of parameter estimates across sample sizes and parameter values [13]. Their results suggest that Excel can perform comparably, or even better, in some settings, but that R generally provides more control over simulation and distributional assumptions [13]. Combined with the documented accuracy problems of Excel's statistical functions [10] and the extensive evidence of spreadsheet error and overconfidence risks [11], [12], these findings strengthen the case for R in advanced simulation modeling and high-stakes inference. At the same time, Excel retains important advantages: it is ubiquitous, easy to learn, and well suited to smaller datasets, quick exploratory work, and communication with non-technical stakeholders [4], [7], [9]. R, in contrast, has a steeper initial learning curve but offers more sophisticated analytical capabilities, better integration with big data and visual analytics workflows, and stronger support for reproducibility and automation [14], [19].

Synthesis and Gap
Overall, the literature presents Excel and R as complementary rather than mutually exclusive tools: Excel is stronger in accessibility and quick interactive exploration, whereas R offers the depth and flexibility needed for complex analyses. Existing studies, however, tend to focus either on Excel (accuracy, human error, teaching) or on R and other programming languages (capabilities, automation, health technology assessment, exploratory data analysis), with relatively few integrated empirical evaluations comparing Excel and R across the full data analysis lifecycle, from data cleaning and exploration through modeling, simulation, and reporting, under realistic user conditions.
The present study addresses this gap with a structured comparative analysis of data analysis in Excel and R, informed by evidence on accuracy, usability, human error, and domain-specific applications.

Methodology
The study used a comparative analysis method to assess how effectively Microsoft Excel and the R programming language perform general data analysis tasks. The methodology applied the same analytical procedure in both environments, so that differences in performance, accuracy, and reliability can be attributed to the tools rather than to variation in the process. Each step was designed to reflect realistic workflows commonly used in academic and business analytics.

Research Approach
The study adopted an experimental design in which Excel and R were applied independently to the same dataset (5901 records). Each tool was used to carry out a sequence of routine analytical tasks: data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization. Running the workflows in parallel provided a controlled basis for comparison, enabling a meaningful assessment of the strengths and limitations of each tool.

Dataset Description
The study uses the Superstore Sales dataset, which is widely used for teaching and analysis. It contains detailed transactional data from a retail superstore, covering customer demographics, product categories, shipping data, and financial outcomes (sales, discount, quantity, and profit). The dataset is well suited to this comparative study because it:
· combines numerical and categorical variables,
· reflects realistic business operations,
· has sufficient volume (5,901 rows) to reveal performance differences, and
· supports correlation, regression, and visual analyses.
The key fields include Order Date, Ship Date, Region, Category, Sub-Category, Sales, Discount, Quantity, and Profit. The richness and variety of these variables provide a sufficient basis for comparing the computational capabilities of Excel and R.

Data Preparation
The data were first imported into both Excel and R, and the same cleaning procedures were applied in each. These procedures involved validating data types, standardizing date formats, checking for missing or inconsistent values, and examining the numerical fields Sales, Quantity, Discount, and Profit for outliers. In Excel these steps required a mix of manual and formula-based checks, whereas in R they were performed with short script-based commands. This difference provided an early indication of how the two environments differ in process automation and repeatability.

Analytical Procedures
Descriptive Statistics
Summary statistics (means, medians, standard deviations, and frequency distributions) were produced for the key variables. Excel used built-in functions and PivotTables, whereas R used functions from the tidyverse ecosystem. This step highlighted the difference in philosophy between the two tools: the interactive, user-driven nature of Excel versus the code-based nature of R.

Correlation Analysis
Pearson correlation coefficients were calculated to investigate the relationships between the numerical variables. This was done with the Data Analysis ToolPak in Excel and, in R, with the cor() function, visualized with the corrplot package. The analysis showed how well each tool delivers statistically accurate results and communicates them graphically.

Regression Modeling
To predict profit, a multiple linear regression model was developed with sales, quantity, and discount as predictors. Both R and Excel produced coefficient estimates, significance levels, and model fit (R²).
Comparing the outputs made it possible to assess computational precision, the handling of residuals, and the diagnostic information provided by each tool.

Visualization
The same visualization types (histograms, scatterplots, and category-level charts) were produced in both environments. Excel charts were compared with R's ggplot2 package, particularly in terms of flexibility, legibility, and reproducibility.

Evaluation Criteria
Five criteria were used to evaluate the performance of Excel and R:
· Accuracy: whether both tools produced consistent and statistically correct results.
· Reproducibility: the extent to which analyses could be repeated without discrepancies or errors.
· Efficiency: the number of steps, the time, and the degree of automation needed to complete each task.
· Complexity handling: each tool's ability to handle the data, perform multi-stage analysis, and carry out advanced modeling.
· Potential for human error: the likelihood of user-introduced errors, especially in formula-driven operations in Excel compared with script-driven execution in R.

Tool Specifications
The analyses were run in Microsoft Excel (Office 365/Excel 2021 or later) and R version 4.x. The R packages used were tidyverse, ggplot2, lubridate, and corrplot. These tools were chosen because they are widely used in academic and professional analytics.

Methodological Limitations
Although the methodology controlled the comparison, some limitations remain. Excel results can vary depending on how the user operates it, whereas R workflows depend on the analyst's programming skills. In addition, the analysis is based on a single dataset and a series of generic analytical tasks; the findings may differ for very large-scale data, real-time analysis, or customized models.
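The cleaning steps listed under Data Preparation (type validation, date standardization, missing-value checks, and outlier screening) were carried out in Excel and R, but the paper does not state the exact rules. As a tool-neutral illustration, the Python sketch below applies one possible set of rules using only the standard library. The CSV layout, the month/day/year date format, the ship-before-order consistency check, and the 1.5 × IQR outlier rule are all assumptions made for this example, not the author's procedure.

```python
import csv
import io
from datetime import datetime
from statistics import quantiles

NUMERIC_FIELDS = ("Sales", "Quantity", "Discount", "Profit")
DATE_FIELDS = ("Order Date", "Ship Date")
DATE_FORMAT = "%m/%d/%Y"  # assumed source format; adjust to the actual export


def clean_rows(csv_text):
    """Parse and validate rows; return (clean_rows, number_of_dropped_rows)."""
    cleaned, dropped = [], 0
    for raw in csv.DictReader(io.StringIO(csv_text)):
        try:
            row = {"Category": raw["Category"].strip()}
            # Standardize dates into datetime.date objects.
            for field in DATE_FIELDS:
                row[field] = datetime.strptime(raw[field].strip(), DATE_FORMAT).date()
            # Validate numeric types; blank or non-numeric cells raise ValueError.
            for field in NUMERIC_FIELDS:
                row[field] = float(raw[field])
        except (KeyError, AttributeError, TypeError, ValueError):
            dropped += 1  # missing column, short row, blank cell, or bad value
            continue
        if row["Ship Date"] < row["Order Date"]:
            dropped += 1  # hypothetical consistency rule: cannot ship before ordering
            continue
        cleaned.append(row)
    return cleaned, dropped


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # "inclusive" interpolates linearly between order statistics.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the Superstore data.
    sample = (
        "Order Date,Ship Date,Category,Sales,Quantity,Discount,Profit\n"
        "01/10/2016,01/14/2016,Furniture,250.00,2,0,40.00\n"
        "02/03/2016,02/05/2016,Technology,120.00,3,0.2,15.00\n"
        "03/07/2016,03/09/2016,Office Supplies,,4,0,5.00\n"
        "04/20/2016,04/18/2016,Furniture,80.00,1,0,8.00\n"
    )
    rows, dropped = clean_rows(sample)
    print(len(rows), "clean rows,", dropped, "dropped")
    print(iqr_outliers([1, 2, 3, 4, 100]))
```

Because the rules live in one script, rerunning it on a new export applies exactly the same checks, which is the repeatability advantage the section attributes to scripted cleaning.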
Results
This section presents the findings obtained by applying identical analytical procedures in Microsoft Excel and R to the Superstore dataset of 5901 records. The results show where the two tools produce consistent output and where they differ in analytical depth, efficiency, and visualization capabilities.

Descriptive Statistics
Excel and R produced identical summary statistics for Sales, Profit, and Quantity, showing complete numerical agreement between the two tools. As shown in Table 1, Sales and Profit are highly variable, with Profit concentrated close to zero (median 8.50) despite a standard deviation of 259.58.

Table 1. Descriptive Statistics for Numerical Variables (Excel vs. R)
Variable   Tool    Mean     Median   SD
Sales      Excel   265.34   128.64   474.26
Sales      R       265.34   128.64   474.26
Profit     Excel    29.70     8.50   259.58
Profit     R        29.70     8.50   259.58
Quantity   Excel     3.78     3        2.21
Quantity   R         3.78     3        2.21

Although the numbers were identical, the processes used to obtain them differed considerably. Excel required several manual operations using formulas and PivotTables, whereas R produced the same results with scripted commands. Excel is therefore adequate for simple summarization, but R is more efficient and reproducible for descriptive analysis.

Correlation Analysis
The Pearson correlation coefficients computed in Excel and R were identical for all variable pairs. The results in Table 2 show a moderate positive correlation between Sales and Profit (r = 0.33); the analysis also found a strong negative correlation between Discount and Profit (Discount is not included in Table 2). This suggests that heavier discounting is associated with lower profitability.
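Agreement between the two tools depends on both using the same formulas. For example, R's sd() returns the sample standard deviation (n − 1 denominator), which corresponds to Excel's STDEV.S rather than STDEV.P. The Python sketch below computes the Table 1 statistics and the Pearson coefficient reported in Table 2 from first principles; it illustrates the formulas only, is not the Excel or R code used in the study, and uses made-up numbers.

```python
from math import sqrt
from statistics import mean, median, stdev


def describe(values):
    """Mean, median, and sample SD (n - 1 denominator, like R's sd() or Excel's STDEV.S)."""
    return {"mean": mean(values), "median": median(values), "sd": stdev(values)}


def pearson_r(x, y):
    """Pearson correlation: co-deviation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    sales = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values
    profit = [2.0, 1.0, 4.0, 3.0, 5.0]  # made-up values
    print(describe(sales))
    print(round(pearson_r(sales, profit), 4))
```

On Python 3.10 or later, pearson_r can be cross-checked against statistics.correlation, which computes the same coefficient.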
Table 2. Comparison of Pearson Correlation Matrices in R and Excel

R:
           Sales    Quantity  Profit
Sales      1.0000   0.2024    0.3259
Quantity   0.2024   1.0000    0.0748
Profit     0.3259   0.0748    1.0000

Excel:
           Sales    Quantity  Profit
Sales      1.0000   0.2024    0.3259
Quantity   0.2024   1.0000    0.0748
Profit     0.3259   0.0748    1.0000

Whereas Excel produced a plain numerical correlation matrix, R generated a heatmap (Fig. 1) that made the relationships easier to interpret. The heatmap clearly displayed the expected positive relationship between sales volume and profit, illustrating R's advantage in communicating analytical findings visually.

Regression Modeling
A multiple linear regression model predicting Profit from Sales and Quantity was estimated in both tools. The coefficients, shown in Table 3, were identical in Excel and R. Sales was a significant positive predictor of profit (β = 0.17738, p < 0.001). Quantity had a positive coefficient that was not statistically significant (p = 0.466).

Table 3. Comparison of Regression Results in Excel and R (Dependent Variable: Profit)
Predictor   Tool    Estimate (β)   SE        t-value    p-value
Intercept   Excel   −21.4314       6.3653    −3.3669    0.000765
Intercept   R       −21.4314       6.3653    −3.367     0.000765
Sales       Excel     0.17738      0.00688   25.7808    5.682E-139
Sales       R         0.17738      0.00688   25.781     < 2e-16
Quantity    Excel     1.07510      1.47452    0.7291    0.46595
Quantity    R         1.07510      1.47452    0.729     0.46595

Model Fit Comparison
Statistic                       Excel         R
Multiple R                      0.3260        0.3260
R Square                        0.1063        0.1063
Adjusted R Square               0.1060        0.1060
Standard Error (Residual SE)    245.4459      245.4
F-Statistic                     350.7691      350.8
Significance F / p-value        1.1559E-144   < 2.2e-16
Observations                    5901          5901

The p-values differ only in presentation: R prints very small p-values as a bound (< 2e-16) rather than as an exact value. Although the core regression output was identical, R readily provided additional information, such as residual plots, model-assumption checks, and significance tests.
Excel provided only the basic regression table unless additional steps were taken manually or through external add-ins. These differences reflect R's stronger support for advanced statistical modeling.

Visualization Comparison
Both Excel and R readily produced histograms, scatterplots, and category comparisons. However, the outputs differed in quality and analytical depth. Excel's visualizations were simple and adequate for business reporting, but required manual formatting to improve clarity. R's ggplot2 visualizations were more refined and consistent, with greater customization and interpretability. For example, the Excel column-and-line chart of sales vs. profit (Fig. 2) is functional but minimal, whereas the corresponding R plot (Fig. 3) includes smoothing lines, confidence intervals, and improved aesthetics that reveal underlying patterns more clearly.

Comparative Summary
The comparison of the two tools yields several key observations:
· Accuracy: Excel and R produced the same statistical values in all tests.
· Reproducibility: R scripting provides full reproducibility, whereas Excel depends heavily on manual operations.
· Efficiency: R required fewer steps for correlation, regression, and visualization.
· Visualization quality: R produced more flexible, publication-quality figures.
· Error risk: Excel's manual formulas were vulnerable to user error, whereas R's scripted workflow minimized this risk.
· Complexity handling: R handled advanced modeling and diagnostic analysis better.
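Excel's ToolPak regression and R's lm() both fit the model in Table 3 by ordinary least squares, which is why their estimates match. The standard-library Python sketch below shows the computations behind the reported coefficients, standard errors, t-values, R², adjusted R², and residual standard error, and it returns the residuals from which diagnostic plots are built. It solves the normal equations (XᵀX)β = Xᵀy through an explicit matrix inverse, which is adequate for a few predictors but less numerically stable than the QR decomposition that lm() uses. P-values are omitted because they require the Student t distribution (for example scipy.stats.t.sf), which the standard library does not provide. The data in the example are made up.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    n = len(y)
    X = [[1.0] + [float(p[i]) for p in predictors] for i in range(n)]
    k = len(X[0])  # number of coefficients, including the intercept
    if n <= k:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    df = n - k  # residual degrees of freedom
    sse = sum(e * e for e in resid)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("R-squared is undefined for a constant response")
    sigma2 = sse / df  # residual variance estimate
    se = [sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    t = [b / s if s > 0 else float("nan") for b, s in zip(beta, se)]
    r2 = 1.0 - sse / sst
    return {
        "beta": beta, "se": se, "t": t, "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df,
        "resid_se": sqrt(sigma2), "resid": resid,
    }


if __name__ == "__main__":
    sales = [1, 2, 3, 4, 5]   # made-up values, not the Superstore records
    profit = [2, 1, 4, 3, 5]
    fit = ols(profit, sales)
    print([round(b, 4) for b in fit["beta"]], round(fit["r2"], 4))
```

Called as ols(profit, sales, quantity) on the Superstore columns, this should reproduce the Table 3 estimates up to floating-point rounding, because the least-squares solution is unique when the predictors are not collinear; plotting the returned residuals against the fitted values gives the kind of diagnostic check described above.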
Table 4. Comparative Tasks in Excel and R
Task                     Excel                          R
Descriptive statistics   Manual formulas / PivotTable   One script line
Correlation              ToolPak required               cor() function
Regression               Limited output                 Full statistical model
Heatmap                  Not available                  corrplot()
Visualization            Basic charts                   Publication-quality
Reproducibility          Low                            High
Efficiency               Slow for many tasks            Fast
Error risk               High                           Low
Handling large data      Weak                           Strong

Overall Findings
The results show that Excel is well suited to introductory, low-complexity analytical tasks, especially for users who prefer an interactive interface. However, R was consistently superior to Excel in analytical depth, reproducibility, automation, and visualization quality. Although Excel remains useful for simple analysis, R is more reliable and powerful for advanced data analysis and statistical modeling.

Discussion
This study systematically compared Microsoft Excel and R to determine which is more effective in a general data analysis workflow, using a common dataset of 5,901 Superstore records. Although both tools produced the same numerical output for descriptive statistics, correlation, and regression, they differed significantly in reproducibility, workflow design, automation capacity, and visualization quality. The descriptive statistics obtained in Excel and R were numerically identical, indicating that both tools compute the basic measures accurately. This agrees with findings in the literature that Excel is fit for elementary statistical tasks and business reporting [5], [7]. However, the processes used to arrive at these results differed considerably: Excel took multiple manual steps (formulas or PivotTables), whereas R produced the same results with a single scripted command.
This reliance on manual operations in Excel makes user errors more likely, a risk well documented in spreadsheet error research [11], [12]. Correlation analysis also showed complete numerical agreement between the Data Analysis ToolPak in Excel and the cor() function in R. However, R supported more sophisticated visualizations, such as correlation heatmaps, which made relationships easier to interpret. R's ability to communicate statistical patterns visually is consistent with contemporary views on the importance of visual analytics in data interpretation [19]. Regression results were identical in both tools, confirming that Excel's regression engine and R's lm() function produce equivalent coefficient estimates, significance levels, and model-fit statistics. Nevertheless, R offered rich diagnostic output, such as residual plots, assumption checks, and model-fit summaries, whereas Excel provided only basic regression tables unless extra steps or add-ins were used. This supports earlier comparisons finding that programming-based tools are more flexible, reproducible, and diagnostic than spreadsheet environments [15], [17]. Visualization produced the sharpest contrast. While Excel charts were functional and adequate for business applications, R's ggplot2 package produced publication-quality charts with better customization, smoothing, and interpretability. These findings align with past research highlighting R's strength in data exploration and automated graph production [16]. Overall, Excel appears well suited to beginners and low-complexity tasks, while R provides clear advantages in automation, advanced modeling, error reduction, visualization, and reproducibility.
These results reinforce concerns about spreadsheet risk raised in the existing literature [10]–[12] and support broader recommendations to adopt programming-based tools when analytical requirements exceed what spreadsheets can reliably deliver, however accessible spreadsheets are to analysts [14], [17].

Conclusion
This study offered a structured comparison of Microsoft Excel and R using the same data analysis procedures: data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization. Although both tools gave identical numerical results for all statistical computations, considerable differences emerged in workflow efficiency, reproducibility, diagnostic capabilities, and visualization quality. Excel showed strengths in usability, accessibility, and suitability for simple statistical summaries and business-oriented charts. Its interface is approachable for users with little statistical training and makes fundamental analysis relatively easy. However, Excel's reliance on manual operations increases the chance of human error and limits reproducibility, especially when multiple analytical steps are needed. R, in comparison, provided a more robust and scalable analytical environment. Its script-based workflow supports full reproducibility, reduces user-induced error, and offers considerable flexibility for sophisticated modeling and diagnostic evaluation. R also produced better, automated, publication-quality graphics. Together with R's versatility, scalability, and ability to work with large datasets, these strengths make R particularly suitable for academic research, complex modeling, and big data applications.
In summary, Excel is well suited to introductory analytics and quick, one-off exploratory work, whereas R is stronger and more dependable for rigorous applications, reproducible workflows, and the routine data-inspection tasks of data scientists. Users and organizations are advised to choose a tool according to the complexity of the analysis, the reproducibility requirements, and the skill level of the analyst.
Declarations
Author Contribution
R.M. conceived the study, designed the methodology, performed the data analysis in both Excel and R, generated all tables and figures, interpreted the results, and wrote the full manuscript. R.M. reviewed and approved the final version of the manuscript.
References
[1] D. Darwish, “Big Data Issues: Analytics and Security,” in Encyclopedia of Information Science and Technology, 6th ed. Hershey, PA, USA: IGI Global, n.d., ch. 20, doi: 10.4018/978-1-6684-7366-5.ch020.
[2] C. A. Udeh, O. H. Orieno, O. D. Daraojimba, N. L. Ndubuisi, and O. I. Oriekhoe, “Big Data Analytics: A Review of Its Transformative Role in Modern Business Intelligence,” Computer Science & IT Research Journal, vol. 5, no. 1, pp. 219–236, Jan. 2024, doi: 10.51594/csitrj.v5i.718.
[3] P. O. Shoetan, A. T. Oyewole, C. C. Okoye, and O. C. Ofodile, “Reviewing the Role of Big Data Analytics in Financial Fraud Detection,” Finance & Accounting Research Journal, vol. 6, no. 3, pp. 384–394, Mar. 2024, doi: 10.51594/farj.v6i3.899.
[4] A. Turner, J. Brazier, C. Bishop, S. Chavda, J. Cree, and P. Read, “Data Analysis for Strength and Conditioning Coaches: Using Excel to Analyze Reliability Differences and Relationships,” Strength and Conditioning Journal, vol. 37, no. 1, pp. 76–83, Feb. 2015.
[5] J. Qin, Y. Niu, and Z. Li, “A Data Analysis Method and Its Applications in EXCEL,” Journal of Software, vol. 9, no. 12, pp. 2998–3004, Dec. 2014, doi: 10.4304/jsw.9.12.2998-3004.
[6] A. Nath, “Analyzing Learners Perception about Data Analysis on Microsoft Excel,” Learning Community, vol. 12, no. 2, pp. 95–100, 2021, doi: 10.30954/2231-458X.02.2021.2.
[7] D. Divisi, G. Di Leonardo, G. Zaccagna, and R. Crisci, “Basic Statistics with Microsoft Excel: A Review,” Journal of Thoracic Disease, vol. 9, no. 6, pp. 1734–1740, Jun. 2017, doi: 10.21037/jtd.2017.05.81.
[8] D. Rahardja, “Advantages and Disadvantages Comparison of Time-Series Forecast Computing via Microsoft Excel and Three Different SAS Products,” Journal of Statistics & Management Systems, vol. 27, no. 4, pp. 769–784, 2024, doi: 10.47974/JSMS-983.
[9] J. L. Neyeloff, S. C. Fuchs, and L. B. Moreira, “Meta-Analyses and Forest Plots Using a Microsoft Excel Spreadsheet: Step-by-Step Guide Focusing on Descriptive Data Analysis,” BMC Research Notes, vol. 5, no. 52, pp. 1–6, 2012, doi: 10.1186/1756-0500-5-52.
[10] G. Mélard, “On the Accuracy of Statistical Procedures in Microsoft Excel 2010,” Computational Statistics & Data Analysis (preprint), 2012.
[11] R. R. Panko, “What We Don’t Know About Spreadsheet Errors Today: The Facts, Why We Don’t Believe Them, and What We Need to Do,” in Proc. EuSpRIG Conf. Spreadsheet Risk Management, 2015, pp. 1–15.
[12] L. Raković, M. Sakal, and V. Vuković, “Improvement of Spreadsheet Quality through Reduction of End-User Overconfidence: Case Study,” Periodica Polytechnica Social and Management Sciences, vol. 27, no. 2, pp. 119–130, 2019, doi: 10.3311/PPso.12392.
[13] T. Deb Roy, D. Bhattacharjee, and K. K. Das, “Comparing the Ability of MS Excel and R While Simulating from Poisson Distribution,” Assam University Journal of Science and Technology, vol. 4, no. 2, pp. 1–6, 2009.
[14] C. Ozgur, S. Jha, and Y. Shen, “Comparison and Contrast of Statistics Software Packages Including R and Python for Teaching Purposes,” working paper, n.d.
[15] S. R. Antony, “Comparison of Data Analysis Software for Instructional Use,” American Journal of Information Technology, vol. 12, no. 2, pp. 25–34, 2022.
[16] S. Putatunda, D. Ubrangala, K. Rama, and R. Kondapalli, “SmartEDA: An R Package for Automated Exploratory Data Analysis,” Journal of Open-Source Software, vol. 4, no. 41, art. 1509, 2019, doi: 10.21105/joss.01509.
[17] D. Incerti, H. Thom, G. Baio, and J. P. Jansen, “R You Still Using Excel? The Advantages of Modern Software Tools for Health Technology Assessment,” Value in Health, vol. 22, no. 5, pp. 575–579, 2019, doi: 10.1016/j.jval.2019.01.003.
[18] N. R. Naylor, J. Williams, N. Green, F. Lamrock, and A. Briggs, “Extensions of Health Economic Evaluations in R for Microsoft Excel Users: A Tutorial for Incorporating Heterogeneity and Conducting Value of Information Analyses,” PharmacoEconomics, vol. 41, pp. 21–32, 2023, doi: 10.1007/s40273-022-01203-0.
[19] S. Miksch, C. Di Ciccio, P. Soffer, and B. Weber, “Visual Analytics Meets Process Mining: Challenges and Opportunities,” IEEE Computer Graphics and Applications, vol. 44, no. 6, pp. 132–143, Nov.–Dec. 2024, doi: 10.1109/MCG.2024.3456916.
Additional Declarations
No competing interests reported.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8299990","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":556800137,"identity":"f9e478a3-d3c9-4262-aa36-3e5865714d27","order_by":0,"name":"Rohan Magar","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4ElEQVRIiWNgGAWjYPCCAwwM7M1AgkFChgQtPMcSQFp4SNAikWMAYhHWott+/OLnioo78uY8Zz6/ulFjwcPAfvjoBnxazM7kFEueOfPMcGd77zbrnGNAh/Gkpd3Aq+VAToJkY9thxg1nzm4zzmEDapHgMcOv5fyb5J9ALfYbbuQ8M875R4yWG+nHQLYkArUwP85tI0rLGzbLhjOHkzecOWbGnNsnwcNG0C/n0x/fbKg4bLvhePPjzznf6uT42Q8fw6sFGBEGMBabBJjErxwE2B/AWMwfCKseBaNgFIyCkQgA3ylTLk1E3o4AAAAASUVORK5CYII=","orcid":"","institution":"","correspondingAuthor":true,"prefix":"","firstName":"Rohan","middleName":"","lastName":"Magar","suffix":""}],"badges":[],"createdAt":"2025-12-07 13:23:17","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":true,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-8299990/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8299990/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":97774832,"identity":"f555a90a-e9f0-47c9-9298-04835b3820e8","added_by":"auto","created_at":"2025-12-09 
08:48:14","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":172544,"visible":true,"origin":"","legend":"","description":"","filename":"DataAnalysisinExcelandRAComparativeEvaluationnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/b6080e0e0de1fd0925e9eb9b.docx"},{"id":97897131,"identity":"82bbae23-d184-48c2-bc0e-94ea13185ccf","added_by":"auto","created_at":"2025-12-10 15:37:28","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":3605,"visible":true,"origin":"","legend":"","description":"","filename":"3aa9bbbc8f2545518c9f35e6657c54ce.json","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/e90c8aafdf2f5c1d00f368eb.json"},{"id":97774846,"identity":"d03090a6-32f7-4733-aff6-198b0ddd72cf","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":83752,"visible":true,"origin":"","legend":"","description":"","filename":"3aa9bbbc8f2545518c9f35e6657c54ce1enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/c720dab3b40daf216872fc91.xml"},{"id":97774836,"identity":"21d67d8c-d775-40d8-8c5a-29489bcb3878","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"jpeg","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":5686,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/33eeb3e39489f18fd72ac9a5.jpeg"},{"id":97897991,"identity":"46f5af36-8ac2-4ef8-9cc2-f5fd841bb445","added_by":"auto","created_at":"2025-12-10 
15:38:33","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11352,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/78b0c6e7f43e2e432ddb2f2a.png"},{"id":97774842,"identity":"b4e9654b-05a4-4ca0-854f-cddfde1feb30","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":26685,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/1089c0f9cb51ff2ccaf5f996.png"},{"id":97774840,"identity":"76825997-022e-4481-9da0-716672250ce8","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33832,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/e5f57a010b364c763a5a9bd5.png"},{"id":97897605,"identity":"7e375a69-d1ff-4f4d-99b4-6f8d59b7deaf","added_by":"auto","created_at":"2025-12-10 15:37:59","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1002,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/5972e65d8304de88bb3c712c.png"},{"id":97774837,"identity":"bf970881-506e-4916-b9f2-6b7b6cafaa9b","added_by":"auto","created_at":"2025-12-09 
08:48:14","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":7432,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/74e1e5d3617655f75535a454.png"},{"id":97897810,"identity":"3bf4b43f-557e-4ccd-b6c0-731bec68af20","added_by":"auto","created_at":"2025-12-10 15:38:16","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8708,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/33fcc627dd26d2d2a10e0167.png"},{"id":97774844,"identity":"a6632f29-f8c6-4da8-bc06-5d84761b6953","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":12024,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/156bb94fa37bcefb1fc92cd7.png"},{"id":97897037,"identity":"388f03e3-30c8-4c59-950a-274b3aca67da","added_by":"auto","created_at":"2025-12-10 15:37:22","extension":"xml","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":80325,"visible":true,"origin":"","legend":"","description":"","filename":"3aa9bbbc8f2545518c9f35e6657c54ce1structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/0099330e268a6fd944135dc5.xml"},{"id":97774847,"identity":"37d90ae4-d00e-41c2-9a96-37b11f5a4aff","added_by":"auto","created_at":"2025-12-09 
08:48:14","extension":"html","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":89896,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/382da66573249e017cfa95ad.html"},{"id":97774831,"identity":"6b588a54-4442-4a80-a8fc-ae98e4250dcc","added_by":"auto","created_at":"2025-12-09 08:48:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":33407,"visible":true,"origin":"","legend":"\u003cp\u003eHeatmap visualizing the correlation matrix in R\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/5dc1ed7819380bb77ac4fc48.png"},{"id":97895117,"identity":"a958832c-b91a-4529-bc47-64d06b983076","added_by":"auto","created_at":"2025-12-10 15:33:37","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":117192,"visible":true,"origin":"","legend":"\u003cp\u003e(Excel ) Region-wise Sales and Profit\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/47a0a4d4e8aac75928818fdb.png"},{"id":97897329,"identity":"70d4a716-e408-4d64-90c6-38cad6c35027","added_by":"auto","created_at":"2025-12-10 15:37:44","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":46588,"visible":true,"origin":"","legend":"\u003cp\u003e(R). 
Region-wise Sales and Profit\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/a11cf5fd9099d05ac48a8034.png"},{"id":99314041,"identity":"3f6c710a-3696-48ee-abbe-4ae521a6562f","added_by":"auto","created_at":"2025-12-31 16:20:47","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1037587,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8299990/v1/6a802565-edbb-4065-b989-2359f8de26ec.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Data Analysis in Excel and R: A Comparative Evaluation","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe growing presence of massive and intricate datasets in business, healthcare, government services, and other areas has compounded the demand for analytical tools helpful to processing raw data into meaningful and credible information. The choice of analytical software has grown to be a significant methodological issue in the evaluation of outcomes and decisions made by organizations and researchers that heavily employ data analysis as an informational tool. Microsoft Excel has been one of the most popular data analysis tools since it is easy to access because of familiarity and a low entry level. Its inbuilt support spreadsheet format and graphical features allow the user with little statistical background to perform simple analysis and produce graphical results.\u003c/p\u003e\u003cp\u003eNevertheless, a significant amount of literature has posed questions on the process of spreadsheet-based analysis as to its reliability and strength. Inaccuracies in Excel's statistical functions are reported in the studies to be inconsistent in computational procedures and not allowing advanced modeling. 
Moreover, the spreadsheet analysis is extremely susceptible to the human factor since research has demonstrated that even well-constructed spreadsheets often harbor hidden faults and that the user of the sheet is always overconfident about the correctness of his or her work. The above problems compromise the reproducibility and validity of the analytical outcomes, especially in high-risk or complex procedures.\u003c/p\u003e\u003cp\u003eCompared to programming-oriented environments like R, contrast programming-oriented environments offer more advanced statistical modeling functions and raise reproducibility by executing the scripts after a programming language. R is also open-source, allowing more complex exploratory and inferential analysis to be performed. Regardless of these benefits, comparatively small amounts of empirical research directly compare Excel and R in terms of their use on the same tasks of analysis, leaving a gap in knowledge about the differences between these tools in their practical use and reliability in delivering analytical results.\u003c/p\u003e\u003cp\u003eThe research paper will fill this gap by performing a systematic comparative analysis of Excel and R based on a dataset with 5901 records. The same analytical measures involving data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization were in both environments to determine their effectiveness, efficiency, and possible error sources. The results are expected to deliver evidence-based knowledge that will assist the users in making the correct choices among the tools in accordance with their complexity and analytical requirements.\u003c/p\u003e"},{"header":"Literature Review","content":"\u003cp\u003eThe increasing access to big and complicated data in business, medical services, and government programs has increased the necessity to apply analytical applications that might transform raw data into knowledge that is usable. 
The literature on big data emphasizes the increased reliance of the organization on analytics to optimize prediction and support decision-making using a set of semi-structured and unstructured data that can be structured or unstructured [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. An example of this is in finance, where big data analytics and machine learning have moved into the center of attention in anomalous behavior detection and fraud prevention in large-volume transactional settings [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. In other areas like process mining and event-log analysis, the integration of visual analytics and algorithmics can be beheld to comprehend the behavior of a complex workflow and the system [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. These innovations are accompanied by the development of real-time and advanced data analysis that underlines the necessity of the tools that would be able to handle high-velocity and continuous streams of data. Considered in this larger framework, Microsoft Excel and the R programming language have been recognized as two highly utilized environments for teaching and developing methodological applied data analysis.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eExcel as a Platform for the Applied Data Analysis\u003c/h2\u003e\u003cp\u003eExcel is still pervasive in education and practice because many people have heard of it, it is widely available, and it is relatively easy to learn. 
Studies in applied fields such as strength and conditioning have shown that coaches can use Excel to compute reliability indices (e.g., coefficient of variation), smallest worthwhile change effect sizes, and correlations, thus linking testing data to training decisions without specialized software [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. In engineering and reliability analysis, Excel-based linear regression and distribution fitting workflows have been proposed for identifying appropriate lifetime or failure time distributions using correlation-based criteria, with the advantage that the entire procedure can be implemented through standard spreadsheet functions and charts [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Basic statistics tutorials also demonstrate the method of performing descriptive measures (means, medians, and measures of dispersion) and graphical summaries in Excel in line with general statistical theory [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e\u003cp\u003ePedagogical research suggests that many learners have limited experience using Excel for data analysis initially, but a short-duration focused training boosts their competence as well as their intention to use Excel for future analytical work [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Additional methodological contributions are given, which show how to use Excel for more advanced tasks. For example [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], present step-by-step spreadsheets for meta-analysis including both fixed and random-effects models and custom forest plots, making evidence synthesis accessible to users that only have Excel [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. 
In time-series applications Excel's FORECAST.ETS function has been compared with several SAS forecasting tools, highlighting that while Excel can implement the exponential smoothing methods in a convenient way, more advanced platforms better support complex time series families such as ARIMA and state space models [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eHowever, several authors identify substantive limitations of the statistical capabilities of Excel. An extensive evaluation of Excel 2010 concludes that though Microsoft improved a number of statistical functions and the random number generator compared to previous versions, inaccuracies and inconsistencies still exist in comparison with reference implementations and dedicated statistical software [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Similar concerns emerge in the time series comparison in which Excel is made out to be appropriate for simpler forms of exponential smoothing but less suitable for the more complex families of models found in special-purpose software [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. These results indicate that, while Excel can be a useful teaching and prototyping environment, one must be careful about using this program for advanced and/or high-stake statistical analysis.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eSpreadsheet Risk: Human Error \u0026 Overconfidence\u003c/h3\u003e\n\u003cp\u003eBeyond algorithmic accuracy, spreadsheet-based analysis is prone to human error. Panko's synthesis of evidence from both spreadsheet research and work on broader human error issues indicates that even with low per-cell error rates, the likelihood that a large spreadsheet has at least one significant error is very high [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
Empirical studies also indicate that such errors are hard to identify and that developers as a rule overestimate the correctness of their own models [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Building on this [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Explore end-user overconfidence in spreadsheet development and demonstrate how teaching error taxonomies that illustrate real \"spreadsheet horror stories\" and introducing best practice guidelines can help reduce the rates of error as well as misplaced confidence among student developers [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Together these studies emphasize that using Excel for data analysis is not only a technical question but also a problem of human factors, in which aspects such as testing, auditing, and user education are quite important.\u003c/p\u003e\n\u003ch3\u003eR and Today Software Tools for Data Analysis\u003c/h3\u003e\n\u003cp\u003eIn parallel with continued use of Excel, R and other programming environments have become the central part of modern statistics and data science. Comparative work on software for teaching analytics points out that Python and R are lush ecosystems for mathematics, statistics, and data analysis, offering bountiful open-source libraries and interactive workflows [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. A broader comparison of software for analysis for instruction uses a Task-Technology Fit framework. Excel, Python, and R are evaluated for common data tasks (including reading, preprocessing, descriptive statistics, probability distributions, hypothesis testing, and regression) [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
This work concludes that although Excel has a lot of basic things already, which can be accomplished with menu-driven functions, R and Python offer greater flexibility, better support for complex modeling, and richer capabilities for reproducible workflows.\u003c/p\u003e\u003cp\u003eThe strength of R is also demonstrated by contributions on the level of packages. The SmartEDA package automates a large part of exploratory data analysis by classifying variables to automatically generate descriptive statistics and data visualizations and compute measures such as information value and weight of evidence, reducing manual coding efforts and the risk of ad hoc mistakes [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. In health economics [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], they discuss the limitations of Excel and specialist GUI software (such as TreeAge) in analyzing model complexity, uncertainty, and reproducibility in health technology assessment; they advocate the use of modern programming languages such as R and Python to build clinically realistic models, quantify decision uncertainty, and create transparent, reproducible analyses [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003cb\u003eR vs. Excel: Comparative Evidence.\u003c/b\u003e\u003c/p\u003e\u003cp\u003eUnder a head-to-head comparison of Excel vs. R on specific tasks, it shows the overlap and also shows the differences between the two tools. Das et al. investigate, in the context of the Poisson distribution, simulation using Excel and R for comparison of the mean squared error for parameter estimates versus sample size and parameter values [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
Their results suggest that Excel is in some settings able to perform comparably or even superiorly, but R generally provides more control when it comes to simulation and distributional assumptions [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. When combined with the documented problems with the accuracy of Excel's statistical functions [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] and plenty of evidence of spreadsheet error and overconfidence risks [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], you can see that these findings bolster the case for R's superiority with advanced simulation modeling and high-stakes inference.\u003c/p\u003e\u003cp\u003eAt the same time, Excel doesn't lose its important advantages. It is ubiquitous, easy to learn, and well suited for smaller data sets, quick exploratory works, and communication with non-technical stakeholders. [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e][\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e][\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. R, in contrast, has a steeper initial learning curve but allows more sophisticated analytics capabilities combined with better integration with big data and visual analytics workflows and better support for reproducibility and automation. 
[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e][\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eSynthesis and Gap\u003c/h3\u003e\n\u003cp\u003eOverall, the literature introduces Excel and R as complementary but not mutually exclusive tools: Excel is far better (pun intended) at accessibility and interactive mutually exclusive exploration in a short time, whereas R has depth (flexibility and complexity) capability for complex analyses. Existing studies, however, tend to focus either on Excel (accuracy, human error, teaching) or R and other programming languages (capabilities, automation, HTA, EDA) with relatively few integrated empirical evaluations that compare Excel and R across the full lifecycle of data analysis from data cleaning and exploration through modeling, simulation, and reporting under realistic user conditions. The present study thus serves a clear gap in curating a structured comparative analysis on data analysis in Excel and R from data informed by evidence about accuracy and usability, human error, and domain-specific applications.\u003c/p\u003e"},{"header":"Methodology","content":"\u003cp\u003eThe research used the comparative analysis method to assess the efficacy of Microsoft Excel and the R programming language in performing general data analysis duties. The aim of the methodology was to use the same procedure of the analytical in the two settings, such that the variation in performance, accuracy, and reliability can be ascribed to the tools instead of the process variation. 
Each of the steps was implemented in a manner that was created to indicate realistic workflows that are often employed in academic and business analytics environments.\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eResearch Approach\u003c/h2\u003e\u003cp\u003eThe study adopted an experimental design whereby both Excel and R were independently applied to analyze the same dataset (5901 records). Every tool was assigned to carry out a sequence of routine analytical functions such as cleaning of data, descriptive statistics, correlation analysis, regression modeling, and visualization. The parallel workflow of the study also provided a level of control to the basis of the comparison, enabling relevant assessment of the strengths and limitations of each of the tools.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eDataset Description\u003c/h3\u003e\n\u003cp\u003eThe dataset of this research is the Superstore Sales dataset that is frequently used as a study and analysis tool. It holds comprehensive transactional data of a retail superstore in terms of the customer demographics, product category, shipping data, and financial results (sales, discount, quantity, and profit). The dataset is quite appropriate to conduct this comparative study since it:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eis a combination of a numerical and categorical variable,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eis a realistic business operation,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003ehas adequate volume (5,901 rows) to measure differences in performance with it,\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003ecompanies' correlation, regression, and visual analyses.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eFields such as Order Date, Ship Date, Region, Category, Sub-Category, Sales, Discount, Quantity, and Profit are the key ones. 
The richness and variety of these variables offer a sufficient basis to compare the math power in Excel and R.\u003c/p\u003e\n\u003ch3\u003eData Preparation\u003c/h3\u003e\n\u003cp\u003eThe data was first of all imported into both Excel and R, and the same cleaning procedures were performed there. These processes involved authenticating data types, uniform date forms, and absence of missing or disagreeable values, and analyzing the existence of outliers in the numerical data of Sales, Quantity, Discount, and Profit. Although these ones needed a mix of manual and formula-based analyses in Excel, the work was performed in R in brief script-based commands. This difference gave a chance to make an early observation of process automation and repeatability differences between the two settings.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eAnalytical Procedures\u003c/h2\u003e\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\u003ch2\u003eDescriptive Statistics\u003c/h2\u003e\u003cp\u003eKey variables had summary statistics produced in the form of means, medians, standard deviations, and frequency distributions. Excel used some inbuilt functions and PivotTables as opposed to tidyverse ecosystem functions used in R. This step showed the difference in the philosophy of the two tools: the interactive and user-driven nature of the Excel tool versus the code-based nature of R.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eCorrelation Analysis\u003c/h2\u003e\u003cp\u003ePearson correlation coefficients were calculated to investigate the relationship between the numerical variables. This was done with the Data Analysis Toolpak available in Excel and R with the cor and corrplot functions with visualization packages. 
The analysis gave an understanding of the ability of each tool to be statistically accurate or to be able to communicate graphs.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eRegression Modeling\u003c/h2\u003e\u003cp\u003eTo predict the profit, a multilinear regression model was developed with sales, quantity, and discount as the predictors of the profit. Both R and Excel were able to give estimates of the coefficients, the level of significance, and the model fit (R\u0026sup2;). The comparisons of the outputs made it possible to compare the precision of computations, the management of residues, and the diagnostic information provided by each of the tools.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003eVisualization\u003c/h2\u003e\u003cp\u003eThere were no differences in the production of identical visualization forms such as histograms, scatterplots, and category-by-category representations in the two environments. 
Excel's built-in charting was compared with R's ggplot2 package, particularly in terms of flexibility, legibility, and reproducibility.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eEvaluation Criteria\u003c/h2\u003e\u003cp\u003eFive criteria were used to evaluate the performance of Excel and R:\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAccuracy:\u003c/strong\u003e whether the two tools produced matching and statistically correct results.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eReproducibility:\u003c/strong\u003e the extent to which an analysis could be repeated without discrepancies or errors.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eEfficiency:\u003c/strong\u003e the number of steps, the time, and the degree of automation needed to complete each task.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eComplexity Handling:\u003c/strong\u003e each tool's capacity to manage the data, perform multi-stage analyses, and support advanced modeling.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eRisk of Human Error:\u003c/strong\u003e the likelihood of errors introduced by the user, particularly in formula-driven operations in Excel compared with script-driven workflows in R.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003eTool Specifications\u003c/h2\u003e\u003cp\u003eThe analyses were run in Microsoft Excel (Office 365/Excel 2021 or later) and R version 4.x. The R packages used were tidyverse, ggplot2, lubridate, and corrplot. These environments were chosen because they are widely used in academic and professional analytics.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003eMethodological Limitations\u003c/h2\u003e\u003cp\u003eAlthough the methodology kept the comparison controlled, several limitations remain. 
Results in Excel can also vary with how the user builds formulas and workflows, whereas R requires programming skills that not every analyst has. In addition, the analysis rests on a single dataset and a set of generic analytical tasks; the findings may differ for very large-scale data, real-time analysis, or customized models.\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eThis section presents the findings obtained by applying identical analytical procedures in Microsoft Excel and R to the Superstore dataset of 5,901 records. The results show where the two tools produce consistent output and where they differ in analytical depth, efficiency, and visualization capability.\u003c/p\u003e\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003eDescriptive Statistics\u003c/h2\u003e\u003cp\u003eExcel and R produced the same summary statistics for Sales, Profit, and Quantity, agreeing exactly at the reported precision. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, sales and profit are highly variable. Their standard deviations are well above their means, and their means are well above their medians, which suggests right-skewed distributions. Quantity, by contrast, shows much less dispersion (mean 3.78, median 3, SD 2.21).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eDescriptive Statistics for Numerical Variables (Excel vs.
R)\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVariable\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTool\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eMean\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMedian\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eSD\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSales\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e265.34\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e128.64\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e474.26\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e265.34\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e128.64\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c5\"\u003e\u003cp\u003e474.26\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eProfit\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e29.70\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e8.50\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e259.58\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e29.70\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e8.50\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e259.58\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eQuantity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e3.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e2.21\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e3.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e2.21\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eAlthough the numbers were identical, the processes that produced them differed substantially. Excel required several manual operations using formulas and PivotTables, whereas R produced the same results with a few scripted commands. Excel is therefore adequate for simple summarization, but R is more efficient and reproducible for descriptive analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003eCorrelation Analysis\u003c/h2\u003e\u003cp\u003ePearson correlation coefficients computed in Excel and in R were identical for every variable pair. As Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows, sales and profit have a moderate positive correlation (r = 0.3259), and sales and quantity have a weaker positive correlation (r = 0.2024). Quantity and profit show almost no linear relationship (r = 0.0748). The analysis also indicated a strong negative correlation between discount and profit, although Discount does not appear in the table. 
This suggests that heavier discounting is associated with lower profitability, although correlation alone does not establish a causal effect.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eComparison of Pearson Correlation Matrices in R and Excel\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"9\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eR Programming\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSales\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eQuantity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eProfit\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c7\"\u003e\u003cp\u003eSales\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eQuantity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e\u003cp\u003eProfit\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSales\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.2024\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.3259\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eSales\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e0.2024\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e0.3259\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eQuantity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.2024\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.0748\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eQuantity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.2024\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e0.0748\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eProfit\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.3259\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.0748\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eProfit\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.3259\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e\u003cp\u003e0.0748\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e\u003cp\u003e1.0000\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eWhereas Excel produced only a numerical correlation matrix, R also generated a heatmap (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) that made the relationships easier to interpret. The heatmap clearly shows the expected positive relationship between sales and profit, illustrating R's advantage in communicating analytical findings visually.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\u003ch2\u003eRegression Modeling\u003c/h2\u003e\u003cp\u003eA multiple linear regression model predicting profit from sales and quantity was estimated in both tools. 
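\u003c/p\u003e\u003cp\u003eThe following plain-Python sketch makes explicit what both Excel's Regression tool and R's lm() estimate. It fits an ordinary least-squares model with an intercept by solving the normal equations, then derives R\u0026sup2;, adjusted R\u0026sup2;, and the overall F statistic from the residuals. The function names (solve, ols, f_from_r2) and the toy data are hypothetical. The normal equations are used here only for brevity. They are less numerically stable than the QR decomposition that lm() relies on, so the sketch is an illustration and not a replacement for either tool. As a consistency check, f_from_r2 applied to the reported fit (R\u0026sup2; = 0.1063, n = 5,901, k = 2 predictors) gives about 350.8, in line with the reported F-statistic.\u003c/p\u003e

```python
def solve(a, b):
    # Solve the square system a x = b by Gaussian elimination with
    # partial pivoting; works on an augmented copy, so inputs are untouched.
    m = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        # Swap in the row with the largest absolute pivot for stability.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if aug[col][col] == 0:
            raise ValueError('design matrix is singular')
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * m
    for r in reversed(range(m)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def f_from_r2(r2, n, k):
    # Overall F statistic for k slopes: (R2 / k) / ((1 - R2) / (n - k - 1)).
    if r2 == 1:
        return float('inf')  # perfect fit: F is unbounded
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def ols(y, *predictors):
    # Ordinary least squares with an intercept via the normal equations
    # (X'X) beta = X'y. Returns [intercept, slope_1, ...], R2,
    # adjusted R2 and the overall F statistic.
    n, k = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    n_params = k + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n_params)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2, f_from_r2(r2, n, k)


# Hypothetical toy data (not the Superstore records): Profit ~ Sales + Quantity.
sales = [100.0, 250.0, 40.0, 600.0, 80.0, 320.0]
quantity = [2, 5, 1, 7, 3, 4]
profit = [12.0, 30.0, -5.0, 90.0, 4.0, 41.0]
beta, r2, adj_r2, f_stat = ols(profit, sales, quantity)
print('coefficients:', [round(b, 4) for b in beta])
print('R2:', round(r2, 4), 'adj R2:', round(adj_r2, 4), 'F:', round(f_stat, 2))

# Consistency check on the reported fit (R2 = 0.1063, n = 5901, k = 2).
print(round(f_from_r2(0.1063, 5901, 2), 1))  # prints 350.8
```

\u003cp\u003e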
The coefficients, presented in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, were identical in Excel and R. Sales was a strong, statistically significant positive predictor of profit (β = 0.17738, t = 25.78). Quantity had a positive coefficient (β = 1.07510) that was not statistically significant (p = 0.466).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eComparison of Regression Results in Excel and R. \u003cb\u003eDependent Variable: Profit\u003c/b\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"9\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePredictor\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel: Estimate (β)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eR: Estimate 
(β)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eExcel SE\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eR SE\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003cp\u003et-value\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eR t-value\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003cp\u003ep-value\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e\u003cp\u003eR\u003c/p\u003e\u003cp\u003ep-value\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eIntercept\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e\u0026minus;21.4314\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e\u0026minus;21.4314\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6.3653\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e6.3653\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e\u0026minus;3.3669\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e\u0026minus;3.367\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.000765\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.000765\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSales\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e0.17738\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.17738\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.00688\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.00688\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e25.7808\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e25.781\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e5.682E-139\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;2e-16\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eQuantity\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e1.07510\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1.07510\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.47452\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e1.47452\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.7291\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.729\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.46595\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.46595\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eModel Fit Comparison\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" 
colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eStatistic\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eR\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eMultiple R\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.3260\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.3260\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eR Square\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.1063\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.1063\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eAdjusted R Square\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.1060\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.1060\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eStandard Error (Residual SE)\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e245.4459\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e245.4\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eF-Statistic\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e350.7691\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e350.8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eSignificance F / p-value\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.1559E-144\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;2.2e-16\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eObservations\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e5901\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e5901\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eAlthough the core regression output was identical, R also provided residual plots, checks of the model assumptions, and additional diagnostic tests. Excel provided only the basic regression table unless further steps were performed manually or through external add-ins. These differences reflect R's greater capacity for advanced statistical modeling.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\u003ch2\u003eVisualization Comparison\u003c/h2\u003e\u003cp\u003eBoth Excel and R readily produced visualizations such as histograms, scatterplots, and category comparisons. However, the outputs differed in quality and analytical depth. 
Excel's charts were simple and suitable for business reporting, but they required manual formatting to present the data clearly.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eR's ggplot2 graphics were more refined and consistent, with greater customization and interpretability. For example, the column-and-line chart of sales versus profit in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e \u003cb\u003e(Excel)\u003c/b\u003e is functional but minimal. The analogous plot in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e \u003cb\u003e(R)\u003c/b\u003e adds smoothing lines, confidence intervals, and cleaner styling that make underlying patterns easier to see.\u003c/p\u003e\u003cdiv id=\"Sec24\" class=\"Section3\"\u003e\u003ch2\u003eComparative Summary\u003c/h2\u003e\u003cp\u003eThe comparison of the two tools yields the following observations:\u003c/p\u003e\u003cp\u003e\u0026middot; Accuracy: Excel and R produced the same statistical values in every test.\u003c/p\u003e\u003cp\u003e\u0026middot; Reproducibility: R scripts make the analysis fully reproducible, whereas Excel depends heavily on manual operations.\u003c/p\u003e\u003cp\u003e\u0026middot; Efficiency: R required fewer steps for correlation, regression, and visualization.\u003c/p\u003e\u003cp\u003e\u0026middot; Visualization Quality: R produced more flexible, publication-quality figures.\u003c/p\u003e\u003cp\u003e\u0026middot; Error Risk: Excel's manual formulas are vulnerable to user error, whereas R's scripted workflow reduces this risk.\u003c/p\u003e\u003cp\u003e\u0026middot; Complexity Handling: R handled advanced modeling and diagnostic analysis better.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eComparative Tasks in Excel and R\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTask\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExcel\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eR\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDescriptive Statistics\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eManual formulas / PivotTable\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eOne script line\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCorrelation\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eToolPak required\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003ecor() function\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRegression\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eLimited output\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eFull statistical model\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHeatmap\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNot available\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003ecorrplot()\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVisualization\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBasic charts\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003ePublication-quality\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eReproducibility\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLow\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eHigh\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eEfficiency\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSlow for many tasks\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eFast\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eError Risk\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHigh\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eLow\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHandling Large Data\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWeak\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c3\"\u003e\u003cp\u003eStrong\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec25\" class=\"Section3\"\u003e\u003ch2\u003e\u003c/h2\u003e\u003cdiv id=\"Sec26\" class=\"Section4\"\u003e\u003ch2\u003eOverall Findings\u003c/h2\u003e\u003cp\u003eThe results show that Excel is well suited to introductory and low-complexity analytical tasks, especially for users who prefer an interactive interface. However, R consistently outperformed Excel in analytical depth, reproducibility, automation, and visualization quality. Excel remains useful for simple analysis, but R is more reliable and powerful for advanced data analysis and statistical modeling.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study systematically compared Microsoft Excel and R to determine which is more effective for a general data analysis workflow, using a common dataset of 5,901 Superstore records. Both tools produced identical numerical output for descriptive statistics, correlation, and regression. They nevertheless differed substantially in reproducibility, workflow design, automation capacity, and visualization quality.\u003c/p\u003e\u003cp\u003eThe descriptive statistics obtained in Excel and R were numerically identical, indicating that both tools compute these basic measures consistently. This agrees with findings in the literature that Excel is adequate for elementary statistical tasks and business reporting [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. However, the processes used to reach these results differed considerably. 
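\u003c/p\u003e\u003cp\u003eThe Python sketch below illustrates what a scripted summary looks like. It summarizes every numeric column with a single function call and returns the mean, median, and standard deviation reported in Table 1. The function name describe and the toy data are hypothetical. The sketch assumes that the reported SDs are sample standard deviations (n - 1 denominator), which is the convention of Excel's STDEV.S and R's sd(). The text does not state which formula was used.\u003c/p\u003e

```python
import statistics


def describe(columns):
    # Summarize every numeric column in one call: mean, median and the
    # sample standard deviation (n - 1 denominator, as in Excel's STDEV.S
    # and R's sd()).
    return {
        name: {
            'mean': statistics.mean(values),
            'median': statistics.median(values),
            'sd': statistics.stdev(values),
        }
        for name, values in columns.items()
    }


# Hypothetical toy data, not the Superstore records.
data = {
    'Sales': [100.0, 250.0, 40.0, 600.0, 80.0],
    'Profit': [12.0, 30.0, -5.0, 90.0, 4.0],
    'Quantity': [2, 5, 1, 7, 3],
}
for name, summary in describe(data).items():
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

\u003cp\u003e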
Excel required multiple manual steps (formulas or PivotTables), whereas R produced the same results with a single scripted command. This reliance on manual operations makes user-introduced errors more likely in Excel, a risk well documented in spreadsheet error research [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eCorrelation analysis likewise showed complete numerical agreement between the Analysis ToolPak in Microsoft Excel and the cor() function in R. However, R supported more sophisticated visualizations, such as correlation heatmaps, which make relationships easier to interpret. This capacity to communicate statistical patterns visually is consistent with contemporary views on the importance of visual analytics in data interpretation [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eRegression results were identical in both tools, confirming that Excel's regression tool and R's lm() function produce equivalent coefficient estimates, significance levels, and model-fit statistics. Nevertheless, R provided richer diagnostic output, such as residual plots, assumption checks, and model-fit summaries, whereas Excel offered only basic regression tables unless extra steps or add-ins were used. This supports earlier comparisons finding that programming-based tools are more flexible, more reproducible, and better equipped for diagnostics than spreadsheet environments [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eVisualization produced the sharpest contrast.
While Excel's charts were functional and adequate for business applications, R, using the ggplot2 package, produced publication-quality charts with greater customization, built-in smoothing, and clearer interpretability. These findings are consistent with past research highlighting R's strengths in exploratory data analysis and automated graph production [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eOverall, the discussion suggests that Excel suits beginners and low-complexity tasks, while R offers clear advantages in automation, advanced modeling, error reduction, visualization, and reproducibility. These results add weight to concerns about spreadsheet risk raised in the existing literature [\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]-[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] and support broader recommendations to adopt programming-based tools when analytical requirements become demanding, however appealing spreadsheets may be to analysts [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study offered a structured comparison of Microsoft Excel and R using identical data analysis procedures: data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization.
Although both tools gave identical numerical results for all statistical computations, they differed considerably in workflow efficiency, reproducibility, diagnostic capabilities, and visualization quality.\u003c/p\u003e\u003cp\u003eExcel showed strengths in usability, accessibility, and suitability for simple statistical summaries and business-oriented charts. Its interface is accessible to users with little statistical training and makes fundamental analysis relatively easy. However, Excel relies on manual operations, which increases the chance of human error and limits reproducibility, especially when multiple analytical steps are needed.\u003c/p\u003e\u003cp\u003eIn comparison, R provided a more robust and scalable analytical environment. Its script-based workflow supports full reproducibility, reduces user-induced error, and offers considerable flexibility for sophisticated modeling and diagnostic evaluation. R also provided better visualization through automated, publication-quality graphics. Together with its versatility, scalability, and ability to handle large datasets, these strengths make R particularly suitable for academic research, complex modeling, and big-data applications.\u003c/p\u003e\u003cp\u003eIn summary, Excel is well suited to introductory analytics and quick, one-off exploratory work, whereas R is stronger for rigorous applications, reproducible workflows, and the routine data-inspection tasks of data scientists. Users and organizations should therefore choose a tool according to the complexity of the analysis, reproducibility requirements, and the analyst's skill level.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eR.M.
conceived the study, designed the methodology, performed the data analysis in both Excel and R, generated all tables and figures, interpreted the results, and wrote the full manuscript. R.M. reviewed and approved the final version of the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eD. Darwish, \u0026ldquo;Big Data Issues: Analytics and Security,\u0026rdquo; in \u003cem\u003eEncyclopedia of Information Science and Technology\u003c/em\u003e, 6th ed. Hershey, PA, USA: IGI Global, n.d., ch. 20, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.4018/978-1-6684-7366-5.ch020\u003c/span\u003e\u003cspan address=\"10.4018/978-1-6684-7366-5.ch020\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eC. A. Udeh, O. H. Orieno, O. D. Daraojimba, N. L. Ndubuisi, and O. I. Oriekhoe, \u0026ldquo;Big Data Analytics: A Review of Its Transformative Role in Modern Business Intelligence,\u0026rdquo; \u003cem\u003eComputer Science \u0026amp; IT Research Journal\u003c/em\u003e, vol. 5, no. 1, pp. 219\u0026ndash;236, Jan. 2024, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.51594/csitrj.v5i.718\u003c/span\u003e\u003cspan address=\"10.51594/csitrj.v5i.718\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eP. O. Shoetan, A. T. Oyewole, C. C. Okoye, and O. C. Ofodile, \u0026ldquo;Reviewing the Role of Big Data Analytics in Financial Fraud Detection,\u0026rdquo; \u003cem\u003eFinance \u0026amp; Accounting Research Journal\u003c/em\u003e, vol. 6, no. 3, pp. 384\u0026ndash;394, Mar.
2024, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.51594/farj.v6i3.899\u003c/span\u003e\u003cspan address=\"10.51594/farj.v6i3.899\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eA. Turner, J. Brazier, C. Bishop, S. Chavda, J. Cree, and P. Read, \u0026ldquo;Data Analysis for Strength and Conditioning Coaches: Using Excel to Analyze Reliability Differences and Relationships,\u0026rdquo; \u003cem\u003eStrength and Conditioning Journal\u003c/em\u003e, vol. 37, no. 1, pp. 76\u0026ndash;83, Feb. 2015.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJ. Qin, Y. Niu, and Z. Li, \u0026ldquo;A Data Analysis Method and Its Applications in EXCEL,\u0026rdquo; \u003cem\u003eJournal of Software\u003c/em\u003e, vol. 9, no. 12, pp. 2998\u0026ndash;3004, Dec. 2014, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.4304/jsw.9.12.2998-3004\u003c/span\u003e\u003cspan address=\"10.4304/jsw.9.12.2998-3004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eA. Nath, \u0026ldquo;Analyzing Learners Perception about Data Analysis on Microsoft Excel,\u0026rdquo; \u003cem\u003eLearning Community\u003c/em\u003e, vol. 12, no. 2, pp. 95\u0026ndash;100, 2021, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.30954/2231-458X.02.2021.2\u003c/span\u003e\u003cspan address=\"10.30954/2231-458X.02.2021.2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eD. Divisi, G. Di Leonardo, G. Zaccagna, and R. Crisci, \u0026ldquo;Basic Statistics with Microsoft Excel: A Review,\u0026rdquo; \u003cem\u003eJournal of Thoracic Disease\u003c/em\u003e, vol. 9, no. 6, pp. 1734\u0026ndash;1740, Jun.
2017, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.21037/jtd.2017.05.81\u003c/span\u003e\u003cspan address=\"10.21037/jtd.2017.05.81\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eD. Rahardja, \u0026ldquo;Advantages and Disadvantages Comparison of Time-Series Forecast Computing via Microsoft Excel and Three Different SAS Products,\u0026rdquo; \u003cem\u003eJournal of Statistics \u0026amp; Management Systems\u003c/em\u003e, vol. 27, no. 4, pp. 769\u0026ndash;784, 2024, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.47974/JSMS-983\u003c/span\u003e\u003cspan address=\"10.47974/JSMS-983\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJ. L. Neyeloff, S. C. Fuchs, and L. B. Moreira, \u0026ldquo;Meta-Analyses and Forest Plots Using a Microsoft Excel Spreadsheet: Step-by-Step Guide Focusing on Descriptive Data Analysis,\u0026rdquo; \u003cem\u003eBMC Research Notes\u003c/em\u003e, vol. 5, art. 52, pp. 1\u0026ndash;6, 2012, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/1756-0500-5-52\u003c/span\u003e\u003cspan address=\"10.1186/1756-0500-5-52\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eG. M\u0026eacute;lard, \u0026ldquo;On the Accuracy of Statistical Procedures in Microsoft Excel 2010,\u0026rdquo; \u003cem\u003eComputational Statistics \u0026amp; Data Analysis\u003c/em\u003e (preprint), 2012.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eR. R. Panko, \u0026ldquo;What We Don\u0026rsquo;t Know About Spreadsheet Errors Today: The Facts, Why We Don\u0026rsquo;t Believe Them, and What We Need to Do,\u0026rdquo; in \u003cem\u003eProc. EuSpRIG Conf.
Spreadsheet Risk Management\u003c/em\u003e, 2015, pp. 1\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eL. Raković, M. Sakal, and V. Vuković, \u0026ldquo;Improvement of Spreadsheet Quality through Reduction of End-User Overconfidence: Case Study,\u0026rdquo; \u003cem\u003ePeriodica Polytechnica Social and Management Sciences\u003c/em\u003e, vol. 27, no. 2, pp. 119\u0026ndash;130, 2019, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3311/PPso.12392\u003c/span\u003e\u003cspan address=\"10.3311/PPso.12392\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eT. Deb Roy, D. Bhattacharjee, and K. K. Das, \u0026ldquo;Comparing the Ability of MS Excel and R While Simulating from Poisson Distribution,\u0026rdquo; \u003cem\u003eAssam University Journal of Science and Technology\u003c/em\u003e, vol. 4, no. 2, pp. 1\u0026ndash;6, 2009.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eC. Ozgur, S. Jha, and Y. Shen, \u0026ldquo;Comparison and Contrast of Statistics Software Packages Including R and Python for Teaching Purposes,\u0026rdquo; working paper, n.d.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eS. R. Antony, \u0026ldquo;Comparison of Data Analysis Software for Instructional Use,\u0026rdquo; \u003cem\u003eAmerican Journal of Information Technology\u003c/em\u003e, vol. 12, no. 2, pp. 25\u0026ndash;34, 2022.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eS. Putatunda, D. Ubrangala, K. Rama, and R. Kondapalli, \u0026ldquo;SmartEDA: An R Package for Automated Exploratory Data Analysis,\u0026rdquo; \u003cem\u003eJournal of Open Source Software\u003c/em\u003e, vol. 4, no. 41, art.
1509, 2019, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.21105/joss.01509\u003c/span\u003e\u003cspan address=\"10.21105/joss.01509\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eD. Incerti, H. Thom, G. Baio, and J. P. Jansen, \u0026ldquo;R You Still Using Excel? The Advantages of Modern Software Tools for Health Technology Assessment,\u0026rdquo; \u003cem\u003eValue in Health\u003c/em\u003e, vol. 22, no. 5, pp. 575\u0026ndash;579, 2019, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jval.2019.01.003\u003c/span\u003e\u003cspan address=\"10.1016/j.jval.2019.01.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eN. R. Naylor, J. Williams, N. Green, F. Lamrock, and A. Briggs, \u0026ldquo;Extensions of Health Economic Evaluations in R for Microsoft Excel Users: A Tutorial for Incorporating Heterogeneity and Conducting Value of Information Analyses,\u0026rdquo; \u003cem\u003ePharmacoEconomics\u003c/em\u003e, vol. 41, pp. 21\u0026ndash;32, 2023, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s40273-022-01203-0\u003c/span\u003e\u003cspan address=\"10.1007/s40273-022-01203-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eS. Miksch, C. Di Ciccio, P. Soffer, and B. Weber, \u0026ldquo;Visual Analytics Meets Process Mining: Challenges and Opportunities,\u0026rdquo; \u003cem\u003eIEEE Computer Graphics and Applications\u003c/em\u003e, vol. 44, no. 6, pp. 132\u0026ndash;143, Nov.\u0026ndash;Dec.
2024, doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/MCG.2024.3456916\u003c/span\u003e\u003cspan address=\"10.1109/MCG.2024.3456916\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Data analysis, Microsoft Excel, R programming, regression, correlation, statistical modeling","lastPublishedDoi":"10.21203/rs.3.rs-8299990/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8299990/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eA comparative analysis of Excel and R is timely because the growth of large and complex datasets across many domains has increased demand for data analysis tools that produce consistent, reliable, and reproducible results. Although Excel remains popular because it is accessible and easy to learn, the literature raises questions about the statistical accuracy of manual data processing and the high likelihood of user error in spreadsheet-based analysis. Research has also shown that spreadsheets frequently contain undetected errors and that their users tend to be overconfident about the output, casting doubt on Excel's suitability for more advanced analysis. For more rigorous data analysis, R offers a programming environment that supports more complex statistical modeling and greater reproducibility than spreadsheet tools such as Excel.\u003c/p\u003e\u003cp\u003eThis study therefore evaluates how effective Excel and R are for general data analysis tasks.
The dataset comprises 5,901 records, and the same procedures were applied in both tools: data cleaning, descriptive statistics, correlation analysis, regression modeling, and visualization. The findings indicate that Excel is efficient and reliable only for basic analysis and simple outputs; for more complex tasks it becomes inefficient and errors become more likely. R therefore offers greater analytical capacity and reliability, whereas Excel is best suited to introductory and simple data analysis.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e","manuscriptTitle":"Data Analysis in Excel and R: A Comparative Evaluation","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-12-09 08:48:09","doi":"10.21203/rs.3.rs-8299990/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d61950aa-d309-4663-96ef-5e97ae25c809","owner":[],"postedDate":"December 9th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-12-27T16:24:25+00:00","versionOfRecord":[],"versionCreatedAt":"2025-12-09 08:48:09","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8299990","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8299990","identity":"rs-8299990","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
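The Discussion contrasts Excel's manual formula and PivotTable steps with R, where each stage of the workflow is a scripted command (summary statistics, cor(), lm()). The base-R sketch below illustrates that idea under stated assumptions. It is not the author's script. The data are simulated, and the columns Sales, Quantity, Discount and Profit are hypothetical stand-ins for the 5,901 Superstore records, whose variables are not listed in this section.

```r
# Hedged sketch: one base-R script covering the workflow steps the study compares.
# ASSUMPTION: data are simulated; Sales, Quantity, Discount and Profit are
# hypothetical stand-ins for the study's 5,901 Superstore records.
set.seed(1)
n <- 5901
superstore <- data.frame(
  Sales    = round(rexp(n, rate = 1 / 200), 2),
  Quantity = sample(1:14, n, replace = TRUE),
  Discount = sample(c(0, 0.1, 0.2, 0.5), n, replace = TRUE)
)
superstore$Profit <- 0.15 * superstore$Sales - 100 * superstore$Discount +
  rnorm(n, mean = 0, sd = 20)

# 1. Data cleaning: drop rows with missing values, then exact duplicate rows.
clean <- unique(na.omit(superstore))

# 2. Descriptive statistics for every column in a single call.
print(summary(clean))

# 3. Pearson correlation matrix (the coefficient Excel's ToolPak Correlation tool reports).
cor_mat <- cor(clean, method = "pearson")
print(round(cor_mat, 3))

# 4. Multiple linear regression with the full coefficient table and fit statistics.
fit <- lm(Profit ~ Sales + Quantity + Discount, data = clean)
fit_summary <- summary(fit)
print(fit_summary)

# 5. Residual diagnostics: a numeric summary here; in an interactive session,
#    plot(fit) draws the four standard diagnostic plots.
print(summary(residuals(fit)))
```

Because every step lives in one file, re-running it on the same input reproduces every reported number, which is the reproducibility property the study attributes to R. The correlation heatmap (corrplot()) and the ggplot2 charts mentioned in the text need add-on packages and are left out to keep the sketch dependency-free.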