Comparative Analysis of Linear Regression, Decision Tree, Xgboost, Catboost, and Artificial Neural Network Machine Learning Algorithms for

Research Article | Research Square

Agaji, Augustine Alul (Cross River University of Technology, Calabar; corresponding author; https://orcid.org/0009-0005-3623-8410)
Igah, Godspower Charles (Ahmadu Bello University, Zaria; https://orcid.org/0009-0002-3226-3940)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5442566/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

The compressive strength of concrete was predicted using five machine learning algorithms, Linear Regression (LR), Decision Tree (DT), Xgboost, Catboost, and Artificial Neural Network (ANN), to determine which performs best for concrete compressive strength prediction. The best-performing model was Catboost, followed by Xgboost, with mean absolute errors of 2.72 N/mm² and 3 N/mm² respectively.
The least performing models were LR and ANN, with higher mean absolute errors of 7.75 N/mm² and 4.9 N/mm² respectively. The dataset from which the models were built and the insights drawn was obtained from kaggle.com.

Keywords: Environmental Engineering, Materials Engineering, Civil Engineering, Concrete, Machine Learning Algorithms, Concrete Compressive Strength Prediction

CHAPTER ONE

1.0 INTRODUCTION

Concrete is a commonly used construction material all over the world. It is the second most-utilized material on Earth after water [1]. Concrete has been widely adopted as a construction material because of properties such as durability, compressive strength, stiffness, hardness, porosity, density, and fire/thermal resistance. Compressive strength is generally regarded as the most important of these, as it strongly affects the safety and durability of the concrete elements made with it. In most construction projects, elements must attain specified compressive strength values to meet design specifications; hence concrete samples are tested for compressive strength in a laboratory during the preliminary stage of construction, through trial mix design.

"Machine learning techniques are widely used algorithms for predicting the mechanical properties of concrete" [2]. This is because they can predict the compressive strength of concrete from the input parameters (quantities of cement, fine aggregate, coarse aggregate, etc.). This saves time and resources during testing, as it indicates the compressive strength that a particular mix ratio would yield.
This study compares the accuracy of several Machine Learning (ML) algorithms, namely Linear Regression (LR), Decision Tree (DT), Xgboost, Catboost, and Artificial Neural Networks (ANN), in predicting the compressive strength of concrete.

1.2 SIGNIFICANCE OF STUDY

Given the wide adoption of ML in civil engineering applications, this research seeks to identify a high-performing model, from a list of popular ML algorithms, for predicting the compressive strength of concrete. This gives civil engineering professionals, and other researchers who intend to apply ML algorithms to similar tasks, a tried and tested starting point, since training models with many different ML algorithms can be exhausting and time-consuming.

CHAPTER TWO

2.0 LITERATURE REVIEW

2.1 Concrete

Concrete is a very popular construction material that has been in use since as far back as 6500 BC, when Nabataean traders in Syria and Jordan used it to create concrete floors, housing structures, and underground cisterns. It is also recorded that the Ancient Romans used concrete in 600 BC; although they were not the first to use concrete, they were the first to apply it on a large scale. By 200 BC, the Romans had successfully adopted concrete in most of their construction works. Their concrete mix consisted mainly of volcanic ash, lime, and seawater [3].

Concrete is evolving and is the most widely utilized construction material. Modern concrete is more than a mixture of cement, water, and aggregates (fine and coarse); it increasingly contains mineral components, chemical admixtures, fibers, etc. [4]. Concrete will continue to evolve in its composition and applicability to meet the sustainability concerns that the world is currently facing, until a balance is achieved.
Concrete has many properties that have endeared it to construction practitioners over the decades, among them flexibility in conforming to various shapes during placement, high compressive strength, and thermal resistance.

Because of its versatility, concrete is now the most used civil engineering material. Concrete has low tensile strength but high compressive strength [5].

2.1.1 Concrete Compressive Strength

Concrete compressive strength is a crucial mechanical property. It is generally obtained by testing a specimen after a standard curing period of 28 days [6], and it determines the ability of concrete members to bear structural loads. The compressive strength of concrete increases with age, provided the required curing has been done [7].

The compressive strength of concrete can be obtained through destructive, partially destructive, or non-destructive testing methods. A non-destructive test is done without damage to the concrete. In partially destructive testing, there is slight, repairable damage to the surface of the concrete; methods include core tests, pull-out tests, and pull-off tests. Non-destructive and partially destructive tests can provide properties of the tested concrete specimen such as density, surface hardness, elastic modulus, reinforcement position, and compressive strength. Crushing the test specimen is the usual destructive approach to determining concrete strength [8].

The compressive strength of concrete is obtained by dividing the load at failure (the load at which the specimen crumbles) by the loaded cross-sectional area of the specimen.

2.2 Machine Learning

Machine Learning (ML) is a sub-field of Artificial Intelligence. It aims at building computer systems that are capable of learning or improving their performance based on the data they consume.
Artificial Intelligence refers to systems and machines that do tasks that would normally require human intelligence [9].

2.3 Machine Learning Algorithm

An ML algorithm is a set of rules or procedures that an AI system uses to carry out a task, which is often to discover new patterns and data insights or to predict an output value from a given set of input variables. Algorithms enable machine learning systems to learn from data [10].

2.4 Machine Learning Model

A Machine Learning model is an object (stored in a file) that has been trained, using a dataset and a particular algorithm, to recognize certain types of patterns. The algorithm enables the model to reason and make inferences. Once a model is trained, it can reason over data it has not seen before and make predictions [11].

2.4.1 Machine Learning Models Evaluation Metrics

Determining whether an ML model performs well or poorly is crucial, as it indicates whether the model is fit for use or should be improved. Evaluation metrics are measures used to determine the performance of an ML or statistical model.

Some of the commonly used evaluation metrics for ML models are:

i. F1 Score
ii. Root Mean Squared Error (RMSE)
iii. Cross Validation
iv. Mean Absolute Error (MAE)
v. Accuracy Score

CHAPTER THREE

3.0 MATERIALS AND METHODS

The dataset used for the study was obtained from Kaggle's official website. It consists of 1,030 rows (data points) and 9 columns. The fields in the dataset are cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, age, and concrete compressive strength.

3.1 Exploratory Data Analysis

Exploratory analysis was carried out to understand the dataset better and to check its validity for use with the machine learning algorithms. The dataset was found to contain no missing values and to be fit for use in building models.
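The missing-value check described in Section 3.1 can be sketched as follows. This is a minimal illustration only: the source does not say which tooling was used, so the sketch relies on the Python standard library. The column header strings, the helper names `load_rows` and `count_missing`, and the two sample rows are all assumptions made for illustration; the sample values are not taken from the Kaggle dataset.

```python
import csv
import io

# Column names follow the nine fields listed in Section 3.0; the exact
# header strings used in the Kaggle file are an assumption.
COLUMNS = [
    "cement", "blast_furnace_slag", "fly_ash", "water", "superplasticizer",
    "coarse_aggregate", "fine_aggregate", "age", "compressive_strength",
]

# Two made-up rows for illustration only; they are NOT from the dataset.
# The second row deliberately leaves the superplasticizer field blank.
SAMPLE_CSV = (
    ",".join(COLUMNS) + "\n"
    "300.0,0.0,0.0,180.0,5.0,1000.0,800.0,28,35.0\n"
    "250.0,100.0,0.0,190.0,,950.0,750.0,7,20.0\n"
)


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows, columns):
    """Return {column: number of rows where the value is absent or blank}."""
    missing = {name: 0 for name in columns}
    for row in rows:
        for name in columns:
            value = row.get(name)
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


rows = load_rows(SAMPLE_CSV)
missing = count_missing(rows, COLUMNS)
print(len(rows), "rows,", len(COLUMNS), "columns")
print({name: n for name, n in missing.items() if n > 0})
```

For a real file, the text could be read with `open("concrete.csv", newline="")`, where the filename is hypothetical. A result in which every count is zero would correspond to the "no missing values" finding reported above.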
3.2 Data Splitting

The dataset was split in a ratio of 80:20, with 80% used for training the models and 20% for validating or testing them. The dataset was further separated into input features (X), which are the features given to the model for prediction (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, age), and the target feature (Y), which is what the model predicts (concrete compressive strength).

3.3 Model Building and Evaluation

ML models were built using the LR, DT, Xgboost, Catboost, and ANN algorithms, and their performance was evaluated using the Mean Absolute Error (MAE).

3.4 Model Hyper-parameter Tuning

Hyper-parameter tuning was done on the various models. RandomizedSearchCV was used for the LR, DT, Xgboost, and Catboost models. For the ANN, the number of epochs was varied to find the value that gave the best performance.

CHAPTER FOUR

4.0 RESULTS AND DISCUSSION

4.1 Correlation Between the Dataset Features

The correlation between the various features of the dataset is shown in Fig 3.

The heatmap in Fig 3 shows that the features most strongly correlated with the compressive strength of the concrete used in this study are the amount of cement, followed by the amount of superplasticizer and the age of the concrete.

4.2 Models Evaluation Using Various Algorithms

The metric used for evaluating the performance of the machine learning models in this study is the Mean Absolute Error (MAE), which is suitable for regression tasks; it is the average absolute difference between the actual test values and the predicted values.

From Fig 4, it can be deduced that the best-performing model for concrete compressive strength prediction is Catboost, followed by Xgboost and Decision Tree (DT).
The Artificial Neural Network (ANN) and Linear Regression (LR) models are the least performant models in concrete compressive strength prediction.

4.3 Results from Model Hyper-parameter Tuning

After hyper-parameter tuning, the performance of the models other than the ANN was below that of the untuned models with their default parameters. This occurred because RandomizedSearchCV samples random values from the parameter ranges given to it, so it is not guaranteed to try settings as good as the defaults. RandomizedSearchCV was used instead of GridSearchCV because it is faster and requires fewer computational resources.

The number of epochs used for training the ANN was chosen from the range 1-21, and the number of epochs that gave the best result was 20, as shown in Figure 5.

CHAPTER FIVE

5.0 CONCLUSION/RECOMMENDATION

5.1 Conclusion

From the results obtained in the course of this study, the Catboost algorithm performed best and is recommended for predicting concrete compressive strength.

When tuning model hyper-parameters, it is more reliable to use GridSearchCV, provided the computational capacity is available, as it performs better than the RandomizedSearchCV approach.

Cement content was found to be the factor most highly correlated with the compressive strength of concrete.

5.2 Recommendation

It is recommended that laboratory testing be used for confirmatory testing of concrete samples after machine learning models have been used to predict the compressive strength values, to save time and resources.

When building an ANN model, the number of epochs should be adjusted until a well-performing model with minimal error is achieved.

Declarations

CONFLICT OF INTEREST

None.

ACKNOWLEDGEMENT

None. This study was funded by the author.

ETHICAL CONSIDERATION

All the materials and data used in this study were approved, and confidentiality was duly observed where necessary.
References

[1] Shyamala G, Kumar R, K., Olalusi OB (2020) Impacts of nonconventional construction materials on concrete strength development: case studies. SN Appl. Sci. 2, 1927. https://doi.org/10.1007/s42452-020-03687-x
[2] Ahmad A, Farooq F, Niewiadomski P, Ostrowski K, Akbar A, Aslam F, Alyousef R (2021) Prediction of Compressive Strength of Fly Ash Based Concrete Using Individual and Ensemble Algorithm. Materials 14(4):794. https://doi.org/10.3390/ma14040794
[3] Giatec Scientific (2022) The History of Concrete. https://www.giatecscientific.com/education/the-history-of-concrete/#:~:text=6500BC%20%E2%80%93%20UAE%3A%20The%20earliest%20recordings,straw%20to%20bind%20dried%20bricks.
[4] Aïtcin P-C (2000) Cements of yesterday and today: Concrete of tomorrow. Cement and Concrete Research 30(9):1349–1359. https://doi.org/10.1016/S0008-8846(00)00365-3
[5] Peter DF, Olamilekan OA, Benjamin AO, Ewemade Cornelius Enabulele, Nzemeka Ogechukwu Israel, Grace Agbons Aruya et al (2024) Comparison of Coren mix design with other international mix (ACI and DoE) design methods. World Journal of Advanced Research and Reviews 23(1):2522–2539. http://dx.doi.org/10.30574/wjarr.2024.23.1.2230
[6] Ni H-G, Wang J-Z (2000) Prediction of compressive strength of concrete by neural networks. Cement and Concrete Research 30(8):1245. https://doi.org/10.1016/S0008-8846(00)00345-8
[7] Rasak Q, Fakoyede P, Oparinde A, Enabulele E, Nzemeka O, Aruya G, Adeniran-Bakare S, Adeleke O, Ajibola I (2024) Modification Analysis of Two Different Cement Grades and Their Impact on Vibrated Concrete Qualities. Path Sci 10(8):6001–6017. http://dx.doi.org/10.22178/pos.107-8
[8] Malek J, Machta Kaouther (2014) Destructive and Non-destructive Testing of Concrete Structures. Jordan J Civil Eng 8(4):432.
[9] Oracle official website. What is Machine Learning. https://www.oracle.com/ng/artificial-intelligence/machine-learning/what-is-machine-learning/
[10] IBM official website. What is a Machine Learning Algorithm? https://www.ibm.com/topics/machine-learning-algorithms
[11] Microsoft official website (2024) What is a Machine Learning Model. https://learn.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model

Tables

Tables are available in the Supplementary Files section.

Additional Declarations

The authors declare no competing interests.

Supplementary Files

APPENDIX.docx
Tables.docx
Figures

Figure 1: The rebound hammer equipment for non-destructive concrete testing.
Figure 2: A compressive strength test machine for destructive testing of concrete.
Figure 3: Dataset correlation.
Figure 4: The ML models used in this study and their corresponding mean absolute errors in N/mm².
Figure 5: Mean absolute errors of the Artificial Neural Network model against the number of epochs passed to it.
14:57:01","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":13776,"visible":true,"origin":"","legend":"","description":"","filename":"APPENDIX.docx","url":"https://assets-eu.researchsquare.com/files/rs-5442566/v1/a657f3ec1c43d09234baa2a9.docx"},{"id":69016939,"identity":"13ea3f7c-d815-459e-94b7-54b7a837ef2d","added_by":"auto","created_at":"2024-11-14 15:05:01","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":221494,"visible":true,"origin":"","legend":"","description":"","filename":"Tables.docx","url":"https://assets-eu.researchsquare.com/files/rs-5442566/v1/74613c76d2ddca818a8d5363.docx"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eComparative Analysis of Linear Regression, Decision Tree, Xgboost, Catboost, and Artificial Neural Network Machine Learning Algorithms for\u003c/p\u003e","fulltext":[{"header":"CHAPTER ONE","content":"\u003cp\u003e\u003cstrong\u003e1.0 \u0026nbsp;INTRODUCTION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConcrete is a commonly used construction material all over the world. It is the second most-utilized material on Earth after water [1]. Some of the reasons for the massive adoption of concrete as a construction material are due to its properties such as durability, compressive strength, stiffness, hardness, porosity, density, and fire/thermal resistance. The compressive strength of concrete is thought to be the most important property as it strongly impacts the safety and durability of the concrete elements made with it. 
In the execution of most construction projects, the attainment of certain compressive strength values of different elements to meet design specifications is paramount, hence the need for compressive strength testing of concrete samples in a laboratory during the preliminary stage of the construction through trial mix design.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026ldquo;Machine learning techniques are widely used algorithms for predicting the mechanical properties of concrete\u0026rdquo; [2]. This is because they can be used to predict a compressive strength value of concrete\u0026nbsp;based on the\u0026nbsp;input parameters (quantities of cement, fine aggregate, coarse aggregate, etc.), this is helpful since it saves time and resources during testing as\u0026nbsp;it gives an idea of what the compressive strength a particular concrete ingredient-mix-ratio would yield.\u003c/p\u003e\n\u003cp\u003eThis study compares the accuracy of various Machine Learning (ML) algorithms such as Linear Regression (LR), Decision Tree (DT), Xgboost, Catboost, and Artificial Neural Networks (ANN)\u0026nbsp;in predicting the compressive strength of concrete.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.2 SIGNIFICANCE OF STUDY\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDue to the massive adoption of ML in civil engineering applications, this research seeks to develop a high-performant model from a list of popular ML algorithms, for predicting the compressive strength of concrete. 
This will give professionals in the civil engineering domain, and other researchers alike who intend to apply ML algorithms in solving similar tasks a tried and tested lead way since training models using different ML algorithms can be exhausting and time-consuming as well.\u003c/p\u003e"},{"header":"CHAPTER TWO","content":"\u003cp\u003e\u003cstrong\u003e2.0 LITERATURE REVIEW\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.1 Concrete\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConcrete is a very popular construction material that has been used from as far back as 6500 BC, where it was used by the Nabataea traders in Syria and Jordan; they used it to create concrete floors, housing structures, and underground cisterns. In 600 BC, it was also recorded that the Ancient Romans had also utilized concrete; even though they weren\u0026rsquo;t the first to use concrete, they were the first to utilize it in a large-scale application. As of 200 BC, the Romans had successfully implemented the use of concrete in most of their construction works. The ingredients of their concrete were majorly made of volcanic ash, lime, and seawater, to form the mix [3].\u003c/p\u003e\n\u003cp\u003eConcrete is evolving and is the most widely utilized construction material. Modern concrete is made of more than just a mixture of cement, water, and aggregates (fine and coarse); as it contains more and more often mineral components, chemical admixtures, fibers, etc. [4]. Concrete will continually evolve in its composition and applicability to meet current sustainability concerns that the world is currently facing, until a balance is achieved. 
Concrete has many properties that have endeared it to construction practitioners over the decades, some of them are flexibility in conforming to various shapes during placement, high compressive strength, thermal resistance, etc.\u003c/p\u003e\n\u003cp\u003eBecause of its quality, which allows for veracity, concrete is now the most used civil engineering material. Concrete has low tensile strength but strong compressive strength [5]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.11 Concrete Compressive Strength\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConcrete compressive strength is a crucial mechanical property, this is generally obtained by evaluating the specimen after a standard curing period of 28 days [6], this is important as it determines the ability of concrete members to bear structural loads for construction purposes. The compressive strength of concrete increases as the age increases when required curing has been done [7]\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;The compressive strength of concrete can be obtained through either destructive, partially destructive, or non-destructive testing methods. The non-destructive test is done without damage to the concrete. In partially destructive testing, there are slight reparable damages on the surface of the concrete; these involve methods like core tests and pull-out and pull-off tests. Non-destructive and partially destructive tests can provide properties of the tested concrete specimen such as density, surface hardness, elastic modulus, reinforcement position, and compressive strength. 
Meanwhile, the crushing of the test specimen is the usual destructive approach to determine the concrete strength [8].\u003c/p\u003e\n\u003cp\u003eYou can obtain the compressive strength of concrete by dividing the load on the failure of the concrete specimen (load by which the specimen crumbles) by the surface area of the specimen.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e2.2 Machine Learning\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eMachine Learning (ML) is typically a sub-field under Artificial Intelligence, it aims at building computer systems that are capable of learning or improving their performance based on the data they consume. Artificial Intelligence refers to systems and machines that do tasks that normally would require human intelligence [9].\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e2.3 Machine Learning Algorithm\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eML algorithm refers to a set of rules or procedures that an AI system uses to carry out a task-- which is often to discover new patterns and data insights or make predictions of an output value based on a given set of input variables. Algorithms enable machine learning to learn from data [10].\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e2.4 Machine Learning Model\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eA Machine Learning model refers to an object (stored in a file) that has undergone training based on a dataset, and a certain algorithm to be able to recognize certain types of patterns. The algorithm enables the model to be able to reason and make inferences. 
Once a model is trained, it is capable of reasoning over data it has not seen before and making predictions [11].\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e2.4.1 Machine Learning Models Evaluation Metrics\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eDetermining whether your ML model performs excellently or poorly is very crucial, as this can help indicate that the model is fit for use or should be improved upon. \u0026nbsp;Evaluation metrics are measures that are used to determine the performance of an ML or statistical model.\u003c/p\u003e\n\u003cp\u003eSome of the commonly used evaluation metrics for ML models are:\u003c/p\u003e\n\u003cp\u003ei. F1 Score\u003c/p\u003e\n\u003cp\u003eii. Root Mean Squared Error (RMSE)\u003c/p\u003e\n\u003cp\u003eiii. Cross Validation\u003c/p\u003e\n\u003cp\u003eiv. Mean Absolute Error (MAE)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ev.\u0026nbsp;\u003c/strong\u003eAccuracy Score\u003c/p\u003e"},{"header":"CHAPTER THREE","content":"\u003ch3\u003e\u003cstrong\u003e3.0 MATERIALS AND METHODS\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe dataset used for the study was obtained from Kaggle\u0026apos;s official website, it is made of 1,030 rows (data points) and 9 columns, the fields in the dataset are cement, blast furnace slag, fly ash, water, super plasticizer, coarse aggregate, fine aggregate, age, and concrete compressive strength.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.1 Exploratory Data Analysis\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThis analysis was carried out on the dataset for further understanding of the dataset, and for checking the validity for use with the machine learning algorithms, the dataset was found to contain no missing values and fit for use in building models.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.2 Data Splitting\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe dataset was split into a ratio of 80:20, where 80% was used for training the model and 20% 
for validating or testing it. The dataset was further separated into two categories, one for input features(X); which are the features to be given to the model for prediction (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, age) and the other for the target feature(Y), which refers to the what the model predicts (concrete compressive strength). \u0026nbsp;\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.3 Model Building and Evaluation\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eML models were built using LR, DT, Xgboost, Catboost, and ANN algorithms, and the metric for evaluation of their various performances is the Mean Absolute Error (MEA).\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.4 Model Hyper-parameter Tuning\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eHyper-parameter tuning was done on the various models; RandomizedSearchCV was for theLR, DT, Xgboost, and Catboost models. Meanwhile, for the ANN, the number of epochs was iterated for the ANN model to know the best performance.\u003c/p\u003e"},{"header":"CHAPTER FOUR","content":"\u003cp\u003e\u003cstrong\u003e4.0 RESULTS AND DISCUSSION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.1 Correlation Between the Dataset Features:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe correlation between the various features of the dataset is shown in Fig 4.1 below:\u003c/p\u003e\n\u003cp\u003eThe heatmap above in Fig 3 has clearly\u0026nbsp;shown\u0026nbsp;that the major factors\u0026nbsp;influencing the compressive strength of concrete used in this study\u0026nbsp;are\u0026nbsp;the amount of cement present, followed by the amount of superplasticizer and the age of the concrete.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e4.2 Models Evaluation Using Various Algorithms\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe metric used for evaluating the performance of the machine learning models used in this study is the Mean 
Absolute Error (MAE), which is suitable for regression tasks; it is the average absolute difference between the actual test values and the predicted values.\u003c/p\u003e\n\u003cp\u003eFrom Fig 4, it can be deduced that the best-performing model for concrete compressive strength prediction is Catboost, followed by Xgboost and Decision Tree (DT). The Artificial Neural Network (ANN) and Linear Regression (LR) models are the least performant models for concrete compressive strength prediction.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e4.3 Results from Model Hyper-parameter Tuning\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eAfter hyper-parameter tuning, the models other than the ANN performed worse than the untuned models with their default parameters. This is attributed to RandomizedSearchCV sampling random values from the parameters given to it, so the best combination is not guaranteed to be tried. RandomizedSearchCV was used instead of GridSearchCV because it is faster and requires fewer computational resources.\u003c/p\u003e\n\u003cp\u003eThe number of epochs used for training the ANN was chosen from the range 1-21, and the number of epochs that gave the best result was 20, as shown in Figure 5.\u003c/p\u003e"},{"header":"CHAPTER FIVE","content":"\u003cp\u003e\u003cstrong\u003e5.0 CONCLUSION/RECOMMENDATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.1 Conclusion\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFrom the results obtained in the course of this study, the Catboost algorithm performed best and is recommended for use in predicting concrete compressive strength.\u003c/p\u003e\n\u003cp\u003eWhen tuning model hyper-parameters, GridSearchCV may be more reliable where the computational capacity is available, since it evaluates every combination in the grid rather than a random sample as RandomizedSearchCV does.\u003c/p\u003e\n\u003cp\u003eCement content was found to be the factor most strongly correlated with the compressive strength of 
concrete.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.2 Recommendation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIt is recommended that machine learning models be used first to predict compressive strength values, with laboratory testing then used to confirm the results on concrete samples, to save time and resources.\u003c/p\u003e\n\u003cp\u003eWhile building an ANN model, the number of epochs should be adjusted until a well-performing model with minimal error is achieved.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCONFLICT OF INTEREST\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNone.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eACKNOWLEDGEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNone.\u003c/p\u003e\n\u003cp\u003eThis study was funded by the author.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eETHICAL CONSIDERATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll the materials and data used in this study were approved and confidentiality was duly observed where necessary.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eShyamala G, Kumar R, K., Olalusi OB (2020) Impacts of nonconventional construction materials on concrete strength development: case studies. \u003cem\u003eSN Appl. Sci.\u003c/em\u003e 2, 1927. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s42452-020-03687-x\u003c/span\u003e\u003cspan address=\"10.1007/s42452-020-03687-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAhmad A, Farooq F, Niewiadomski P, Ostrowski K, Akbar A, Aslam F, Alyousef R (2021) Prediction of Compressive Strength of Fly Ash Based Concrete Using Individual and Ensemble Algorithm. Materials 14(4):794. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/ma14040794\u003c/span\u003e\u003cspan address=\"10.3390/ma14040794\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGiatec Scientific (2022) The History of Concrete. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.giatecscientific.com/education/the-history-of-concrete/\u003c/span\u003e\u003cspan address=\"https://www.giatecscientific.com/education/the-history-of-concrete/#:~\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAïtcin P-C (2000) Cements of yesterday and today: concrete of tomorrow. Cement and Concrete Research 30(9):1349\u0026ndash;1359. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S0008-8846(00)00365-3\u003c/span\u003e\u003cspan address=\"10.1016/S0008-8846(00)00365-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeter DF, Olamilekan OA, Benjamin AO, Ewemade Cornelius Enabulele, Nzemeka Ogechukwu Israel, Grace Agbons Aruya (2024) Comparison of Coren mix design with other international mix (ACI and DoE) design methods. World Journal of Advanced Research and Reviews 23(1):2522\u0026ndash;2539. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.30574/wjarr.2024.23.1.2230\u003c/span\u003e\u003cspan address=\"10.30574/wjarr.2024.23.1.2230\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNi H-G, Wang J-Z (2000) Prediction of compressive strength of concrete by neural networks. Cement and Concrete Research 30(8):1245. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S0008-8846(00)00345-8\u003c/span\u003e\u003cspan address=\"10.1016/S0008-8846(00)00345-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRasak Q, Fakoyede P, Oparinde A, Enabulele E, Nzemeka O, Aruya G, Adeniran-Bakare S, Adeleke O, Ajibola I (2024) Modification Analysis of Two Different Cement Grades and Their Impact on Vibrated Concrete Qualities. Path Sci 10(8):6001\u0026ndash;6017. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://dx.doi.org/10.22178/pos.107-8\u003c/span\u003e\u003cspan address=\"10.22178/pos.107-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMalek J, Kaouther M (2014) Destructive and Non-destructive Testing of Concrete Structures. 
Jordan J Civil Eng 8(4):432.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOracle official website. What is Machine Learning? \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.oracle.com/ng/artificial-intelligence/machine-learning/what-is-machine-learning/\u003c/span\u003e\u003cspan address=\"https://www.oracle.com/ng/artificial-intelligence/machine-learning/what-is-machine-learning/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIBM official website. What is a Machine Learning Algorithm? \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ibm.com/topics/machine-learning-algorithms\u003c/span\u003e\u003cspan address=\"https://www.ibm.com/topics/machine-learning-algorithms\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMicrosoft official website (2024) What is a Machine Learning Model? \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://learn.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model\u003c/span\u003e\u003cspan address=\"https://learn.microsoft.com/en-us/windows/ai/windows-ml/what-is-a-machine-learning-model\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables are available in the Supplementary Files 
section\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Concrete, Machine Learning Algorithms, Concrete Compressive Strength Prediction","lastPublishedDoi":"10.21203/rs.3.rs-5442566/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5442566/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe compressive strength of concrete was predicted using various machine learning algorithms: Linear Regression (LR), Decision Tree (DT), Xgboost, Catboost, and Artificial Neural Network (ANN), to determine the best performant for concrete compressive strength prediction. From the analysis report, the best-performing model was the Catboost model, followed by the Xgboost model, with mean absolute errors of 2.72 N/mm\u003csup\u003e2\u003c/sup\u003e and 3 N/mm\u003csup\u003e2\u003c/sup\u003e respectively. The least performing models for concrete prediction from the research are the LR and the ANN with very high mean absolute errors of 7.75 N/mm\u003csup\u003e2\u003c/sup\u003e and 4.9 N/mm\u003csup\u003e2\u003c/sup\u003e respectively. The dataset from which the models were built, and insights drawn was obtained from kaggle.com.\u003c/p\u003e","manuscriptTitle":"Comparative Analysis of Linear Regression, Decision Tree, Xgboost, Catboost, and Artificial Neural Network Machine Learning Algorithms for","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-14 14:56:56","doi":"10.21203/rs.3.rs-5442566/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"90c9720d-cbd4-4dc5-a42a-ea4a490f82e4","owner":[],"postedDate":"November 14th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":40272602,"name":"Environmental Engineering"},{"id":40272603,"name":"Materials Engineering"},{"id":40272604,"name":"Civil Engineering"}],"tags":[],"updatedAt":"2024-11-14T14:56:56+00:00","versionOfRecord":[],"versionCreatedAt":"2024-11-14 14:56:56","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5442566","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5442566","identity":"rs-5442566","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
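The evaluation workflow described in Sections 3.2 and 4.2 of the article (an 80:20 split of the 1,030 rows, then the Mean Absolute Error on the held-out portion) can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' code: the helper names `split_80_20` and `mean_absolute_error`, the fixed seed, and the toy strength values are all hypothetical.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80:20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = len(shuffled) * 8 // 10          # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy compressive strengths in N/mm^2 (made-up numbers, not from the dataset).
actual = [30.0, 42.5, 25.0, 55.0]
predicted = [28.0, 45.0, 26.5, 51.0]
mae = mean_absolute_error(actual, predicted)  # (2.0 + 2.5 + 1.5 + 4.0) / 4

# Row indices stand in for the 1,030 data points mentioned in Section 3.0.
train, test = split_80_20(range(1030))
```

With 1,030 rows, this split yields 824 training rows and 206 test rows. A lower MAE means predictions sit closer to the measured strengths, which is the basis on which the article ranks the models.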