Discerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI

doi:10.21203/rs.3.rs-4466189/v1

Discerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI

2024 · doi:10.21203/rs.3.rs-4466189/v1

preprint OA: closed

Full text JSON View at publisher

Full text 55,975 characters · extracted from preprint-html · click to expand

Discerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Discerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI Nikita Ramrakhiani, Devashish Chitale, Anup Kukreja, Kunal Mhetre This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-4466189/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted You are reading this latest preprint version Abstract The escalating global apprehension regarding climate change and its consequences on the environment have incited extensive inquiry into the origins and repercussions of greenhouse gas emissions, particularly carbon dioxide (CO 2 ) emissions. Amongst the distinct geographical regions worldwide, Asia is a substantial contributor to these emissions, necessitating an exhaustive examination of its constituent elements which encompass land utilization, industrial undertakings, and various other demographic factors. This instigates the basis for our research with the intent of exploring multifaceted dimensions of CO 2 emissions in Asian countries, with a pronounced emphasis on emissions stemming from environmental Indicators. The inherent objective of our research is to stratify the per capita CO 2 emissions of these nations into discrete categories predicated on sustainability benchmarks. In this research, we work with AI algorithms like Decision tree, Random Forest and logistic regression, to ascertain and substantiate the classifications which can be categorized with regards to their CO 2 emissions. The amalgamation of these algorithms with data visualization tools like Tableau and Power BI contribute towards identifying existing patterns and add a dynamic edge to our model. The aim of the research is to analyze the cause of disparities between CO 2 emissions and develop insights to reduce or mitigate the effects of emissions on the environment while maintaining industrial development & quality of life. Unveiling critical discernments into the several factors that wield influence over CO 2 emissions to mitigate the effects is a crucial need today. The findings also identify the leading contributors among Asian nations in terms of CO 2 emissions. The key focus of this research is about the significance of adopting sustainable methodologies to curtail CO 2 emissions within Asian countries while maintaining a balance with their overall development. This endeavor not only distinguishes the nations with the loftiest emissions but also Gives input pertaining to assign priority in their efforts to mitigate their environmental footprint. By leveraging the insights emanating from this investigation, policymakers, environmental advocates, and stakeholders can devise strategies to achieve sustainability and alleviate the detrimental ramifications of CO 2 emissions in the region. This research serves as an indispensable initial stride toward a more ecologically aware and sustainable future for Asian nations and the global community at large. Environmental Analytics Artificial Intelligence Greenhouse Gas Emissions Decision Tree Predictive Modelling Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Figure 9 Figure 10 Figure 11 Figure 12 Figure 13 Figure 14 Figure 15 Figure 16 Figure 17 Figure 18 Figure 19 Figure 20 Figure 21 Figure 22 Figure 23 I. Introduction Greenhouse gas emissions have a significant impact on humanity and the world’s ecological balance. Currently rising greenhouse emissions are a major concern globally. The CO 2 emissions are primarily responsible for global climate change among the Kyoto gasses. In the year 2020 the CO 2 emissions recorded were 50.1 Gigatons for the entire planet[19]. Asia accounts for more than 50% of these carbon emissions. In the year 2020 CO 2 emissions produced by Asia are 34.2 Giga tons. From previous studies we can infer that the developing nations tend to generate more CO 2 as compared to other countries. The increase in CO 2 emissions have led to climate change disrupting the weather patterns leading to increase in natural disasters. Prolonged exposure to emission may lead to respiratory disorders exacerbating conditions like asthma and bronchitis. Furthermore, CO 2 emissions affect crop yields in turn reducing the nutritional value of food impacting quality of life. Today the world is moving towards sustainability and progressing to become carbon efficient. Artificial Intelligence plays a crucial role in curbing CO 2 emissions with innovative techniques such as energy optimization, predictive industrial maintenance, supply chain optimization and climate prediction and risk analysis. This study we intend to focus upon the parameter leading to CO 2 emissions and classifying them into sustainability benchmarks. It will provide us clarity upon the factors tending to have greater emissions caused by the countries which are necessary to mitigate. The study revolves around numerous parameters that result in emission such as Population, Armed Forces Size, Forested Area, Power Sector (CO 2 ) emissions, other sector combustion (CO 2 ) emissions etc. The study consists of an analytical framework dedicated to analyses of the CO 2 emissions based on defined parameters. The defined variables are encompassing giving a vivid idea about the effects of CO 2 emissions on climate consequently affecting the earth’s ecological balance. II. Research Methodology Several studies have used techniques such as Linear Regression, Artificial Neural Networks to estimate CO 2 emissions. Our primary objective is to classify countries on the basis of emission levels. We decided to implement Decision Tree and Random Forest Classifiers. With nodes for features, branches for decision rules, and leaves for class labels or final judgements, the Decision Tree Classifier builds a structure like a tree. The root, internal, and leaf nodes make up its three primary nodes. The root node first divides into what are referred to be internal nodes. The interior nodes in the model represent the data properties, whereas the leaf nodes stand for the final decision. Hyperparameter Tuning means picking the best settings for a machine learning method. We have to figure out what needs changing, test out different options, and find the best combo that works well for new data. Similarly, Random Forest as the name suggests generates multiple decision trees using random subsets of parameters. These trees are formed by repeatedly partitioning the data into subsets based on different features. Each tree in the Random Forest is built using a random subset of the features, providing a diverse set of trees that are less correlated with one another.K-fold cross-validation is a way to test how good a machine learning model is. It works by dividing the data into 'k' parts and using each part as a test at different times. This is useful when you don't have much data. It ensures that all data is used for both training and testing, giving a more accurate picture of how well the model performs with new data. It also helps to detect problems such as when the model memorises the training data too much (overfitting) or does not learn enough from it (underfitting). Overall, it gives a truer picture of how well the model is. III. Exploratory Data Analysis Data Cleaning: EDA is an important procedure for comprehending a dataset with several elements. It helps to understand the structure, characteristics and potential issues within the data. It involves descriptive analysis of the extensive parameters required for understanding CO 2 emissions. The Original dataset had to be processed for missing values, data points and other metric conversions. The missing values in the factors were handled as they are a smaller number of records in the data with numerous factors. The Original dataset utilized for study comprises 195 rows and 47 columns. It is a combination of two datasets that is from Kaggle and Emissions Database for Global Atmospheric Research. The total null values present in the data were 12% of the total values. The data utilized is for the year 2020 consisting approximately of all the countries of the world. Parameters such as Land Area, Energy Sector emissions, Forested Area, Population, CO 2 emissions and other socio-economic indicators are considered. We have used regression as treatment for null values, as with limited records we avoid deleting them. This can be particularly helpful when the missing data is not completely random and shows some correlation with other variables in the dataset avoiding loss of information. The above boxplot represents the data distribution before & after treatment of null values using linear regression, showing the initial spread and central tendency of the dataset revealing any potential changes in the spread, outliers, or central tendencies resulting from the linear regression imputation of null values. We also attempted to compare the CO 2 emissions level with different socio-economic factors across continents to get a basic understanding with CO 2 .. Here we can infer that there is no definite relationship between CO 2 emissions and social factors. Here, we can see that CO 2 emissions have some relationship with factors like tax revenue, tax rate, and unemployment rate. After completing the EDA, we used correlation to look for any associations between distinct components. Using correlation, we discovered a high correlation between some variables, which may indicate multicollinearity. Multicollinearity makes it difficult to determine each predictor's independent effect on the target variable. To determine multicollinearity, we compute the variance inflation factor for each variable.The Variance Inflation Factor (VIF) is used to identify multicollinearity. It determines how much the variance of the computed regression coefficients increases as the predictor variables are connected. To deal with multicollinearity we have dropped the variables having extremely high multicollinearity before proceeding further. Subsequently, after removing values with high VIF we were left with 18 variables. IV. Results Implementation of Model In model building we have deployed Decision Tree Classifier and Random Forest Classifier using python. The model comprised 18 variables as classifiers with a single dependent variable ensuring a good analysis. The model for decision tree created using train test split with 70% data used for train and remaining 30% data used for test purpose. The model's accuracy (the percentage of accurately predicted instances out of total occurrences) was 0.854. Precision, or the proportion of properly predicted positive instances out of every case forecasted to be positive by the algorithm, was 0.853. Another key metric was the proportion of accurately anticipated positive events out of all actual positive instances (recall = 0.854). Finally, the F1-Score of 0.847 served as a harmonic mean of precision and recall, offering a balanced metric that takes into account both values, making it appropriate for Decision Tree classification with imbalanced datasets. A contingency table in the context of a decision tree classifier is a table that showcases the counts of actual versus predicted classes, allowing the evaluation of the classifier's performance Feature importance refers to a technique used in machine learning that quantifies the impact or relevance of each input feature in a predictive model's output, aiding in identifying the most influential factors for the model's predictions. Power Industry CO 2 per capita, other sectors CO 2 per capita, transport CO 2 per capita, building CO 2 per capita including life Expectancy are the most influential features of this Decision Tree model. Confusion Matrix: The CO 2 emissions value have been classified into 4 categories accordingly 0-2.3, 2.3-5, 5.1–10 and 10 and above. variable selection criterion in Decision Trees can be done via two approaches:1. Entropy and Information Gain 2. Gini Index. The entropy of a random variable is the mean degree of "data", "surprise", or "unpredictability contained in the statistic's probable results. In the context of Decision Trees, entropy is a measure of disorder or impurity in a node. Leaf nodes which have all instances belonging to 1 class would have an entropy of 0. Whereas, the entropy for a node where the classes are divided equally would be 1. Entropy is measured as: Entropy=−∑ n i=1 p i ⋅log 2 (p i ) The Gini Index, also known as Impurity, calculates the risk of a randomly selected instance being misclassified. The lower the Gini Index, the smaller the risk of misclassification. The formula for Gini Impurity: Gini Impurity = 1−∑ n i=1 p 2 i The highest impurity for the Gini index is 0.5, while the maximum purity for Entropy is 0. The node with the lowest Gini index is selected. Random Forest: Model with Train- Test split Accuracy Feature Importance: We have implemented hyper parameter tuning with 6 estimators and depth of the tree as 7 ROC Curve: Confusion Matrix: The CO 2 emissions value have been classified into 4 categories accordingly 0-2.3,2.3-5,5.1–10 and 10 and above. There were 47 True positives which are correctly classified. in decision trees in comparison with 45 True Positives of Random Forest. We can perform pruning to decrease misclassification and in RF we have techniques like boosting and bagging. In DT we achieved high accuracy without pruning but in RF we needed k fold for best outcome We have implemented k- fold cross validation as we encountered minor class imbalance observed in the data. V. Conclusion As observed in the evaluation metrics, decision tree gave an accuracy of 85% without pruning whereas Random Forest gave an accuracy of 81.8% through pruning and k - fold cross validation. Error for decision tree is 0.404 and for Random Forest is 0.426. Also, the misclassification rate of Random Forest is less by 1 percent as compared to Decision Tree. In decision tree we can perform pruning to decrease misclassification and in RF we have techniques like boosting and bagging. Through our research with the help of both the Machine Learning Algorithms we identified the major contributors to CO 2 emissions for all the continents which can be used to mitigate the further irreversible effects of climate change and long-term impact of the same. Declarations Author Contribution Nikita is primary author.Anup has done coding.Kunal has done visualization.Devashish has written paper.All authors reviewed the manuscript. References In a case study of Guangzhou, China, decision tree analysis was used to uncover factors influencing inhabitants' CO2 emissions from various types of excursions - https://www.sciencedirect.com/science/article/abs/pii/S0959652620341160 Using Machine Learning to Study the Factors Affecting CO2 Emissions: https://www.frontiersin.org/articles/10.3389/fenvs.2021.721517/full Developing an interpretable framework for projecting energy usage and CO2 emissions- https://www.sciencedirect.com/science/article/pii/S0306261922014209 Applying machine learning to forecast carbon dioxide emissions in Turkey. - https://www.inderscienceonline.com/doi/pdf/10.1504/IJGW.2022.126669 A cross-country investigation of the role of the service sector in the relationship between CO2 emissions and economic growth, using machine learning techniques. - https://www.inderscienceonline.com/doi/epdf/10.1504/IJSE.2022.125979 Dynamical impacts of broad built environments on CO2 emissions from travelling: https://www.sciencedirect.com/science/article/abs/pii/S1361920923001335 Data-driven strategy to build a CO2 emission baseline for a multi-family residential estate using data mining approaches https://www.sciencedirect.com/science/article/abs/pii/S1364032120307838 Using various prediction methods to anticipate CO2 emissions from paddy crops in India. https://link.springer.com/article/10.1007/s11356-021-17487-2 Exploring the nonlinear and asymmetric influences of built environment on CO2 emission of ride-hailing trips - https://www.sciencedirect.com/science/article/abs/pii/S0195925521001414 The roles of economic growth and health expenditure on CO2 emissions in selected Asian countries: a quantile regression model approach - https://link.springer.com/article/10.1007/s11356-021-13639-6 Development of regression models to forecast the CO2 emissions from fossil fuels in the BRICS and MINT countries - https://www.sciencedirect.com/science/article/abs/pii/S0360544222025361 Investigating drivers of CO2 emission in China’s heavy industry: A quantile regression analysis - https://www.sciencedirect.com/science/article/abs/pii/S0360544220312664 Conduct economical strategy uncertainty and geopolitical risk increase CO2 emissions? New findings from the group quantile regression method - https://link.springer.com/article/10.1007/s11356-021-17707-9 Impact of trade in and out on carbon emissions in 7 ASEAN nations using a panel quantile regression technique. https://www.sciencedirect.com/science/article/abs/pii/S0048969719325732 Additional Declarations No competing interests reported. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4466189","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":311966226,"identity":"26ed04b7-4bbf-47f1-bc0e-baa04530944c","order_by":0,"name":"Nikita Ramrakhiani","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABGklEQVRIie2RMUvDQBTH3xG4LAldL4v5Chcy1EBLv0qDEEdXB5GTwmWqbpJ8iyuC6HZy0MnSWbI0FJzjUhSCeAUriLnMDvdb7vHufrw/7wAslv+Ig0EeanTFiO44hwZiRkX+VvC0XwEMP2O0sj882ptr6OJUNRdwNrhd1W/lw3E4dL0daVoIB8zhmw4lmWEp5RKSojqNg8UziR5n/n1QcogKifKueVS5TEoMFKoMgpoTJJR/F/sMkADEiVH5BBpWmfOhlYlQ3jZuW5iYFR3siQOlVYaDBSepVtBWLyXtUaZydU1opJWk5OREKBzVc10UyqCsl3FzvhvRIx3sZc4vx2KtNvK9HY1v8vy1S/nm79X+T83vLRaLxdLPF2jMZqCnLsrrAAAAAElFTkSuQmCC","orcid":"","institution":"Vivekanand Education Society’s Business School","correspondingAuthor":true,"prefix":"","firstName":"Nikita","middleName":"","lastName":"Ramrakhiani","suffix":""},{"id":311966227,"identity":"bcfe3c14-0898-4e07-b60a-a16209db1a94","order_by":1,"name":"Devashish Chitale","email":"","orcid":"","institution":"Vivekanand Education Society’s Business School","correspondingAuthor":false,"prefix":"","firstName":"Devashish","middleName":"","lastName":"Chitale","suffix":""},{"id":311966228,"identity":"5c555ad0-0012-4217-8e07-3a97bbd69efa","order_by":2,"name":"Anup Kukreja","email":"","orcid":"","institution":"Vivekanand Education Society’s Business School","correspondingAuthor":false,"prefix":"","firstName":"Anup","middleName":"","lastName":"Kukreja","suffix":""},{"id":311966229,"identity":"e6dcfd2d-38c8-4997-b2b2-2bded41c5ef1","order_by":3,"name":"Kunal Mhetre","email":"","orcid":"","institution":"Vivekanand Education Society’s Business School","correspondingAuthor":false,"prefix":"","firstName":"Kunal","middleName":"","lastName":"Mhetre","suffix":""}],"badges":[],"createdAt":"2024-05-23 10:30:37","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4466189/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4466189/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":58255585,"identity":"7c822516-b3ce-410a-8aae-a530ba3a7d7d","added_by":"auto","created_at":"2024-06-13 05:03:08","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":30317,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eStructure of Decision Tree\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image1.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/983c1b62d7759484392a574a.png"},{"id":58256669,"identity":"5257f352-a9bb-4780-b5a4-82a5ad6c45e8","added_by":"auto","created_at":"2024-06-13 05:19:08","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":86780,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eBoxplot comparison of minimum wage\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/ee6bce5e9c22d341cd674cf3.png"},{"id":58255586,"identity":"e00a9a24-6f5c-418a-b08c-0093b7f5eeee","added_by":"auto","created_at":"2024-06-13 05:03:08","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":81976,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eBoxplot Comparison of Tax revenue %\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/af13082ff7b7f5321b427830.png"},{"id":58255587,"identity":"aa4ade7f-c981-44c1-81c1-9ce927b47a45","added_by":"auto","created_at":"2024-06-13 05:03:08","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":67236,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eStandard of living as per CO\u003c/em\u003e\u003csub\u003e\u003cem\u003e2\u003c/em\u003e\u003c/sub\u003e\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/a905174159c261aecbf45945.png"},{"id":58255588,"identity":"dad7650b-ac53-4cf1-acde-39f56476003c","added_by":"auto","created_at":"2024-06-13 05:03:08","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":81115,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eFinancial overview of the continent as per CO\u003c/em\u003e\u003csub\u003e\u003cem\u003e2\u003c/em\u003e\u003c/sub\u003e\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/e9b75ebf7f65a7fb48cda8b1.png"},{"id":58256672,"identity":"4bfd5e39-d19b-40ac-990d-47b05707bd87","added_by":"auto","created_at":"2024-06-13 05:19:09","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":55127,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eVariance Inflation Factor - all classifiers\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/2adad4bb3d75570fa24b75bb.png"},{"id":58255593,"identity":"f68bd180-3f70-494a-8021-57527ee1a4ad","added_by":"auto","created_at":"2024-06-13 05:03:09","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":147925,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eDecision tree model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/1bb5e25049b10425b2ca8c71.png"},{"id":58256670,"identity":"e50f4385-d590-4465-896c-eacf7d5d6480","added_by":"auto","created_at":"2024-06-13 05:19:08","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":321959,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAccuracy, Precision, Recall \u0026amp; F1 Score\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/f4f6b80461424f9e1dc9a882.png"},{"id":58255589,"identity":"97f7fb95-4aa2-40bc-897e-647a4337e38b","added_by":"auto","created_at":"2024-06-13 05:03:08","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":32912,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eErrors of Decision Tree Model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/ad873d7acd780cffeb6c4807.png"},{"id":58255600,"identity":"1d10554a-4733-418a-b4ab-525f586a2d70","added_by":"auto","created_at":"2024-06-13 05:03:09","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":101801,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eContingency table of Decision Tree Model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/5b07f0092710f2f1108c1d12.png"},{"id":58255603,"identity":"094aae5c-44da-4428-8c7f-a06ad6d920b0","added_by":"auto","created_at":"2024-06-13 05:03:10","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":250768,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eFeature importances for decision tree\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/cc45e88891869841293e4443.png"},{"id":58255592,"identity":"7fd5b6f5-1993-45a6-b46e-bb5a565c6ea4","added_by":"auto","created_at":"2024-06-13 05:03:09","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":74294,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eConfusion matrix\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image12.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/7e1f3214837f891f22a89ebd.png"},{"id":58255598,"identity":"391b4e15-dbda-4c5b-9e36-73b42a9670ef","added_by":"auto","created_at":"2024-06-13 05:03:09","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":308692,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eReceiver Operating Characteristic (ROC) curve\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image13.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/a51b93269b027176dbc41e0a.png"},{"id":58255606,"identity":"178c9106-a00d-4b7c-906d-f06995ee73eb","added_by":"auto","created_at":"2024-06-13 05:03:10","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":167969,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eDecision Tree performance for different criteria\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image14.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/17c0680913dbda4147df18e5.png"},{"id":58256671,"identity":"dd0eaf07-b90c-467b-a54f-8216eaf75097","added_by":"auto","created_at":"2024-06-13 05:19:09","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":152094,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eTrain-Test split for RF\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image15.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/ab20a0f92a3b2a627aba5d6c.png"},{"id":58256677,"identity":"c7881d42-5fcc-4da5-9ed1-03335dc49b85","added_by":"auto","created_at":"2024-06-13 05:19:10","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":94049,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eCode for contingency table\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image16.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/8ec740b8939b3ab589f0e71a.png"},{"id":58255595,"identity":"ac0c377e-baf3-41cf-ba95-a57c440db05e","added_by":"auto","created_at":"2024-06-13 05:03:09","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":111355,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eContingency table of random forest model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image17.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/1a9c9bd9e0beaed89ce5b8f2.png"},{"id":58256674,"identity":"8c3dbaa1-1a4c-481a-b071-51d26dd54253","added_by":"auto","created_at":"2024-06-13 05:19:09","extension":"png","order_by":18,"title":"Figure 18","display":"","copyAsset":false,"role":"figure","size":80281,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAccuracy of random forest model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image18.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/a0d6698b914aca1064064aab.png"},{"id":58255601,"identity":"591513dd-d69c-40e9-94d6-cb26319eb003","added_by":"auto","created_at":"2024-06-13 05:03:10","extension":"png","order_by":19,"title":"Figure 19","display":"","copyAsset":false,"role":"figure","size":46599,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eError of random forest model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image19.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/4ea59083ab7d42d2ef485a79.png"},{"id":58256676,"identity":"7319ce7f-9b02-468f-afd0-9156fd3badc8","added_by":"auto","created_at":"2024-06-13 05:19:09","extension":"png","order_by":20,"title":"Figure 20","display":"","copyAsset":false,"role":"figure","size":224824,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eFeature importance of Random Forest model\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image20.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/f298bbb7a5020cedaa20ec78.png"},{"id":58257416,"identity":"3355b405-988d-4077-a4ec-9670d82da029","added_by":"auto","created_at":"2024-06-13 05:27:09","extension":"png","order_by":21,"title":"Figure 21","display":"","copyAsset":false,"role":"figure","size":154807,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eRandom forest performance for different criteria\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image21.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/5b52cf407386d4bccfe19e85.png"},{"id":58255605,"identity":"6cb16934-d8df-44a6-9600-f46b683a9fdb","added_by":"auto","created_at":"2024-06-13 05:03:10","extension":"png","order_by":22,"title":"Figure 22","display":"","copyAsset":false,"role":"figure","size":176549,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eReceiver Operating Characteristic (ROC) Curve\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image22.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/eaab60565bfeed4a790e052c.png"},{"id":58255607,"identity":"cce3e398-986b-484a-acb7-882c6062c8af","added_by":"auto","created_at":"2024-06-13 05:03:10","extension":"png","order_by":23,"title":"Figure 23","display":"","copyAsset":false,"role":"figure","size":72244,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eConfusion matrix of RF\u003c/em\u003e\u003c/p\u003e","description":"","filename":"image23.png","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/0d5678ad005c147394a0ccc4.png"},{"id":58257417,"identity":"4f9fb77f-3e3b-4b64-823a-1dabff67d1c6","added_by":"auto","created_at":"2024-06-13 05:27:16","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3315897,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4466189/v1/f0fd875d-0732-4d90-83bf-d1db66e22f08.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003eDiscerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI\u003c/p\u003e","fulltext":[{"header":"I. Introduction","content":"\u003cp\u003eGreenhouse gas emissions have a significant impact on humanity and the world\u0026rsquo;s ecological balance. Currently rising greenhouse emissions are a major concern globally. The CO\u003csub\u003e2\u003c/sub\u003e emissions are primarily responsible for global climate change among the Kyoto gasses. In the year 2020 the CO\u003csub\u003e2\u003c/sub\u003e emissions recorded were 50.1 Gigatons for the entire planet[19]. Asia accounts for more than 50% of these carbon emissions. In the year 2020 CO\u003csub\u003e2\u003c/sub\u003e emissions produced by Asia are 34.2 Giga tons.\u003c/p\u003e \u003cp\u003eFrom previous studies we can infer that the developing nations tend to generate more CO\u003csub\u003e2\u003c/sub\u003e as compared to other countries. The increase in CO\u003csub\u003e2\u003c/sub\u003e emissions have led to climate change disrupting the weather patterns leading to increase in natural disasters. Prolonged exposure to emission may lead to respiratory disorders exacerbating conditions like asthma and bronchitis. Furthermore, CO\u003csub\u003e2\u003c/sub\u003e emissions affect crop yields in turn reducing the nutritional value of food impacting quality of life.\u003c/p\u003e \u003cp\u003eToday the world is moving towards sustainability and progressing to become carbon efficient. Artificial Intelligence plays a crucial role in curbing CO\u003csub\u003e2\u003c/sub\u003e emissions with innovative techniques such as energy optimization, predictive industrial maintenance, supply chain optimization and climate prediction and risk analysis.\u003c/p\u003e \u003cp\u003eThis study we intend to focus upon the parameter leading to CO\u003csub\u003e2\u003c/sub\u003e emissions and classifying them into sustainability benchmarks. It will provide us clarity upon the factors tending to have greater emissions caused by the countries which are necessary to mitigate. The study revolves around numerous parameters that result in emission such as Population, Armed Forces Size, Forested Area, Power Sector (CO\u003csub\u003e2\u003c/sub\u003e) emissions, other sector combustion (CO\u003csub\u003e2\u003c/sub\u003e) emissions etc.\u003c/p\u003e \u003cp\u003eThe study consists of an analytical framework dedicated to analyses of the CO\u003csub\u003e2\u003c/sub\u003e emissions based on defined parameters. The defined variables are encompassing giving a vivid idea about the effects of CO\u003csub\u003e2\u003c/sub\u003e emissions on climate consequently affecting the earth\u0026rsquo;s ecological balance.\u003c/p\u003e"},{"header":"II. Research Methodology","content":"\u003cp\u003eSeveral studies have used techniques such as Linear Regression, Artificial Neural Networks to estimate CO\u003csub\u003e2\u003c/sub\u003e emissions. Our primary objective is to classify countries on the basis of emission levels. We decided to implement Decision Tree and Random Forest Classifiers.\u003c/p\u003e \u003cp\u003eWith nodes for features, branches for decision rules, and leaves for class labels or final judgements, the Decision Tree Classifier builds a structure like a tree. The root, internal, and leaf nodes make up its three primary nodes. The root node first divides into what are referred to be internal nodes. The interior nodes in the model represent the data properties, whereas the leaf nodes stand for the final decision. \u003c/p\u003e \u003cp\u003eHyperparameter Tuning means picking the best settings for a machine learning method. We have to figure out what needs changing, test out different options, and find the best combo that works well for new data.\u003c/p\u003e \u003cp\u003eSimilarly, Random Forest as the name suggests generates multiple decision trees using random subsets of parameters. These trees are formed by repeatedly partitioning the data into subsets based on different features. Each tree in the Random Forest is built using a random subset of the features, providing a diverse set of trees that are less correlated with one another.K-fold cross-validation is a way to test how good a machine learning model is. It works by dividing the data into 'k' parts and using each part as a test at different times. This is useful when you don't have much data. It ensures that all data is used for both training and testing, giving a more accurate picture of how well the model performs with new data. It also helps to detect problems such as when the model memorises the training data too much (overfitting) or does not learn enough from it (underfitting). Overall, it gives a truer picture of how well the model is.\u003c/p\u003e"},{"header":"III. Exploratory Data Analysis","content":"\u003cp\u003eData Cleaning: EDA is an important procedure for comprehending a dataset with several elements. It helps to understand the structure, characteristics and potential issues within the data.\u003c/p\u003e \u003cp\u003eIt involves descriptive analysis of the extensive parameters required for understanding CO\u003csub\u003e2\u003c/sub\u003e emissions. The Original dataset had to be processed for missing values, data points and other metric conversions. The missing values in the factors were handled as they are a smaller number of records in the data with numerous factors.\u003c/p\u003e \u003cp\u003eThe Original dataset utilized for study comprises 195 rows and 47 columns. It is a combination of two datasets that is from Kaggle and Emissions Database for Global Atmospheric Research. The total null values present in the data were 12% of the total values. The data utilized is for the year 2020 consisting approximately of all the countries of the world. Parameters such as Land Area, Energy Sector emissions, Forested Area, Population, CO\u003csub\u003e2\u003c/sub\u003e emissions and other socio-economic indicators are considered. We have used regression as treatment for null values, as with limited records we avoid deleting them. This can be particularly helpful when the missing data is not completely random and shows some correlation with other variables in the dataset avoiding loss of information.\u003c/p\u003e \u003cp\u003eThe above boxplot represents the data distribution before \u0026amp; after treatment of null values using linear regression, showing the initial spread and central tendency of the dataset revealing any potential changes in the spread, outliers, or central tendencies resulting from the linear regression imputation of null values.\u003c/p\u003e \u003cp\u003eWe also attempted to compare the CO\u003csub\u003e2\u003c/sub\u003e emissions level with different socio-economic factors across continents to get a basic understanding with CO\u003csub\u003e2\u003c/sub\u003e..\u003c/p\u003e \u003cp\u003eHere we can infer that there is no definite relationship between CO\u003csub\u003e2\u003c/sub\u003e emissions and social factors.\u003c/p\u003e \u003cp\u003eHere, we can see that CO\u003csub\u003e2\u003c/sub\u003e emissions have some relationship with factors like tax revenue, tax rate, and unemployment rate.\u003c/p\u003e \u003cp\u003eAfter completing the EDA, we used correlation to look for any associations between distinct components. Using correlation, we discovered a high correlation between some variables, which may indicate multicollinearity. Multicollinearity makes it difficult to determine each predictor's independent effect on the target variable. To determine multicollinearity, we compute the variance inflation factor for each variable.The Variance Inflation Factor (VIF) is used to identify multicollinearity. It determines how much the variance of the computed regression coefficients increases as the predictor variables are connected. \u003c/p\u003e\u003cp\u003eTo deal with multicollinearity we have dropped the variables having extremely high multicollinearity before proceeding further. Subsequently, after removing values with high VIF we were left with 18 variables.\u003c/p\u003e "},{"header":"IV. Results","content":"\u003cp\u003eImplementation of Model\u003c/p\u003e\n\u003cp\u003eIn model building we have deployed Decision Tree Classifier and Random Forest Classifier using python. The model comprised 18 variables as classifiers with a single dependent variable ensuring a good analysis.\u003c/p\u003e\n\u003cp\u003eThe model for decision tree created using train test split with 70% data used for train and remaining 30% data used for test purpose.\u003c/p\u003e\n\u003cp\u003eThe model\u0026apos;s accuracy (the percentage of accurately predicted instances out of total occurrences) was 0.854. Precision, or the proportion of properly predicted positive instances out of every case forecasted to be positive by the algorithm, was 0.853. Another key metric was the proportion of accurately anticipated positive events out of all actual positive instances (recall\u0026thinsp;=\u0026thinsp;0.854). Finally, the F1-Score of 0.847 served as a harmonic mean of precision and recall, offering a balanced metric that takes into account both values, making it appropriate for Decision Tree classification with imbalanced datasets.\u003c/p\u003e\n\u003cp\u003eA contingency table in the context of a decision tree classifier is a table that showcases the counts of actual versus predicted classes, allowing the evaluation of the classifier\u0026apos;s performance\u003c/p\u003e\n\u003cp\u003eFeature importance refers to a technique used in machine learning that quantifies the impact or relevance of each input feature in a predictive model\u0026apos;s output, aiding in identifying the most influential factors for the model\u0026apos;s predictions. Power Industry CO\u003csub\u003e2\u003c/sub\u003e per capita, other sectors CO\u003csub\u003e2\u003c/sub\u003e per capita, transport CO\u003csub\u003e2\u003c/sub\u003e per capita, building CO\u003csub\u003e2\u003c/sub\u003e per capita including life Expectancy are the most influential features of this Decision Tree model.\u003c/p\u003e\n\u003cp\u003eConfusion Matrix: The CO\u003csub\u003e2\u003c/sub\u003e emissions value have been classified into 4 categories accordingly 0-2.3, 2.3-5, 5.1\u0026ndash;10 and 10 and above.\u003c/p\u003e\n\u003cp\u003evariable selection criterion in Decision Trees can be done via two approaches:1. Entropy and Information Gain 2. Gini Index. The entropy of a random variable is the mean degree of \u0026quot;data\u0026quot;, \u0026quot;surprise\u0026quot;, or \u0026quot;unpredictability contained in the statistic\u0026apos;s probable results. In the context of Decision Trees, entropy is a measure of disorder or impurity in a node. Leaf nodes which have all instances belonging to 1 class would have an entropy of 0. Whereas, the entropy for a node where the classes are divided equally would be 1.\u003c/p\u003e\n\u003cp\u003eEntropy is measured as:\u003c/p\u003e\n\u003cp\u003eEntropy=\u0026minus;\u0026sum;\u003csup\u003en\u003c/sup\u003e \u003csub\u003ei=1\u003c/sub\u003e p\u003csub\u003ei\u003c/sub\u003e \u0026sdot;log\u003csub\u003e2\u003c/sub\u003e (p\u003csub\u003ei\u003c/sub\u003e)\u003c/p\u003e\n\u003cp\u003eThe Gini Index, also known as Impurity, calculates the risk of a randomly selected instance being misclassified. The lower the Gini Index, the smaller the risk of misclassification.\u003c/p\u003e\n\u003cp\u003eThe formula for Gini Impurity:\u003c/p\u003e\n\u003cp\u003eGini Impurity\u0026thinsp;=\u0026thinsp;1\u0026minus;\u0026sum;\u003csup\u003en\u003c/sup\u003e \u003csub\u003ei=1\u003c/sub\u003e p\u003csup\u003e2\u003c/sup\u003e \u003csub\u003ei\u003c/sub\u003e\u003c/p\u003e\n\u003cp\u003eThe highest impurity for the Gini index is 0.5, while the maximum purity for Entropy is 0. The node with the lowest Gini index is selected.\u003c/p\u003e\n\u003cp\u003eRandom Forest: Model with Train- Test split\u003c/p\u003e\n\u003cp\u003eAccuracy\u003c/p\u003e\n\u003cp\u003eFeature Importance:\u003c/p\u003e\n\u003cp\u003eWe have implemented hyper parameter tuning with 6 estimators and depth of the tree as 7\u003c/p\u003e\n\u003cp\u003eROC Curve:\u003c/p\u003e\n\u003cp\u003eConfusion Matrix: The CO\u003csub\u003e2\u003c/sub\u003e emissions value have been classified into 4 categories accordingly 0-2.3,2.3-5,5.1\u0026ndash;10 and 10 and above.\u003c/p\u003e\n\u003cp\u003eThere were 47 True positives which are correctly classified. in decision trees in comparison with 45 True Positives of Random Forest. We can perform pruning to decrease misclassification and in RF we have techniques like boosting and bagging. In DT we achieved high accuracy without pruning but in RF we needed k fold for best outcome\u003c/p\u003e\n\u003cp\u003eWe have implemented k- fold cross validation as we encountered minor class imbalance observed in the data.\u003c/p\u003e"},{"header":"V. Conclusion","content":"\u003cp\u003eAs observed in the evaluation metrics, decision tree gave an accuracy of 85% without pruning whereas Random Forest gave an accuracy of 81.8% through pruning and k - fold cross validation. Error for decision tree is 0.404 and for Random Forest is 0.426. Also, the misclassification rate of Random Forest is less by 1 percent as compared to Decision Tree. In decision tree we can perform pruning to decrease misclassification and in RF we have techniques like boosting and bagging.\u003c/p\u003e \u003cp\u003eThrough our research with the help of both the Machine Learning Algorithms we identified the major contributors to CO\u003csub\u003e2\u003c/sub\u003e emissions for all the continents which can be used to mitigate the further irreversible effects of climate change and long-term impact of the same.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eNikita is primary author.Anup has done coding.Kunal has done visualization.Devashish has written paper.All authors reviewed the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eIn a case study of Guangzhou, China, decision tree analysis was used to uncover factors influencing inhabitants\u0026apos; CO2 emissions from various types of excursions - https://www.sciencedirect.com/science/article/abs/pii/S0959652620341160\u003c/li\u003e\n\u003cli\u003eUsing Machine Learning to Study the Factors Affecting CO2 Emissions: https://www.frontiersin.org/articles/10.3389/fenvs.2021.721517/full\u003c/li\u003e\n\u003cli\u003eDeveloping an interpretable framework for projecting energy usage and CO2 emissions- https://www.sciencedirect.com/science/article/pii/S0306261922014209\u003c/li\u003e\n\u003cli\u003eApplying machine learning to forecast carbon dioxide emissions in Turkey. - https://www.inderscienceonline.com/doi/pdf/10.1504/IJGW.2022.126669\u003c/li\u003e\n\u003cli\u003eA cross-country investigation of the role of the service sector in the relationship between CO2 emissions and economic growth, using machine learning techniques. - https://www.inderscienceonline.com/doi/epdf/10.1504/IJSE.2022.125979\u003c/li\u003e\n\u003cli\u003eDynamical impacts of broad built environments on CO2 emissions from travelling: https://www.sciencedirect.com/science/article/abs/pii/S1361920923001335\u003c/li\u003e\n\u003cli\u003eData-driven strategy to build a CO2 emission baseline for a multi-family residential estate using data mining approaches https://www.sciencedirect.com/science/article/abs/pii/S1364032120307838\u003c/li\u003e\n\u003cli\u003eUsing various prediction methods to anticipate CO2 emissions from paddy crops in India. https://link.springer.com/article/10.1007/s11356-021-17487-2\u003c/li\u003e\n\u003cli\u003eExploring the nonlinear and asymmetric influences of built environment on CO2 emission of ride-hailing trips - https://www.sciencedirect.com/science/article/abs/pii/S0195925521001414\u003c/li\u003e\n\u003cli\u003eThe roles of economic growth and health expenditure on CO2 emissions in selected Asian countries: a quantile regression model approach - https://link.springer.com/article/10.1007/s11356-021-13639-6\u003c/li\u003e\n\u003cli\u003eDevelopment of regression models to forecast the CO2 emissions from fossil fuels in the BRICS and MINT countries - https://www.sciencedirect.com/science/article/abs/pii/S0360544222025361\u003c/li\u003e\n\u003cli\u003eInvestigating drivers of CO2 emission in China\u0026rsquo;s heavy industry: A quantile regression analysis - https://www.sciencedirect.com/science/article/abs/pii/S0360544220312664\u003c/li\u003e\n\u003cli\u003eConduct economical strategy uncertainty and geopolitical risk increase CO2 emissions? New findings from the group quantile regression method - https://link.springer.com/article/10.1007/s11356-021-17707-9\u003c/li\u003e\n\u003cli\u003eImpact of trade in and out on carbon emissions in 7 ASEAN nations using a panel quantile regression technique. https://www.sciencedirect.com/science/article/abs/pii/S0048969719325732\u0026amp;nbsp;\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Environmental Analytics, Artificial Intelligence, Greenhouse Gas Emissions, Decision Tree, Predictive Modelling","lastPublishedDoi":"10.21203/rs.3.rs-4466189/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4466189/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe escalating global apprehension regarding climate change and its consequences on the environment have incited extensive inquiry into the origins and repercussions of greenhouse gas emissions, particularly carbon dioxide (CO\u003csub\u003e2\u003c/sub\u003e) emissions. Amongst the distinct geographical regions worldwide, Asia is a substantial contributor to these emissions, necessitating an exhaustive examination of its constituent elements which encompass land utilization, industrial undertakings, and various other demographic factors. This instigates the basis for our research with the intent of exploring multifaceted dimensions of CO\u003csub\u003e2\u003c/sub\u003e emissions in Asian countries, with a pronounced emphasis on emissions stemming from environmental Indicators. The inherent objective of our research is to stratify the per capita CO\u003csub\u003e2\u003c/sub\u003e emissions of these nations into discrete categories predicated on sustainability benchmarks.\u003c/p\u003e \u003cp\u003eIn this research, we work with AI algorithms like Decision tree, Random Forest and logistic regression, to ascertain and substantiate the classifications which can be categorized with regards to their CO\u003csub\u003e2\u003c/sub\u003e emissions. The amalgamation of these algorithms with data visualization tools like Tableau and Power BI contribute towards identifying existing patterns and add a dynamic edge to our model. The aim of the research is to analyze the cause of disparities between CO\u003csub\u003e2\u003c/sub\u003e emissions and develop insights to reduce or mitigate the effects of emissions on the environment while maintaining industrial development \u0026amp; quality of life. Unveiling critical discernments into the several factors that wield influence over CO\u003csub\u003e2\u003c/sub\u003e emissions to mitigate the effects is a crucial need today. The findings also identify the leading contributors among Asian nations in terms of CO\u003csub\u003e2\u003c/sub\u003e emissions.\u003c/p\u003e \u003cp\u003eThe key focus of this research is about the significance of adopting sustainable methodologies to curtail CO\u003csub\u003e2\u003c/sub\u003e emissions within Asian countries while maintaining a balance with their overall development. This endeavor not only distinguishes the nations with the loftiest emissions but also Gives input pertaining to assign priority in their efforts to mitigate their environmental footprint. By leveraging the insights emanating from this investigation, policymakers, environmental advocates, and stakeholders can devise strategies to achieve sustainability and alleviate the detrimental ramifications of CO\u003csub\u003e2\u003c/sub\u003e emissions in the region. This research serves as an indispensable initial stride toward a more ecologically aware and sustainable future for Asian nations and the global community at large.\u003c/p\u003e","manuscriptTitle":"Discerning Sustainability: Analyzing Asia's Greenhouse Gas Emissions Through AI","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-06-13 05:03:04","doi":"10.21203/rs.3.rs-4466189/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"92eac58b-6f8d-4c9b-9602-01c8235b8e74","owner":[],"postedDate":"June 13th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-06-13T05:03:04+00:00","versionOfRecord":[],"versionCreatedAt":"2024-06-13 05:03:04","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4466189","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4466189","identity":"rs-4466189","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: preprint-html ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2024) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00