Insights and Analysis of Machine Learning for Benzene Hydrogenation to Cyclohexene

Research Article (Research Square preprint)

Chao Sun, Bin Zhang (corresponding author)
Institute of Coal Chemistry, Chinese Academy of Sciences

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-6475690/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted, Version 1

Abstract
Cyclohexene is an important raw material for nylon production, and selective hydrogenation of benzene is a key route to cyclohexene. However, the Ru catalysts used in current industrial processes still suffer from high metal usage, high process costs, and low cyclohexene yields. This study combines data from the existing literature with machine learning to analyze the factors that influence conversion, selectivity, and yield in the hydrogenation of benzene to cyclohexene, and it builds predictive models based on the XGBoost and Random Forest algorithms.
The analysis shows that reaction time, Ru content, and space velocity are key factors influencing yield, selectivity, and conversion. Shapley Additive Explanations (SHAP) analysis and feature importance analysis further reveal the contribution of each variable to the reaction outcomes. In addition, we randomly generated one million variable combinations, sampling the catalyst compositions from a Dirichlet distribution, to search for catalyst formulations with high predicted yields. This paper offers new insights into the application of machine learning in heterogeneous catalysis and a reference for further research.

Keywords: Machine learning; Heterogeneous catalysis; Hydrogenation of benzene; XGBoost

Figures
Figure 1. Scatter plots of prediction performance on the test set for benzene hydrogenation to cyclohexene over Ru catalysts: Random Forest predictions of yield (a), selectivity (b), and conversion (c); XGBoost predictions of yield (d), selectivity (e), and conversion (f).
Figure 2. SHAP summary plots obtained with XGBoost for yield (a), selectivity (b), and conversion (c). In each row, blue dots represent samples with lower feature values and red dots represent samples with higher feature values.
Figure 3. XGBoost feature importance plots for yield (a), selectivity (b), and conversion (c).

Introduction
Cyclohexene is an important raw material for nylon production, and selective hydrogenation of benzene is a key method for preparing it 1. Compared with partial hydrogenation of benzene to cyclohexene, complete hydrogenation to cyclohexane is thermodynamically more favorable, so the central challenge is to control the kinetics so that cyclohexene accumulates. Researchers have gradually improved cyclohexene selectivity by modifying catalyst structures and reaction processes. Although the technology was industrialized decades ago, current industrial processes still rely on aqueous metal-salt systems and high Ru loadings and still deliver low cyclohexene yields.

Traditional approaches uncover trends through extensive single-factor experiments that vary additives and supports. For benzene hydrogenation, cyclohexene yields have improved gradually through mechanistic studies and repeated adjustment of catalyst compositions. In 1934, Horiuti and Polanyi proposed a stepwise mechanism for benzene hydrogenation over Group VIII metals 2.
In this mechanism, adsorbed, activated benzene reacts with dissociatively adsorbed hydrogen through intermediates such as cyclohexa-1,4-diene and cyclohexene, ultimately forming cyclohexane. Prasad et al. later proposed that benzene hydrogenation proceeds via two pathways. The first is stepwise hydrogenation, in which the π electrons of benzene form a σ bond with the unbonded d electrons of the metal while hydrogen is adsorbed dissociatively 2, 3; the adsorbed benzene and hydrogen then react step by step, ultimately forming cyclohexane. The second is direct hydrogenation of benzene to cyclohexane, in which planar benzene, adsorbed through its delocalized π system, interacts with six hydrogen atoms to form a van der Waals-activated complex that converts directly to cyclohexane 3–5. In general, the reaction follows the first pathway at low pressure and favors the second at high pressure. Over the past century, many strategies have been tried to raise the cyclohexene yield, including different supports, various additives, changes in catalyst particle size and shape, and DFT calculations to predict performance 1, 6–12. Ru catalysts combined with an aqueous zinc sulfate additive have been shown in some cases to improve cyclohexene selectivity; however, the cyclohexene yield remains limited to around 60%.

Machine learning, an important tool in artificial intelligence, has been applied to a growing range of scientific fields because it can efficiently uncover patterns and correlations in data, and it has recently attracted the attention of chemists. For instance, Liu et al.
used neural networks trained on experimental data from iron-based Fischer–Tropsch synthesis to reveal the crucial roles of reaction conditions, pretreatment methods, and post-reaction iron phase composition, and showed that controlling the iron phase ratio and adding appropriate promoters can effectively suppress methane selectivity 13. Shi et al. combined extensive literature data with machine learning algorithms, including random forest and extreme gradient boosting, to build a model that predicts the yield and linear selectivity of 1-octene hydroformylation, and validated it experimentally 14. Han et al. used machine learning to assist in designing electrocatalytic materials for high-performance lithium–sulfur batteries 15. The review by Jaison et al. discusses machine learning in photocatalysis, with a particular focus on environmental applications 16. These studies show that machine learning has real potential for solving problems in catalysis. In traditional thermal catalysis, however, predicting the conversion, selectivity, and yield of reactions with machine learning remains difficult, mainly because experimental data are hard to quantify and to make consistent across sources. Although previous studies have produced a wealth of data, most of it is difficult to use as a standardized dataset for model building. In addition, thermal catalytic reactions tend to involve complex conditions and long reaction times, which makes it difficult to generate large amounts of data through one's own experiments. For the partial hydrogenation of benzene to cyclohexene, however, considerable data have accumulated over the years. In this context, we use existing experimental data and machine learning methods to analyze the factors that influence the conversion, selectivity, and yield of cyclohexene in benzene hydrogenation.
Specifically, we selected high-quality, well-balanced data from numerous literature sources on benzene hydrogenation, organized them into a dataset, and built Random Forest and XGBoost models to predict conversion, selectivity, and yield. We also sought to predict high-performance catalyst compositions and reaction conditions.

Results and discussion

2.1. Construction of the database
Before collecting data, it is essential to identify descriptors that are both easy to standardize and important for the reaction. Benzene hydrogenation has been studied extensively, and some studies regulate the yield of a specific product with uncommon additives or supports. For our purposes, such uncommon factors could act as confounding variables in the model. We therefore reviewed nearly 50 relevant papers and selected those that fully satisfy the chosen descriptors and report a substantial number of data points, which improves the reliability of the original dataset while minimizing interference 8, 17–25. The 25 descriptors (Table 1) include reaction temperature, reaction pressure, reaction time, preparation method, and catalyst composition. Some catalysts were not subjected to a reduction step; for these data points, the reduction temperature was set to 298 K so that the dataset has no missing values. Detailed information on the dataset is provided in the Supporting Information.

Table 1. Variable summary of the dataset.
Variable                              Possible values
Input
  Benzene/water                       0.175–0.5
  Benzene/catalyst                    0.05–0.3846
  Pretreatment temperature (K)        298–873
  Reduction temperature (K)           298–573
  Hydrogen concentration (%)          0, 5, or 99.9
  Reduction time (h)                  0–3
  Reaction temperature (K)            373–453
  Reaction time (min)                 1–720
  Reaction pressure (MPa)             2–7
  S_BET (m²/g)                        8.4–318
  d_pore (nm)                         2.63–42.2
  V_pore (cm³/g)                      0.03–1.05
  VHSV (mL/mg/h)                      0.00058–11.538
  Ru (wt%)                            0.18–20
  Cu (wt%)                            0–1.26
  ZnO (wt%)                           0–97.93
  NaOH (wt%)                          0–93.26
  ZnSO₄ (wt%)                         0–90.99
  AlOOH (wt%)                         0–43.68
  SiO₂ (wt%)                          0–43.68
  TiO₂ (wt%)                          0–97.1
Output
  Selectivity (%)                     0.7–89.3
  Yield (%)                           0.3–60.64
  Conversion (%)                      4.04–98.58

The "Preparation" descriptor, which records three preparation methods, was one-hot encoded. The remaining descriptors were standardized and normalized according to their value ranges and numerical characteristics. XGBoost 26 and Random Forest 27 models were then trained; the complete code is provided in the Supporting Information. In brief, we preprocessed the dataset as described, split it into training (80%) and test (20%) sets with a random seed of 42, scaled the features, and combined all features into a single input matrix. We then defined the XGBoost regression models, using GridSearchCV for hyperparameter tuning with cross-validation, and trained a separate model for each target variable. Finally, we made predictions on the test set and evaluated them.

Model performance was evaluated with the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE). R² is the fraction of the variance in the observed values that the model explains. It is at most 1, with values closer to 1 indicating a better fit; on a test set it can even be negative if the model predicts worse than the mean of the observed values.
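The three metrics (R², MAE, and RMSE, whose formulas are given as Eqs. (1)–(3) in this section) can be checked with a minimal standard-library Python sketch. The function names and the yield values below are hypothetical and chosen only for illustration; they are not taken from the paper's dataset or its Supporting Information code. In practice, scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error` (in `sklearn.metrics`) would typically be used instead.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Eq. (1)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares (numerator of Eq. 1).
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the observed values (denominator).
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, Eq. (2); same units as the target variable."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, Eq. (3); squaring weights large errors more."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    # Hypothetical yields (%) for illustration only.
    y_true = [10.0, 20.0, 30.0, 40.0]
    small_errors = [12.0, 18.0, 33.0, 37.0]  # four moderate misses
    one_outlier = [10.0, 20.0, 30.0, 50.0]   # same total absolute error, one large miss
    for name, y_pred in [("small_errors", small_errors), ("one_outlier", one_outlier)]:
        print(f"{name}: R2={r2_score(y_true, y_pred):.3f} "
              f"MAE={mae(y_true, y_pred):.2f} RMSE={rmse(y_true, y_pred):.2f}")
    # small_errors: R2=0.948 MAE=2.50 RMSE=2.55
    # one_outlier: R2=0.800 MAE=2.50 RMSE=5.00
```

The two hypothetical prediction sets have the same MAE, but the one with a single large miss has roughly twice the RMSE, which is the outlier sensitivity that distinguishes RMSE from MAE.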
R² is calculated as:
$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{\mathrm{true},i}-\bar{y}\right)^{2}} \tag{1}$$
where $\bar{y}$ is the mean of the observed values and n is the number of samples.

MAE is the average absolute difference between the predicted and actual values:
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right| \tag{2}$$
A smaller MAE indicates a smaller prediction error. MAE has the same units as the target variable, which makes the error easy to interpret, and because it does not square the errors, it is less sensitive to outliers than RMSE. For the same reason, however, it does not penalize large errors more heavily than small ones and therefore says little about how the errors are distributed.

RMSE is the square root of the mean squared difference between the predicted and actual values:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{true},i}-y_{\mathrm{pred},i}\right)^{2}} \tag{3}$$
Because the errors are squared before averaging, large errors, such as those caused by outliers, are penalized disproportionately. A lower RMSE indicates a smaller prediction error.

2.2. Model Building and Analysis
Random Forest is simple, robust, and has a low risk of overfitting, which makes it suitable for rapid model development on small to medium-sized datasets 27.
XGBoost, by comparison, offers high prediction accuracy, strong regularization, and efficient handling of missing values and sparse data 26. A common workflow is to train a Random Forest quickly as a baseline and then try XGBoost for higher performance. We used both algorithms to predict the conversion, selectivity, and yield of the target reaction and evaluated them with R², MAE, and RMSE (Fig. 1). Overall, XGBoost predicts better than Random Forest (the closer the blue dots lie to the red line, the better the prediction). For yield, XGBoost achieved R² = 0.87, MAE = 3.19, and RMSE = 4.46; for selectivity, R² = 0.81, MAE = 5.95, and RMSE = 9.01; and for conversion, R² = 0.96, MAE = 3.85, and RMSE = 5.31 (MAE and RMSE in percentage points), indicating relatively high accuracy and stability for this task. Both models show larger errors for selectivity than for conversion and yield, possibly because selectivity depends more strongly on hidden factors that the descriptors do not fully capture. Conversion is the easiest target to predict, suggesting that its variation is learned most readily by the tree-based models. XGBoost's advantage over Random Forest may stem from gradient boosting, in which each new tree is fitted to the residual errors of the previous ones, allowing the model to capture deeper relationships in the dataset and reduce errors.

We used a correlation heatmap to identify variables with strong linear relationships. Because the heatmap for more than twenty descriptors is complex, it is included in the Supporting Information. Some of the strong correlations require explanation.
For example, the strong negative correlation between ZnSO₄ content and both the hydrogen concentration and the time used for catalyst reduction arises because systems containing large amounts of zinc sulfate do not require a reduction step during catalyst preparation. For reactions without a reduction step, the hydrogen concentration and reduction time were recorded as 0 in the dataset, which produces the strong correlation in the heatmap.

Figure 2 presents the SHAP analysis for the three target variables: yield, selectivity, and conversion. SHAP assigns each feature a contribution, positive or negative, to the model output for each individual sample 28, and these contributions account for interactions between features. By contrast, XGBoost's built-in feature importance (Fig. 3) is based on simpler tree statistics, such as how often a feature is used for splits or the gain those splits provide. In the SHAP plots, features are ranked from top to bottom in order of decreasing influence on the target variable, so the features at the top have the largest impact on the prediction. A larger SHAP value (horizontal axis) means the feature pushes the prediction for that data point upward, whereas a negative SHAP value means it pushes the prediction downward.

Figures 2a and 2b show that Ru content (0.18–20 wt% in the dataset) strongly affects both yield and selectivity: higher Ru content lowers the yield of and selectivity toward cyclohexene, while having only a minor impact on conversion. Reaction time significantly affects all three target variables, but not in a simple linear way. This is because cyclohexene is an intermediate: its yield in selective benzene hydrogenation typically rises at first and then gradually declines as cyclohexene is further hydrogenated to cyclohexane, which is consistent with other reports 1, 9, 10, 23, 24.
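The rise-and-fall dependence on reaction time can be illustrated with the textbook model of consecutive first-order reactions, benzene → cyclohexene → cyclohexane. This is a simplification assumed only for illustration: it ignores the direct benzene-to-cyclohexane pathway described in the Introduction, treats both steps as pseudo-first-order, and uses hypothetical rate constants k1 and k2 that are not fitted to any data in this work. For A → B → C with no B present at t = 0, the fraction of A present as B is $[B](t)/[A]_0 = \frac{k_1}{k_2-k_1}\left(e^{-k_1 t}-e^{-k_2 t}\right)$, which peaks at $t_{\max} = \ln(k_2/k_1)/(k_2-k_1)$.

```python
import math


def cyclohexene_fraction(t: float, k1: float, k2: float) -> float:
    """Fraction of the initial benzene present as cyclohexene at time t.

    Assumes consecutive pseudo-first-order steps benzene -(k1)-> cyclohexene
    -(k2)-> cyclohexane, no cyclohexene at t = 0, and k1 != k2.
    """
    return k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))


def time_of_maximum(k1: float, k2: float) -> float:
    """Time at which the intermediate (cyclohexene) fraction is largest."""
    if k1 == k2:
        raise ValueError("this closed form requires k1 != k2")
    return math.log(k2 / k1) / (k2 - k1)


if __name__ == "__main__":
    k1, k2 = 0.05, 0.02  # hypothetical rate constants in 1/min
    t_max = time_of_maximum(k1, k2)
    print(f"t_max = {t_max:.1f} min, peak fraction = {cyclohexene_fraction(t_max, k1, k2):.3f}")
    # t_max = 30.5 min, peak fraction = 0.543
    for t in (5.0, 60.0, 240.0):
        # Both shorter and longer times leave less cyclohexene than t_max.
        print(f"t = {t:5.1f} min -> fraction = {cyclohexene_fraction(t, k1, k2):.3f}")
```

Under these assumed constants, stopping too early leaves benzene unconverted, while running too long converts the accumulated cyclohexene to cyclohexane, so the optimum reaction time depends on the ratio k2/k1 that catalysts and additives are meant to control.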
Within the range of our dataset, ZnSO₄ content makes a positive SHAP contribution to all three target variables, meaning that the model associates adding ZnSO₄ with higher yield, selectivity, and conversion. VHSV relates the reactant flow rate to the amount of catalyst and is commonly used to describe how fast a fluid passes over a catalyst, especially in gas–solid or liquid–solid catalytic reactions. A higher VHSV means a larger flow per unit time and a shorter contact time with the catalyst, whereas a lower VHSV means a longer contact time between the reactants and the catalyst. Overall, a low space velocity has a positive effect. Space velocity also strongly influences selectivity, but the relationship is not simple, possibly because space velocity is coupled to the reaction time.

The XGBoost feature importance plots (Fig. 3) provide further insight. Reaction time is the most important factor for yield, whereas space velocity, reaction time, and Ru content are relatively important for selectivity. Combined with the SHAP results, this suggests that an appropriate reaction time, a low space velocity, a limited Ru content, and a suitable choice and ratio of supports and additives can improve selectivity toward cyclohexene. A higher pore volume also contributes positively. This analysis can guide the development of new catalysts.

We then generated random variable combinations, setting the range of each variable according to the conclusions above. The catalyst compositions were sampled from a Dirichlet distribution so that the components always sum to 100 wt%.
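The sampling-and-screening step can be sketched as follows. This is a hypothetical standard-library illustration, not the authors' code (which is in the Supporting Information). A Dirichlet sample is drawn by normalizing independent Gamma(α_i, 1) variates, a standard construction, so the sampled mass fractions are positive and sum to 1 before scaling to wt%. The component subset, the concentration parameters in `ALPHA`, and the scoring function `placeholder_yield_model` are assumptions made only so the sketch runs; the condition ranges are the full Table 1 ranges rather than the narrowed ranges the authors chose. With NumPy, `numpy.random.default_rng(seed).dirichlet(alpha, size=n)` draws many compositions at once.

```python
import heapq
import random
from typing import Dict, List

COMPONENTS = ["Ru", "TiO2", "ZnSO4"]  # hypothetical subset of the Table 1 components
ALPHA = [0.5, 2.0, 8.0]               # hypothetical Dirichlet concentration parameters
CONDITION_RANGES = {                  # full ranges from Table 1
    "reaction_temperature_K": (373.0, 453.0),
    "reaction_time_min": (1.0, 720.0),
    "reaction_pressure_MPa": (2.0, 7.0),
}


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """One Dirichlet draw: normalize independent Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_candidate(rng: random.Random) -> Dict[str, float]:
    """A composition summing to 100 wt% plus uniformly sampled conditions."""
    fractions = sample_dirichlet(ALPHA, rng)
    candidate = {f"{name}_wt%": 100.0 * x for name, x in zip(COMPONENTS, fractions)}
    for name, (low, high) in CONDITION_RANGES.items():
        candidate[name] = rng.uniform(low, high)
    return candidate


def placeholder_yield_model(candidate: Dict[str, float]) -> float:
    """Arbitrary toy score standing in for the trained XGBoost model.

    NOT the paper's model: a real workflow would call the trained regressor's
    predict() on a feature row encoded exactly like the training data.
    """
    return (0.5 * candidate["ZnSO4_wt%"]
            - 2.0 * candidate["Ru_wt%"]
            - 0.05 * abs(candidate["reaction_time_min"] - 30.0))


def screen(n_samples: int, top_k: int, seed: int = 42) -> List[Dict[str, float]]:
    """Sample n_samples candidates and keep the top_k by predicted score."""
    rng = random.Random(seed)
    candidates = (sample_candidate(rng) for _ in range(n_samples))
    # heapq.nlargest holds only top_k items in memory, so n_samples can be large.
    return heapq.nlargest(top_k, candidates, key=placeholder_yield_model)


if __name__ == "__main__":
    # The paper sampled 1,000,000 sets; 50,000 keeps this demo quick.
    for rank, cand in enumerate(screen(n_samples=50_000, top_k=3), start=1):
        summary = ", ".join(f"{k}={v:.1f}" for k, v in cand.items())
        print(f"#{rank}: score={placeholder_yield_model(cand):.1f} | {summary}")
```

One design note: tree ensembles extrapolate poorly beyond the feature ranges seen in training, which is one reason to keep the sampling ranges inside those of Table 1.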
In total, 1,000,000 variable sets were sampled within the defined ranges and fed to the trained model, and several sets with high predicted yields were identified; the corresponding input values are also presented. The Supporting Information also includes code that accepts manually entered values and predicts the yield. With Ru = 1.6 wt%, TiO₂ = 13.5 wt%, ZnSO₄ = 84.9 wt%, a reaction temperature of 413 K, a reaction time of 20 min, and a reaction pressure of 5.0 MPa, the model predicts a cyclohexene yield of 60.6%.

Conclusions
This paper collects data from the existing literature and uses XGBoost and Random Forest algorithms to predict and analyze the yield, selectivity, and conversion of the selective hydrogenation of benzene to cyclohexene. SHAP plots and feature importance analysis indicate that an appropriate reaction time, a low space velocity, a limited Ru content, and suitable supports and additives can increase the cyclohexene yield. The models were also used to predict unexplored catalyst combinations and reaction conditions. This work may also serve as a reference for applying machine learning to heterogeneous thermal catalysis. Nevertheless, machine learning still faces many challenges in complex systems such as heterogeneous thermal catalysis; achieving high-precision predictions and identifying reasonable, practical catalyst systems remain difficult.
Abbreviations
SHAP: Shapley additive explanations
DFT: Density functional theory
XGBoost: Extreme gradient boosting
S_BET: Specific surface area determined by the Brunauer–Emmett–Teller method
d_pore: Average pore diameter
V_pore: Total pore volume
VHSV: Volumetric hourly space velocity
RF: Random forest
R²: Coefficient of determination (R-squared)
MAE: Mean absolute error
RMSE: Root mean square error

Declarations
Ethics and Consent to Participate: Not applicable.
Consent for Publication: Not applicable.
Competing Interests: The authors declare no competing interests.
Author Contributions: C. Sun wrote the main manuscript text and prepared the figures. B. Zhang reviewed and revised the manuscript.
Funding: National Key R&D Program of China (2020YFA0210902); CAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300 and XDB1190302); Youth Innovation Promotion Association CAS (Y2021056); Joint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (Grant YLU-DNL Fund 2022007); Special Fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007).
Availability of data and materials: Supplementary data associated with this article can be found at https://github.com/ttoono-takaki/Code-and-dataset.git.

Acknowledgements
We acknowledge the National Key R&D Program of China (2020YFA0210902), the CAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300 and XDB1190302), the Youth Innovation Promotion Association CAS (Y2021056), the Joint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (Grant YLU-DNL Fund 2022007), and the Special Fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007).

References
1. Chen Z, Sun H, Peng Z, Gao J, Li B, Liu Z, Liu S (2019) Selective Hydrogenation of Benzene: Progress of Understanding for the Ru-Based Catalytic System Design. Ind Eng Chem Res 58(31):13794–13803
2. Horiuti I, Polanyi M (1934) Exchange reactions of hydrogen on metallic catalysts. Trans Faraday Soc 30:1164
3. K.B.S KHVP, M.M P et al (1987) The Catalysts for the Synthesis of Formaldehyde by Partial Oxidation of Methane. J Catal 84(1):65
4. Jenkins GI, Rideal SE (1955) The Catalytic Hydrogenation of Ethylene at a Nickel Surface. Part II. The Reaction Mechanism. J Chem Soc:2490
5. Jenkins GI, Rideal SE (1955) The Catalytic Hydrogenation of Ethylene at a Nickel Surface. Part I. The Chemisorption of Ethylene. J Chem Soc:2496
6. Fan C, Zhu Y-A, Zhou X-G, Liu Z-P (2011) Catalytic hydrogenation of benzene to cyclohexene on Ru(0001) from density functional theory investigations. Catal Today 160(1):234–241
7. Jiang L, Dong Y, Zhou G, Li R, He D (2019) Promoting the Performances of TiO2 Submicrosphere-Embedded Ru Nanoparticles in Benzene Selective Hydrogenation by Morphology Manipulation. Ind Eng Chem Res 59(3):1083–1092
8. Liu H, Liang S, Wang W, Jiang T, Han B (2011) The partial hydrogenation of benzene to cyclohexene over Ru–Cu catalyst supported on ZnO. J Mol Catal A: Chem 341(1–2):35–41
9. Silveira ET, Umpierre AP, Rossi LM, Machado G, Morais J, Soares GV, Baumvol IJ, Teixeira SR, Fichtner PF, Dupont J (2004) The partial hydrogenation of benzene to cyclohexene by nanoscale ruthenium catalysts in imidazolium ionic liquids. Chemistry 10(15):3734–3740
10. Sun H, Jiang H, Li S, Dong Y, Wang H, Pan Y, Liu S, Tang M, Liu Z (2013) Effect of alcohols as additives on the performance of a nano-sized Ru–Zn(2.8%) catalyst for selective hydrogenation of benzene to cyclohexene. Chem Eng J 218:415–424
11. Utelbaeva AB, Ermakhanov MN, Zhanabai NZ, Utelbaev BT, Mel'deshov AA (2013) Hydrogenation of benzene in the presence of ruthenium on a modified montmorillonite support. Russ J Phys Chem A 87(9):1478–1481
12. Yu X-L, Li Y, Xin S-M, Yuan P-Q, Yuan W-K (2018) Partial Hydrogenation of Benzene to Cyclohexene on Ru@XO2 (X = Ti, Zr, or Si). Ind Eng Chem Res 57(6):1961–1967
13. Liu Y, Zhang X, Li L, Liu X, Lei T, Bai J, Guo W, Zhou Y, Liu X, Teng B, Wen X (2024) Machine learning insights into catalyst composition and structural effects on CH4 selectivity in iron-based Fischer–Tropsch synthesis. Artif Intell Chem 2(1)
14. Shi H, Shen C, Huang Z, Dong K (2025) Machine Learning-Guided Prediction of Hydroformylation. ChemPhysChem 26(3):e202400773
15. Han Z, Chen A, Li Z, Zhang M, Wang Z, Yang L, Gao R, Jia Y, Ji G, Lao Z, Xiao X, Tao K, Gao J, Lv W, Wang T, Li J, Zhou G (2024) Machine learning-based design of electrocatalytic materials towards high-energy lithium||sulfur batteries development. Nat Commun 15(1):8433
16. Jaison A, Mohan A, Lee Y-C (2024) Machine learning-enhanced photocatalysis for environmental sustainability: Integration and applications. Mater Sci Eng R Rep 161
17. Zhou G, Dong Y, Jiang L, He D, Yang Y, Zhou X (2018) Effect of support composition on the structural and catalytic properties of Ru/AlOOH–SiO2 catalysts for benzene selective hydrogenation. Catal Sci Technol 8(5):1435–1446
18. Wu T, Zhang P, Jiang T, Yang D, Han B (2014) Enhancing the selective hydrogenation of benzene to cyclohexene over Ru/TiO2 catalyst in the presence of a very small amount of ZnO. Sci China Chem 58(1):93–100
19. Hao F, Zheng J, Ouyang D, Xiong W, Liu P, Luo H (2021) Selective hydrogenation of benzene over Ru supported on surface modified TiO2. Korean J Chem Eng 38(4):736–746
20. Zhou G, Wang F, Shi R (2021) Nanoparticulate Ru on morphology-manipulated and Ti3+ defect-riched TiO2 nanosheets for benzene semi-hydrogenation. J Catal 398:148–160
21. Zhou G, Dou R, Bi H, Xie S, Pei Y, Fan K, Qiao M, Sun B, Zong B (2015) Ru nanoparticles on rutile/anatase junction of P25 TiO2: Controlled deposition and synergy in partial hydrogenation of benzene to cyclohexene. J Catal 332:119–126
22. Yan X, Zhang Q, Zhu M, Wang Z (2016) Selective hydrogenation of benzene to cyclohexene over Ru–Zn/ZrO2 catalysts prepared by a two-step impregnation method. J Mol Catal A: Chem 413:85–93
23. Wang Z, Zhang Q, Lu X, Chen S, Liu C (2015) Ru-Zn catalysts for selective hydrogenation of benzene using coprecipitation in low alkalinity. Chin J Catal 36(3):400–407
24. Sun H-j, Wang H-x, Jiang H-b, Li S-h, Liu S-c, Liu Z-y, Yuan X-m, Yang K (2013) Effect of (Zn(OH)2)3(ZnSO4)(H2O)5 on the performance of Ru–Zn catalyst for benzene selective hydrogenation to cyclohexene. Appl Catal A: Gen 450:160–168
25. Xue X, Liu J, Rao D, Xu S, Bing W, Wang B, He S, Wei M (2017) Double-active site synergistic catalysis in Ru–TiO2 toward benzene hydrogenation to cyclohexene with largely enhanced selectivity. Catal Sci Technol 7(3):650–657
26. Sheridan RP, Wang WM, Liaw A, Ma J, Gifford EM (2016) Extreme Gradient Boosting as a Method for Quantitative Structure–Activity Relationships. J Chem Inf Model 56(12):2353–2360
27. Breiman L (2001) Random Forests. Mach Learn 45(1):5–32
28. Lundberg S, Lee S-I (2017) A Unified Approach to Interpreting Model Predictions. arXiv

Supplementary Files
Supportinginformation.rar (Supporting Information)
GA.png (Graphical abstract)
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6475690","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":449858740,"identity":"acc248f5-7e35-4902-b35e-7b9535b532f2","order_by":0,"name":"Chao Sun","email":"","orcid":"","institution":"Institute of Coal Chemistry, Chinese Academy of Sciences","correspondingAuthor":false,"prefix":"","firstName":"Chao","middleName":"","lastName":"Sun","suffix":""},{"id":449858742,"identity":"7140f91f-4170-4b58-a4f0-8a9c354e3c5e","order_by":1,"name":"Bin Zhang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABB0lEQVRIiWNgGAWjYDACCSBmbJBg4GdmbHzwgYGBh3gtku3Nhw1nkKCFgcHgzLE0aSLUMzDIz25+9vDrDos8hhs5ZtI2fw7L8LcfYPzwg8EuD5cWxjnHzI1lz0gUM87IMbbObUvjkTiTwCzZw5BcjEsLs0SCmbRkm0Ris0SO4e3cBhseA6BTpRkYDiQ24NDCJpH+DaylTSLHQNrijwRIC/NvfFp4JHLMJD8CtfTwHEuSZmAD28KG1xYJiZwyacYzEokz2IGB3Av2S2KbZY9BMk4t8jPSt0n+3FGXuP8wMCp//Dlsz99++PCNHxV2OLWAgwAtOsDRhEc9SMkP/PKjYBSMglEw0gEAbClRWCg+eG0AAAAASUVORK5CYII=","orcid":"","institution":"Institute of Coal Chemistry, Chinese Academy of Sciences","correspondingAuthor":true,"prefix":"","firstName":"Bin","middleName":"","lastName":"Zhang","suffix":""}],"badges":[],"createdAt":"2025-04-18 03:38:16","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6475690/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6475690/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":81957965,"identity":"51467ec6-46f6-439e-955c-0b172bbc5afe","added_by":"auto","created_at":"2025-05-05 
10:09:03","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":103755,"visible":true,"origin":"","legend":"\u003cp\u003eScatter plot of the prediction performance on the Random Forest and XGBoost test set: RF performance for yield (a), selectivity (b), and conversion (c). XGBoost performances for cyclohexene yield (d), selectivity (e), and conversion (f) for benzene hydrogenation over the Ru catalysts.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/df1081942f0364657e5a370c.png"},{"id":81958291,"identity":"b668ac92-360f-4ff7-b922-83cc4f69427a","added_by":"auto","created_at":"2025-05-05 10:17:03","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":138407,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP analysis plot for target variables obtained using XGBoost. (a).SHAP summary plot for yield (a), selectivity (b), and conversion (c). (In each row, blue dots represent samples with lower feature values, while red dots represent higher ones)\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/fa6e8e54f19350d811860379.png"},{"id":81958290,"identity":"30bfa64b-d7dc-4bba-93fe-6fb21c952fea","added_by":"auto","created_at":"2025-05-05 10:17:03","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":112463,"visible":true,"origin":"","legend":"\u003cp\u003eFeature importance analysis plots for the three target variables using XGBoost. 
Feature Importance for yield (a), selectivity (b) and conversion (c).\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/cc265df0af4b793b2b642609.png"},{"id":88198875,"identity":"0cea7110-1595-4d8c-b3c0-1c4dca708b60","added_by":"auto","created_at":"2025-08-03 19:31:26","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":874301,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/7bf80f9a-6553-4343-bccc-3102dcfb804a.pdf"},{"id":81957969,"identity":"b5be327a-6bcc-4ac2-99ab-42309ba03a21","added_by":"auto","created_at":"2025-05-05 10:09:03","extension":"rar","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":1043325,"visible":true,"origin":"","legend":"","description":"","filename":"Supportinginformation.rar","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/e25e961b0da1d6e7b90c9173.rar"},{"id":81957966,"identity":"34d788ea-8d70-4f6c-be9f-1995c5ac6a20","added_by":"auto","created_at":"2025-05-05 10:09:03","extension":"png","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":64219,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGraphical abstract\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"GA.png","url":"https://assets-eu.researchsquare.com/files/rs-6475690/v1/bdadc97e26863395a7d81d09.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Insights and Analysis of Machine Learning for Benzene Hydrogenation to Cyclohexene","fulltext":[{"header":"Introduction","content":"\u003cp\u003eCyclohexene is an important raw material for nylon production. Selective hydrogenation of benzene is a key method for preparing cyclohexene\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e. 
Compared with the partial hydrogenation of benzene to cyclohexene, the complete hydrogenation of benzene to cyclohexane is thermodynamically more favorable, so the challenge in this field lies in controlling the kinetics to improve the cyclohexene yield. Researchers have gradually enhanced the selectivity for cyclohexene by modifying catalyst structures and reaction processes. Although this technology was industrialized several decades ago, current industrial processes still rely on metal salts in aqueous systems and on catalysts with high Ru loadings, and cyclohexene yields remain low.\u003c/p\u003e \u003cp\u003eTraditional methods typically reveal trends through extensive single-factor experiments in which additives and supports are varied one at a time. For benzene hydrogenation, the cyclohexene yield has been improved step by step through mechanistic investigations and repeated adjustment of catalyst compositions. In 1934, Horiuti and Polanyi proposed a stepwise mechanism for the hydrogenation of benzene over Group VIII metals\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. In this mechanism, activated adsorbed benzene reacts with dissociatively adsorbed hydrogen through intermediates such as cyclohexa-1,4-diene and cyclohexene, ultimately producing cyclohexane. Subsequently, Prasad et al. suggested that benzene hydrogenation proceeds through two pathways. The first is stepwise hydrogenation, in which the π electrons of benzene form a σ bond with the unbonded d electrons of the metal and hydrogen is dissociatively adsorbed\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e; the adsorbed benzene and hydrogen then react step by step, ultimately producing cyclohexane. 
The second is direct hydrogenation of benzene to cyclohexane, in which planar benzene adsorbed through its delocalized π system interacts with six hydrogen atoms to form a van der Waals activated complex that converts directly to cyclohexane\u003csup\u003e\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Generally, the first pathway dominates at low pressure, whereas the second is favored at high pressure. Over the past century, many strategies have been tried to increase the cyclohexene yield, such as testing different supports, introducing various additives, tuning the size and shape of catalyst particles, and predicting performance with DFT calculations\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan additionalcitationids=\"CR7 CR8 CR9 CR10 CR11\" citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Among these, Ru catalysts combined with an aqueous zinc sulfate additive have been shown to improve cyclohexene selectivity; however, the cyclohexene yield remains limited to about 60%.\u003c/p\u003e \u003cp\u003eMachine learning, an important branch of artificial intelligence, has been applied increasingly across scientific fields in recent years because it can efficiently uncover patterns and correlations in data, and it has also attracted the attention of chemists. For instance, Liu et al. 
used neural networks trained on experimental data to reveal the crucial roles of reaction conditions, pretreatment methods, and post-reaction iron phase compositions, and showed that controlling the iron phase ratio and adding appropriate promoters can effectively suppress methane selectivity\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u003c/sup\u003e. Shi et al. combined extensive literature data with machine learning algorithms, including random forest and extreme gradient boosting, to build an experimentally validated model for predicting the yield and linear selectivity of 1-octene hydrogenation\u003csup\u003e\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Han et al. used machine learning to assist in designing electrocatalytic materials for high-performance lithium-sulfur batteries\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Jaison et al. reviewed the application of machine learning in photocatalysis, with a particular focus on environmental science\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. In these fields, machine learning has shown clear potential for solving problems in catalysis. In traditional thermal catalysis, however, predicting reaction conversion, selectivity, and yield with machine learning remains challenging, mainly because experimental data are difficult to quantify and to make consistent across studies. Although previous researchers have reported a wealth of data, most of it is difficult to use directly as a standardized dataset for model building. 
In addition, thermal catalytic reactions often involve complex conditions and long reaction times, which makes it difficult to generate large amounts of data in one's own experiments.\u003c/p\u003e \u003cp\u003eOver the years, considerable data have been accumulated for the partial hydrogenation of benzene to cyclohexene. We therefore use existing experimental data and machine learning to analyze the factors that influence benzene conversion and cyclohexene selectivity and yield in benzene hydrogenation. Specifically, we selected high-quality data with a balanced distribution from numerous literature reports on benzene hydrogenation, organized them into a dataset, and built Random Forest and XGBoost models to predict conversion, selectivity, and yield. We also attempt to predict high-performance catalyst compositions and reaction conditions.\u003c/p\u003e"},{"header":"Results and discussion","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Construction of the Database\u003c/h2\u003e \u003cp\u003eBefore collecting data, it is essential to identify descriptors that are both easy to standardize and important for the reaction. Benzene hydrogenation has been studied by many researchers, and some studies add less common additives or supports to tune the yield of a specific product. For our purposes, such uncommon factors could act as confounding variables in the model. Therefore, after reviewing nearly 50 relevant papers, we selected those that report all of the chosen descriptors and contain a substantial number of data points. 
This approach helps ensure the reliability of the dataset while minimizing interference from uncommon variables\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan additionalcitationids=\"CR18 CR19 CR20 CR21 CR22 CR23 CR24\" citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe 25 descriptors (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) include reaction temperature, reaction pressure, reaction time, preparation method, and catalyst composition, among others. Some catalysts were not subjected to a reduction step; for these data points, the reduction temperature was set to 298 K (room temperature) so that the dataset contains no missing values. Detailed information on the dataset is provided in the Supporting Information.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary of the variables in the dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRange or values\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eInput\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBenzene/water\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.175\u0026ndash;0.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBenzene/catalyst\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.05\u0026ndash;0.3846\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePretreatment temperature (K)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e298\u0026ndash;873\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReduction temperature (K)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e298\u0026ndash;573\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHydrogen concentration (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0/5/99.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReduction time (h)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReaction temperature (K)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e373\u0026ndash;453\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReaction time (min)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u0026ndash;720\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eReaction pressure (MPa)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2\u0026ndash;7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eS\u003csub\u003eBET\u003c/sub\u003e (m\u003csup\u003e2\u003c/sup\u003e/g)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8.4\u0026ndash;318\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed\u003csub\u003epore\u003c/sub\u003e (nm)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.63\u0026ndash;42.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eV\u003csub\u003epore\u003c/sub\u003e (cm\u003csup\u003e3\u003c/sup\u003e/g)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.03\u0026ndash;1.05\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVHSV (mL/mg/h)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.00058\u0026ndash;11.538\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRu (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.18\u0026ndash;20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCu (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;1.26\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eZnO (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0\u0026ndash;97.93\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNaOH (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;93.26\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eZnSO\u003csub\u003e4\u003c/sub\u003e (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;90.99\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlOOH (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;43.68\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiO\u003csub\u003e2\u003c/sub\u003e (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;43.68\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTiO\u003csub\u003e2\u003c/sub\u003e (wt%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u0026ndash;97.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutput\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSelectivity (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.7\u0026ndash;89.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eYield (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.3\u0026ndash;60.64\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConversion (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4.04\u0026ndash;98.58\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe \"Preparation\" descriptor, which records three catalyst preparation methods, was one-hot encoded. The remaining descriptors were standardized or normalized according to their value ranges and numerical characteristics. XGBoost\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e and Random Forest\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e models were then trained to generate predictions; the code is provided in the Supporting Information.\u003c/p\u003e \u003cp\u003eIn brief, the preprocessed dataset was split into a training set (80%) and a test set (20%) using a random seed of 42. Feature scaling was then applied and all features were combined. For each target variable, a separate XGBoost regression model was tuned, cross-validated, and trained with scikit-learn's GridSearchCV, and the resulting models were used for prediction and evaluation.\u003c/p\u003e \u003cp\u003eModel performance was evaluated with the coefficient of determination (R\u0026sup2;), the mean absolute error (MAE), and the root mean square error (RMSE).\u003c/p\u003e \u003cp\u003eR\u0026sup2; measures the fraction of the variance in the observed values that is explained by the model. A value of 1 corresponds to a perfect fit, and values closer to 1 indicate a better fit; on held-out test data, R\u0026sup2; can even be negative if the model performs worse than simply predicting the mean of the observations. 
It is calculated as follows, where n is the number of samples, y\u003csub\u003etrue,i\u003c/sub\u003e and y\u003csub\u003epred,i\u003c/sub\u003e are the observed and predicted values for sample i, and ȳ is the mean of the observed values:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\mathrm{R}^{2}=1-\\frac{\\sum_{i=1}^{n}\\left(y_{\\text{true},i}-y_{\\text{pred},i}\\right)^{2}}{\\sum_{i=1}^{n}\\left(y_{\\text{true},i}-\\bar{y}\\right)^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eMAE is the average absolute difference between the predicted and actual values. It has the same units as the target variable (here, percentage points), which makes the practical significance of the error easy to interpret; a smaller MAE indicates a smaller prediction error. Because the errors are not squared, MAE is less sensitive to outliers than RMSE, but for the same reason it does not penalize large errors more heavily and reveals little about the distribution of the errors. It is calculated as:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\mathrm{MAE}=\\frac{1}{n}\\sum_{i=1}^{n}\\left|y_{\\text{true},i}-y_{\\text{pred},i}\\right|$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eRMSE is the square root of the mean of the squared differences between the predicted and actual values. Because the differences are squared, large errors (such as those caused by outliers) are penalized more heavily, so RMSE is more sensitive to them than MAE. A lower RMSE indicates a smaller prediction error. 
It is calculated as:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\mathrm{RMSE}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n}\\left(y_{\\text{true},i}-y_{\\text{pred},i}\\right)^{2}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003e2.2 Model Building and Analysis\u003c/h3\u003e\n\u003cp\u003eRandom Forest is simple and robust and is relatively resistant to overfitting, which makes it suitable for rapid model development on small to medium-sized datasets\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. XGBoost offers high prediction accuracy, strong regularization, and built-in handling of missing values and sparse data\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. A Random Forest is therefore often trained first as a quick baseline, after which XGBoost is tried to achieve higher performance. We used both algorithms to predict the conversion, selectivity, and yield of the target reaction and evaluated the models with R\u0026sup2;, MAE, and RMSE (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Overall, XGBoost predicts better than Random Forest (the closer the blue dots lie to the red line, the better the prediction). 
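The three metrics in Eqs. (1) to (3) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code from the Supporting Information (which may instead call scikit-learn's r2_score, mean_absolute_error, and mean_squared_error); the yields and predictions below are hypothetical values chosen only to show how the metrics behave, including a negative R² for a constant prediction that is worse than predicting the mean.

```python
import math
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Eq. (1): 1 - (residual sum of squares) / (total sum of squares about the mean)
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Eq. (2): mean absolute error, in the units of the target (percentage points)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Eq. (3): square root of the mean squared error; large errors weigh more
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed yields (%) and model predictions, for illustration only
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(r2(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))

# A constant prediction of 0 is far worse than predicting the mean, so R^2 < 0
print(r2(y_true, [0.0] * len(y_true)))
```

Because RMSE squares the errors, it is always at least as large as MAE, and the gap between the two widens when a few predictions are far off.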
For yield, XGBoost achieved R\u0026sup2;\u0026thinsp;=\u0026thinsp;0.87, MAE\u0026thinsp;=\u0026thinsp;3.19, and RMSE\u0026thinsp;=\u0026thinsp;4.46; for selectivity, R\u0026sup2;\u0026thinsp;=\u0026thinsp;0.81, MAE\u0026thinsp;=\u0026thinsp;5.95, and RMSE\u0026thinsp;=\u0026thinsp;9.01; and for conversion, R\u0026sup2;\u0026thinsp;=\u0026thinsp;0.96, MAE\u0026thinsp;=\u0026thinsp;3.85, and RMSE\u0026thinsp;=\u0026thinsp;5.31 (MAE and RMSE in percentage points). These values indicate that the model predicts this task with relatively high accuracy.\u003c/p\u003e \u003cp\u003eFor selectivity, both models show larger prediction errors than for conversion and yield, possibly because selectivity is more strongly affected by hidden factors that the descriptors do not fully capture. Conversion is the easiest variable to predict (highest R\u0026sup2;), suggesting that the tree-based models learn its variation pattern most readily. XGBoost may outperform Random Forest because of boosting: each new tree is fitted to the residual errors of the current ensemble, which allows the model to capture finer relationships in the dataset and reduce errors.\u003c/p\u003e \u003cp\u003eWe used a correlation heatmap to examine which variables are strongly linearly related. Because the heatmap for more than twenty descriptors is complex, it is provided in the Supporting Information. Some of the strong correlations require explanation. For example, the strong negative correlation of zinc sulfate content with the hydrogen concentration and time used for reduction probably arises because systems containing large amounts of zinc sulfate do not require a reduction step during catalyst preparation. Since the reduction time and hydrogen concentration were recorded as 0 for catalysts that were not reduced, these entries produce the high correlation observed in the heatmap.\u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e present the SHAP analysis and the feature importance analysis, respectively, for each target variable. 
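SHAP values are Shapley values from cooperative game theory. The sketch below is only a conceptual illustration, not the tree-specific algorithm (TreeExplainer) that the shap library provides for tree ensembles and not the authors' code: it computes exact Shapley values by enumerating every coalition of features for a hypothetical toy model, and it treats a feature absent from a coalition as taking a fixed baseline value, which is a simplifying assumption. It shows two properties that help when reading Fig. 2: the contributions add up to the difference between the prediction and the baseline prediction, and an interaction term is shared between the features involved.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    # Exact Shapley values by brute force; cost grows as 2**n, so only for small n.
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Features in the coalition keep their actual values; the rest use the baseline
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z: List[float]) -> float:
    # Hypothetical model: an additive term in z[0] plus an interaction z[1] * z[2]
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # contribution of each feature to toy_model(x) relative to the baseline
print(sum(phi), toy_model(x) - toy_model(baseline))  # equal up to rounding
```

With the baseline at zero, the additive term gives feature 0 its full effect of 2, while the interaction term z[1]*z[2] = 6 is split equally between features 1 and 2.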
In Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, panels (a), (b), and (c) correspond to yield, selectivity, and conversion, respectively. Each point in a SHAP summary plot shows the contribution (SHAP value) of one feature to the model output for one sample, which can be positive or negative\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. SHAP values account for interactions between features, whereas the feature importance plot reports simpler tree-based statistics, such as how often a feature is used to split the data. Features are ranked from top to bottom in order of decreasing influence on the target variable, so those at the top have the greatest impact on the predictions. A positive SHAP value (horizontal axis) means that the feature raises the predicted value of the target variable for that data point, and the larger the value, the stronger the effect; a negative SHAP value means that the feature lowers the prediction.\u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea and b show that Ru content, in the range of 0.18\u0026ndash;20 wt%, strongly affects both yield and selectivity: higher Ru content is associated with lower cyclohexene yield and selectivity, whereas its effect on conversion is minor. Reaction time strongly affects all three target variables, but its effect is not monotonic. 
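The rise-then-fall dependence of yield on reaction time can be illustrated with a deliberately idealized model. The sketch below assumes that benzene (A) is converted to cyclohexene (B) and then to cyclohexane (C) in two consecutive first-order steps with hypothetical rate constants k1 and k2; it ignores the direct benzene-to-cyclohexane pathway discussed in the Introduction and the non-first-order kinetics of the real Ru/ZnSO4 system, so it is a qualitative illustration only and is not fitted to the dataset. For this scheme the cyclohexene yield peaks at t_max = ln(k2/k1)/(k2 - k1), after which over-hydrogenation consumes cyclohexene faster than it is formed.

```python
import math
from typing import Tuple


def consecutive_first_order(k1: float, k2: float, t: float) -> Tuple[float, float, float]:
    # Fractions of the initial benzene present as A (benzene), B (cyclohexene), C (cyclohexane)
    # for the idealized scheme A -> B -> C with hypothetical first-order constants k1, k2 (1/min).
    a = math.exp(-k1 * t)
    if math.isclose(k1, k2):
        b = k1 * t * math.exp(-k1 * t)  # limiting form when k1 equals k2
    else:
        b = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    c = 1.0 - a - b  # mass balance
    return a, b, c


def time_of_max_yield(k1: float, k2: float) -> float:
    # Setting dB/dt = 0 gives k1*exp(-k1*t) = k2*exp(-k2*t)
    if math.isclose(k1, k2):
        return 1.0 / k1
    return math.log(k2 / k1) / (k2 - k1)


# Hypothetical rate constants chosen only for illustration
k1, k2 = 0.05, 0.02
t_star = time_of_max_yield(k1, k2)
for t in (5.0, t_star, 120.0, 600.0):
    a, b, c = consecutive_first_order(k1, k2, t)
    conversion = 1.0 - a
    selectivity = b / conversion
    print(f'{t:7.1f} min  conversion={conversion:.2f}  yield={b:.2f}  selectivity={selectivity:.2f}')
```

In this idealized scheme conversion rises while cyclohexene selectivity falls as time goes on, so the yield maximum reflects a trade-off between the two.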
Because cyclohexene is an intermediate that can be further hydrogenated to cyclohexane, its yield typically first increases with reaction time and then gradually decreases, consistent with previous reports\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. ZnSO\u003csub\u003e4\u003c/sub\u003e content makes a positive contribution to all three target variables, indicating that, within the range covered by our dataset, adding ZnSO\u003csub\u003e4\u003c/sub\u003e is associated with higher yield, selectivity, and conversion.\u003c/p\u003e \u003cp\u003eVHSV relates the reactant flow rate to the amount of catalyst and is commonly used to describe how fast fluid passes over a catalyst, especially in gas-solid or liquid-solid catalytic reactions. A higher VHSV means a larger flow rate per unit time and a shorter contact time with the catalyst, whereas a lower VHSV means a longer contact time between the reactants and the catalyst. Overall, a low space velocity has a positive effect. Space velocity also strongly influences selectivity, but the relationship is not simple, possibly because space velocity is coupled to reaction time.\u003c/p\u003e \u003cp\u003eSeveral insights can be drawn from the XGBoost feature importance plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). 
For example, reaction time is the most important factor for yield, whereas space velocity, reaction time, and Ru content are relatively important for selectivity. Combined with the SHAP analysis, these results suggest that an appropriate reaction time, a lower space velocity, a limited Ru content, and a suitable choice and ratio of supports and additives can help improve the selectivity toward cyclohexene. A higher pore volume also contributes positively to the reaction.\u003c/p\u003e \u003cp\u003eThese analyses can guide the development of new catalysts. We randomly generated input variables, setting the range of each variable according to the conclusions above, and used the Dirichlet distribution to ensure that the contents of the catalyst components sum to 100%. A total of 1,000,000 variable sets were sampled within the defined ranges and fed into the trained model to identify sets with high predicted yields; the input values corresponding to these high-yield predictions are also provided. In addition, code that accepts manually entered values and predicts the corresponding yield is included in the Supporting Information. With Ru\u0026thinsp;=\u0026thinsp;1.6 wt%, TiO₂\u0026thinsp;=\u0026thinsp;13.5 wt%, ZnSO₄\u0026thinsp;=\u0026thinsp;84.9 wt%, reaction temperature\u0026thinsp;=\u0026thinsp;413 K, reaction time\u0026thinsp;=\u0026thinsp;20 min, and reaction pressure\u0026thinsp;=\u0026thinsp;5.0 MPa, the model predicts a cyclohexene yield of 60.6%. This value is close to the highest yield in the dataset (60.64%, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e); because tree-based models rarely extrapolate far beyond the range of their training targets, such predictions mainly indicate promising regions for experimental verification.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis paper collected data from the existing literature and used XGBoost and Random Forest algorithms to predict and analyze the yield, selectivity, and conversion of the selective hydrogenation of benzene to cyclohexene. 
SHAP and feature importance analyses indicate that an appropriate reaction time, a lower space velocity, a limited Ru content, and suitable supports and additives can increase the cyclohexene yield. The models were also used to predict unexplored combinations of catalyst compositions and reaction conditions. This work may serve as a reference for applying machine learning to heterogeneous thermal catalysis. At the same time, machine learning still faces many challenges in complex systems such as heterogeneous thermal catalysis: achieving high-precision predictions and identifying realistic, practical catalyst systems remain difficult.\u003c/p\u003e"},{"header":"Abbreviations","content":" \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eSHAP\u003c/div\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eShapley additive explanations\u003c/div\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eDFT\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eDensity\u0026nbsp;functional\u0026nbsp;theory\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eXGBoost\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv 
class=\"SimplePara\"\u003eExtreme\u0026nbsp;gradient\u0026nbsp;boosting\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eS\u003csub\u003eBET\u003c/sub\u003e\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eSpecific surface area determined by the Brunauer\u0026ndash;Emmett\u0026ndash;Teller method\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003ed\u003csub\u003epore\u003c/sub\u003e\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eAverage pore diameter\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eV\u003csub\u003epore\u003c/sub\u003e\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eTotal pore volume\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eVHSV\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eVolumetric\u0026nbsp;hourly\u0026nbsp;space\u0026nbsp;velocity\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eRF\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eRandom forest\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/div\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eR-squared\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eMAE\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eMean absolute error\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cdiv class=\"SimplePara\"\u003eRMSE\u003c/div\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cdiv class=\"SimplePara\"\u003eRoot mean square error\u003c/div\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e "},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics and Consent to Participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for Publication\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eC. Sun wrote the main manuscript text and prepared the figures. B. Zhang reviewed and revised the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNational Key R\u0026amp;D Program of China (2020YFA0210902)\u003c/p\u003e\n\u003cp\u003eCAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300 and XDB1190302)\u003c/p\u003e\n\u003cp\u003eYouth Innovation Promotion Association CAS (Y2021056)\u003c/p\u003e\n\u003cp\u003eJoint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (Grant.
YLU-DNL Fund 2022007)\u003c/p\u003e\n\u003cp\u003eThe special fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSupplementary data associated with this article are available at https://github.com/ttoono-takaki/Code-and-dataset.git.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe acknowledge the National Key R\u0026amp;D Program of China (2020YFA0210902), CAS Basic and Interdisciplinary Frontier Scientific Research Pilot Project (XDB1190300 and XDB1190302), Youth Innovation Promotion Association CAS (Y2021056), Joint Fund of the Yulin University and the Dalian National Laboratory for Clean Energy (Grant. YLU-DNL Fund 2022007), and the special fund for Science and Technology Innovation Teams of Shanxi Province (202304051001007).\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eChen Z, Sun H, Peng Z, Gao J, Li B, Liu Z, Liu S (2019) Selective Hydrogenation of Benzene: Progress of Understanding for the Ru-Based Catalytic System Design. Ind Eng Chem Res 58(31):13794\u0026ndash;13803\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoriuti I, Polanyi M (1934) Exchange reactions of hydrogen on metallic catalysts. Trans Faraday Soc 30:1164\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eK.B.S KHVP, M.M P et al (1987) M.; The Catalysts for the Synthesis of Formaldehyde by Partial Oxidation of Methane. J Catal 84(1):65\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJenkins GI, Rideal SE (1955) The Catalytic Hydrogenation of Ethylene at a Nickel Surface. Part II. The Reaction Mechanism.
J Chem Soc 1955:2490\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJenkins GI, Rideal SE (1955) The Catalytic Hydrogenation of Ethylene at a Nickel Surface. Part I. The Chemisorption of Ethylene. J Chem Soc 1955:2496\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFan C, Zhu Y-A, Zhou X-G, Liu Z-P (2011) Catalytic hydrogenation of benzene to cyclohexene on Ru(0001) from density functional theory investigations. Catal Today 160(1):234\u0026ndash;241\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang L, Dong Y, Zhou G, Li R, He D (2019) Promoting the Performances of TiO2 Submicrosphere-Embedded Ru Nanoparticles in Benzene Selective Hydrogenation by Morphology Manipulation. Ind Eng Chem Res 59(3):1083\u0026ndash;1092\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu H, Liang S, Wang W, Jiang T, Han B (2011) The partial hydrogenation of benzene to cyclohexene over Ru\u0026ndash;Cu catalyst supported on ZnO. J Mol Catal A: Chem 341(1\u0026ndash;2):35\u0026ndash;41\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSilveira ET, Umpierre AP, Rossi LM, Machado G, Morais J, Soares GV, Baumvol IJ, Teixeira SR, Fichtner PF, Dupont J (2004) The partial hydrogenation of benzene to cyclohexene by nanoscale ruthenium catalysts in imidazolium ionic liquids. Chem Eur J 10(15):3734\u0026ndash;3740\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun H, Jiang H, Li S, Dong Y, Wang H, Pan Y, Liu S, Tang M, Liu Z (2013) Effect of alcohols as additives on the performance of a nano-sized Ru\u0026ndash;Zn(2.8%) catalyst for selective hydrogenation of benzene to cyclohexene. Chem Eng J 218:415\u0026ndash;424\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUtelbaeva AB, Ermakhanov MN, Zhanabai NZ, Utelbaev BT, Mel'deshov AA (2013) Hydrogenation of benzene in the presence of ruthenium on a modified montmorillonite support.
Russ J Phys Chem A 87(9):1478\u0026ndash;1481\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu X-L, Li Y, Xin S-M, Yuan P-Q, Yuan W-K (2018) Partial Hydrogenation of Benzene to Cyclohexene on Ru@XO\u003csub\u003e2\u003c/sub\u003e (X\u0026thinsp;=\u0026thinsp;Ti, Zr, or Si). Ind Eng Chem Res 57(6):1961\u0026ndash;1967\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu Y, Zhang X, Li L, Liu X, Lei T, Bai J, Guo W, Zhou Y, Liu X, Teng B, Wen X (2024) Machine learning insights into catalyst composition and structural effects on CH4 selectivity in iron-based Fischer\u0026ndash;Tropsch synthesis. Artif Intell Chem 2(1)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi H, Shen C, Huang Z, Dong K (2025) Machine Learning-Guided Prediction of Hydroformylation. ChemPhysChem 26(3):e202400773\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan Z, Chen A, Li Z, Zhang M, Wang Z, Yang L, Gao R, Jia Y, Ji G, Lao Z, Xiao X, Tao K, Gao J, Lv W, Wang T, Li J, Zhou G (2024) Machine learning-based design of electrocatalytic materials towards high-energy lithium||sulfur batteries development. Nat Commun 15(1):8433\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaison A, Mohan A, Lee Y-C (2024) Machine learning-enhanced photocatalysis for environmental sustainability: Integration and applications. Mater Sci Eng R Rep 161\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou G, Dong Y, Jiang L, He D, Yang Y, Zhou X (2018) Effect of support composition on the structural and catalytic properties of Ru/AlOOH\u0026ndash;SiO2 catalysts for benzene selective hydrogenation.
Catal Sci Technol 8(5):1435\u0026ndash;1446\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu T, Zhang P, Jiang T, Yang D, Han B (2014) Enhancing the selective hydrogenation of benzene to cyclohexene over Ru/TiO2 catalyst in the presence of a very small amount of ZnO. Sci China Chem 58(1):93\u0026ndash;100\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHao F, Zheng J, Ouyang D, Xiong W, Liu P, Luo H (2021) Selective hydrogenation of benzene over Ru supported on surface modified TiO2. Korean J Chem Eng 38(4):736\u0026ndash;746\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou G, Wang F, Shi R (2021) Nanoparticulate Ru on morphology-manipulated and Ti\u003csup\u003e3+\u003c/sup\u003e defect-riched TiO2 nanosheets for benzene semi-hydrogenation. J Catal 398:148\u0026ndash;160\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou G, Dou R, Bi H, Xie S, Pei Y, Fan K, Qiao M, Sun B, Zong B (2015) Ru nanoparticles on rutile/anatase junction of P25 TiO2: Controlled deposition and synergy in partial hydrogenation of benzene to cyclohexene. J Catal 332:119\u0026ndash;126\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan X, Zhang Q, Zhu M, Wang Z (2016) Selective hydrogenation of benzene to cyclohexene over Ru\u0026ndash;Zn/ZrO2 catalysts prepared by a two-step impregnation method. J Mol Catal A: Chem 413:85\u0026ndash;93\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Z, Zhang Q, Lu X, Chen S, Liu C (2015) Ru-Zn catalysts for selective hydrogenation of benzene using coprecipitation in low alkalinity. Chin J Catal 36(3):400\u0026ndash;407\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun H-j, Wang H-x, Jiang H-b, Li S-h, Liu S-c, Liu Z-y, Yuan X-m, Yang K (2013) Effect of (Zn(OH)2)3(ZnSO4)(H2O)5 on the performance of Ru\u0026ndash;Zn catalyst for benzene selective hydrogenation to cyclohexene.
Appl Catal A Gen 450:160\u0026ndash;168\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXue X, Liu J, Rao D, Xu S, Bing W, Wang B, He S, Wei M (2017) Double-active site synergistic catalysis in Ru\u0026ndash;TiO2 toward benzene hydrogenation to cyclohexene with largely enhanced selectivity. Catal Sci Technol 7(3):650\u0026ndash;657\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSheridan RP, Wang WM, Liaw A, Ma J, Gifford EM (2016) Extreme Gradient Boosting as a Method for Quantitative Structure\u0026ndash;Activity Relationships. J Chem Inf Model 56(12):2353\u0026ndash;2360\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBreiman L (2001) Random forests. Mach Learn 45(1):5\u0026ndash;32\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg S, Lee S-I (2017) A Unified Approach to Interpreting Model Predictions. arXiv\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Machine learning, Heterogeneous catalysis, Hydrogenation of benzene, XGBoost","lastPublishedDoi":"10.21203/rs.3.rs-6475690/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6475690/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eCyclohexene is an important raw material in the production of nylon. Selective hydrogenation of benzene is a key method for preparing cyclohexene. However, the Ru catalysts used in current industrial processes still face the challenges of high metal usage, high process costs, and low cyclohexene yield. This study utilizes existing literature data combined with machine learning methods to analyze the factors influencing conversion rate, selectivity, and yield in the benzene hydrogenation to cyclohexene reaction and constructs predictive models based on XGBoost and Random Forest algorithms. After analysis, it was found that reaction time, Ru content, and space velocity are key factors influencing reaction yield, selectivity, and conversion rate. Shapley Additive Explanations (SHAP) analysis and feature importance analysis further revealed the contribution of each variable to the reaction outcomes. Additionally, we randomly generated one million variable combinations using the Dirichlet distribution to attempt to predict high-yield catalyst formulations. 
This paper provides new insights into the application of machine learning in heterogeneous catalysis and offers some reference for further research.\u003c/p\u003e","manuscriptTitle":"Insights and Analysis of Machine Learning for Benzene Hydrogenation to Cyclohexene","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-05 10:08:58","doi":"10.21203/rs.3.rs-6475690/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"eda4fbec-c609-450b-bc2c-8348b660d023","owner":[],"postedDate":"May 5th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-08-03T19:23:16+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-05 10:08:58","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6475690","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6475690","identity":"rs-6475690","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
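The Abbreviations table lists R², MAE and RMSE. These are the usual metrics for judging regression models such as the XGBoost and Random Forest predictors used here. The following is a minimal sketch of the standard definitions, assuming paired lists of observed and predicted values (for example, cyclohexene yields in %). It uses only the Python standard library. The function name `regression_metrics` and the example numbers are hypothetical. They are not taken from the authors' repository or dataset.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, MAE and RMSE for paired observed/predicted values (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # MAE = (1/n) * sum |y_i - yhat_i|
    mae = sum(abs(r) for r in residuals) / n
    # RMSE = sqrt((1/n) * sum (y_i - yhat_i)^2)
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    # R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when every observed value is identical (SS_tot == 0)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"R2": r2, "MAE": mae, "RMSE": rmse}


if __name__ == "__main__":
    # Hypothetical observed vs. predicted yields (%), for illustration only
    print(regression_metrics([40.0, 50.0, 60.0], [42.0, 48.0, 60.0]))
```

For these hypothetical values the residuals are -2, 2 and 0. This gives MAE = 4/3 ≈ 1.33, RMSE = √(8/3) ≈ 1.63 and R² = 1 - 8/200 = 0.96. RMSE squares the residuals before averaging, so it penalises occasional large errors more heavily than MAE does.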
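The abstract states that one million variable combinations were generated from the Dirichlet distribution in an attempt to predict high-yield catalyst formulations. This section does not describe how Dirichlet samples are mapped onto the model's input features. The sketch below therefore assumes that each sample is a vector of composition fractions summing to one, and that a trained model is available as a plain Python callable. A Dirichlet(α) sample is drawn by normalising independent Gamma(α_i, 1) draws, and α = (1, ..., 1) gives a uniform distribution over the simplex. `heapq.nlargest` keeps only the best-scoring candidates, so memory stays small even for 10^6 samples. The names `screen_candidates`, `toy_yield` and `TARGET` are hypothetical placeholders. They do not stand for the authors' XGBoost or Random Forest model.

```python
import heapq
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet(alpha) sample by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def screen_candidates(
    predict: Callable[[List[float]], float],
    alpha: Sequence[float],
    n_samples: int,
    top_k: int,
    seed: int = 0,
) -> List[Tuple[float, List[float]]]:
    """Score n_samples random compositions; return the top_k as (score, composition), best first."""
    rng = random.Random(seed)  # fixed seed makes the screen reproducible

    def candidates() -> Iterator[Tuple[float, List[float]]]:
        for _ in range(n_samples):
            x = sample_dirichlet(alpha, rng)
            yield predict(x), x

    # nlargest consumes the generator lazily and keeps only top_k items in memory
    return heapq.nlargest(top_k, candidates(), key=lambda pair: pair[0])


# Hypothetical stand-in for a trained yield model: peaks at an arbitrary composition.
TARGET = [0.2, 0.3, 0.5]


def toy_yield(x: List[float]) -> float:
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


if __name__ == "__main__":
    for score, comp in screen_candidates(toy_yield, alpha=[1.0, 1.0, 1.0], n_samples=10_000, top_k=3):
        print(f"{score:.5f}", [round(c, 3) for c in comp])
```

Setting `n_samples=1_000_000` matches the scale mentioned in the abstract. In a real screen, the sampled fractions would likely also need rescaling to physical units and filtering to chemically sensible ranges before any predicted yield is trusted.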