Logging-data-driven lithology identification of conglomerate reservoir by the assistance of integrated machine learning methods

Jiming Liu, Dongjin Xu

This is a Research Square preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7429684/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 23 Nov, 2025 in Scientific Reports. Version 1 posted 12

Abstract

Lithology is a key parameter in fine reservoir description and evaluation. Because mud and gravel are mixed in complex reservoirs, it is difficult to identify reservoir lithology directly from a single log curve or a conventional cross plot. Accurate identification of conglomerate reservoir lithology has therefore long been a prominent issue in reservoir characterization. In this study, over 70 meters of cores were observed in detail.
The conglomerate lithology after depth correction was matched with five log curves: GR, DT, RHOB, TNPH, and M2R1. With the logging data as input, three machine learning models were built separately, and their predictions were compared using several measures, including accuracy metrics and ROC curves. The results show that machine learning models based on logging data perform well in lithology prediction for the conglomerate reservoir; the XGBoost model gives the best results, with the highest prediction accuracy of 0.902. In addition, the optimal model was interpreted with the SHAP method. The contribution of each log curve differs from one lithology to another. On the whole, the TNPH curve plays the most important role in lithology prediction. This study provides insights for lithology prediction in complex reservoirs.

Subjects: Physical sciences/Energy science and technology; Earth and environmental sciences/Solid earth sciences
Keywords: lithology identification, machine learning, logging data driven, conglomerate reservoir

1 Introduction

Accurate lithology identification is a basic and significant issue in petroleum reservoir research (Liu et al., n.d.; Ruiyi et al., n.d.; Saporetti et al., n.d.; X. Zhang et al., n.d.). Three types of data can be used to identify lithology in different reservoirs: rock cuttings produced during drilling, cores obtained from the formation, and logging data. Cuttings logging is the conventional way to obtain lithology. However, its lithology interval along the well profile is about 1 meter, and the depth information is commonly inexact. The method also lags behind the drilling process, so it is difficult to reflect fine changes in lithology accurately.
Cores provide accurate lithology identification through observation of rock cores or thin sections. However, core collection and thin-section preparation are costly. Well logs carry a great deal of information about formation rocks, so logging curves are commonly used for lithology identification (Gu et al., n.d.; Y. Sun et al., n.d.). Conventional cutoff values of logging data can reflect lithology changes along the vertical profile of a single well. Logging-based lithology identification therefore usually relies on typical log curves that respond to lithology changes, such as the gamma ray (GR) curve, the spontaneous potential curve, and the caliper (well diameter) curve; the change of shale content in sandstone reservoirs is then judged manually or automatically to identify the lithology. With cutoff values, spider diagrams, or cross plots, lithology can be distinguished in conventional reservoirs, where sandstones are characterized by low GR. However, accurate identification from logging curves is difficult in fan-delta conglomerate reservoirs. Conventional cross plots or cutoff values are not feasible in fan-delta sandstone reservoirs for three reasons. First, if the gravel in the sandstone is acidic igneous rock, its content of radioactive elements is high, which raises the gamma ray log value, so the interval is easily misinterpreted as mudstone. Second, the argillaceous component is a near-source deposit; if its transport distance to the reservoir is short, its surface fails to adsorb much radioactive material, and the natural gamma ray is abnormally low, resembling conventional sandstone. Third, in some typical reservoirs gravel and argillaceous components are mixed near the source, resulting in gamma ray values higher than those of sandstone.
Based on well logs, machine learning methods have been widely used in recent years for lithology prediction in different types of reservoirs (Ashraf et al., 2021; Lin P. et al., 2023; Liu and Liu, 2022; Ruiyi et al., n.d.; Y. Sun et al., n.d.; X. Zhang et al., n.d.; C. Zhao et al., 2022; X. Zhao et al., 2022). In one study, seven supervised machine learning algorithms were applied to lithofacies prediction based on the GR value and seven of its derivatives, and the stochastic deep forest method performed best, with an MAE of 0.25. In another, seven logging curves, including GR and AC, were used as inputs to build a single integrated model and combinations of integrated models, which were compared in predicting eight lithologies in the Daniudi gas field of the Ordos Basin. Using 76500 tray images as input, the ResNeXt-50 model performed best in lithology identification, with an accuracy of 93% (Dong et al., n.d.). In general, unsupervised and semi-supervised algorithms, supervised algorithms, and deep learning algorithms are used for lithology prediction (Alzubaidi et al., n.d.). However, few studies have reported lithology identification in conglomerate reservoirs, especially fan-delta reservoirs, which show frequent vertical lithology changes and uneven gravel distribution. In this study, three machine learning models were developed to identify formation lithology using raw data from a fan-delta conglomerate reservoir, including over 70 m of core data and well logging data. Accurate lithology identification will support subsequent fine reservoir description and efficient development.

2 Methodology

2.1 Models using boosting algorithms based on regression tree

Boosting is an ensemble learning method that builds a strong learner by iteratively adding weak learners (such as decision trees). It optimizes model performance by continuously fitting residuals, and it has high prediction accuracy and robustness.
However, the traditional gradient boosting algorithm suffers from slow training and large memory consumption when dealing with large-scale data sets. Boosting can turn weak learners, whose prediction accuracy is only slightly higher than random guessing, into a strong learner with high prediction accuracy, which has provided effective new ideas and methods for machine learning (Dev and Eden, n.d.; Ferreira and Figueiredo, n.d.; Schapire, n.d.; Zou et al., n.d.). Three boosting algorithms were adopted to identify lithology in this study: Adaptive Boosting, Light Gradient Boosting Machine, and Extreme Gradient Boosting.

(1) Adaptive Boosting (AdaBoost)

Adaptive Boosting (AdaBoost) regression is a machine learning technique that employs the boosting principle to iteratively combine multiple weak predictors into a robust ensemble model. During training, the algorithm dynamically adjusts the weights assigned to base learners, prioritizing those with smaller prediction errors while diminishing the contribution of high-error models. Through successive iterations, this adaptive weighting mechanism optimizes the ensemble's performance, ultimately converging to a highly accurate predictor (Cao et al., n.d.; Sun Y. et al., n.d.; X. Zhang et al., n.d.).

(2) Light Gradient Boosting Machine (LightGBM)

The decision tree construction process of LightGBM is similar to that of the traditional decision tree algorithm, but it has some special features. It uses gradient-based historical information to build a decision tree and selects the optimal split point by calculating the information gain (or a similar metric) of each feature. During construction, LightGBM also considers factors such as feature sparsity and data imbalance to further improve model performance (Liu et al., n.d.; Wang et al., 2017).
(3) Extreme Gradient Boosting (XGBoost)

XGBoost (eXtreme Gradient Boosting) is a scalable, distributed gradient-boosting framework built on classification and regression tree (CART) principles, extending traditional decision tree methodologies (Chen T. and Guestrin, 2016; T. Chen and Guestrin, 2016; N. Lin et al., 2023; Lin P. et al., 2023; Ruiyi et al., n.d.; Wen et al., n.d.; J. Zhang et al., n.d.; Zhao and Liao, 2025; Zhong et al., 2020). By integrating base learners such as “GBTree” (CART-based regression trees) and “GBLinear” (linear regressors), it constructs a strong ensemble learner with high predictive accuracy and computational efficiency. Unlike conventional Gradient Boosting Decision Trees (GBDT), XGBoost enhances model robustness through regularization terms in its loss function, effectively mitigating overfitting and controlling complexity. A key innovation of XGBoost lies in its optimization strategy: it approximates the objective function using a second-order Taylor expansion, incorporating both first and second derivatives to accelerate convergence and improve precision. This approach enables flexibility in defining custom loss functions, provided they are twice continuously differentiable. While the framework supports user-defined objectives, it most frequently employs standard loss functions such as mean squared error (MSE) for regression and logistic loss for classification tasks (Dhaliwal et al., 2018; Ogunleye and Wang, 2020; Qiu et al., n.d.; Zhang and Zhan, 2017).

2.2 Verification methods for classification issues

(1) Confusion matrix and accuracy verification parameters

The confusion matrix is a commonly used visualization tool for evaluating the performance of a classification model, particularly in multi-class problems. It provides a detailed breakdown of predictions by comparing the model's outputs against the true labels.
By organizing errors and correct predictions per class, the confusion matrix offers a granular view of model behavior, enabling targeted improvements beyond aggregate metrics such as overall accuracy. Model evaluation parameters can be obtained from the confusion matrix, including precision, recall, F1-score, and accuracy (Brandmeier and Chen, 2019; Deng et al., 2017; Xu et al., n.d.). In the formulas below, TP, FP, FN, and TN denote the numbers of true positives, false positives, false negatives, and true negatives, respectively.

Precision quantifies the proportion of correctly predicted positive instances among all instances predicted as positive. It emphasizes the reliability of positive predictions. High precision indicates few false positives, which is critical in scenarios where the cost of false positives is high.

$$\text{Precision}=\frac{TP}{TP+FP}$$

Recall measures the proportion of correctly identified positive instances relative to all actual positive instances. It evaluates the model's ability to detect relevant cases:

$$\text{Recall}=\frac{TP}{TP+FN}$$

High recall is essential when minimizing false negatives is paramount.

The F1-score combines precision and recall through their harmonic mean. It is particularly useful for imbalanced datasets, where optimizing one metric alone may degrade the other, and it provides a balanced assessment of classifier performance. The F1-score is calculated as follows:

$$\text{F1-score}=2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$

Accuracy represents the overall proportion of correct predictions (both positive and negative):

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$

(2) Receiver Operating Characteristic (ROC) curve

The Receiver Operating Characteristic (ROC) curve, a widely used evaluation tool in binary classification, can be extended to multiclass problems through adaptation strategies. It visualizes the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) across classification thresholds.
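Returning to the confusion-matrix metrics defined above, the following is a minimal sketch of how per-class precision, recall, and F1-score, together with overall accuracy, could be computed for a multi-class problem. It is an illustration only, not the authors' code: the function name `per_class_metrics`, the one-vs-rest counting of TP, FP, and FN for each class, and the 3-class toy matrix are assumptions introduced here.

```python
def per_class_metrics(cm):
    """Compute per-class precision, recall and F1 from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (hypothetical layout chosen for this sketch).
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n_classes):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                                # true k, predicted as another class
        fp = sum(cm[i][k] for i in range(n_classes)) - tp   # predicted k, truly another class
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    # Overall accuracy: correctly classified samples (the diagonal) over all samples.
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total
    return metrics, accuracy


# Toy 3-class confusion matrix (made-up counts, not data from the study).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
toy_metrics, toy_accuracy = per_class_metrics(toy_cm)
for class_index, m in enumerate(toy_metrics):
    print(class_index, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
print("accuracy", toy_accuracy)
```

Averaging the per-class values, as done later for the six lithologies, corresponds to an unweighted (macro) average of these entries.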
The ROC curve uses the area under the curve (AUC) to evaluate the model, as shown in Fig. 4. For a useful classifier, the AUC ranges from 0.5 to 1. The larger the area under the curve, that is, the closer the AUC is to 1, the better the diagnostic or predictive performance of the model. An AUC between 0.5 and 0.7 indicates low accuracy, an AUC between 0.7 and 0.9 indicates moderate accuracy, and an AUC above 0.9 indicates high accuracy. An AUC of 0.5 indicates that the method is completely ineffective and has no diagnostic value.

3 Workflow and data processing

A complete workflow for lithology identification in the conglomerate reservoir based on logging data was established in this study. The specific steps are shown in the following flow chart, including lithology identification of cores, depth matching of the cores, and logging data pre-processing.

3.1 Lithology identification of cores

The study area is located in the South China Sea, and the reservoir belongs to fan-delta front deposits. A total of 71.4 m of cores from the formation were observed in detail, and six lithology types were classified manually: medium-fine sandstone (litho-1), pebbly sandstone (litho-2), mudstone (litho-3), argillaceous sandstone (litho-4), conglomerate sandstones (litho-5), and coarse sandstone (litho-6).

3.2 The depth matching of the cores to well logs

Core observation is done in the laboratory, whereas the logging curves are measured along the well profile, and the depth error between the two makes data set construction difficult. It is therefore important to unify the depths of the manual lithology labels and the logging curves. The depth of collected cores commonly does not match the depth of the logging curves because of measurement error. This error can be obtained by comparing GR data measured with a handheld gamma detector at the surface against the GR curve in the well profile. In this study, 0.96 m had to be added to the core depths to match the logging curves.
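The depth correction described above can be sketched as follows. This is a minimal illustration under stated assumptions: only the 0.96 m shift comes from the text, while the helper name `match_labels_to_logs`, the nearest-depth matching rule, and the toy depths and labels are hypothetical and do not reproduce the authors' procedure or data.

```python
CORE_DEPTH_SHIFT_M = 0.96  # shift reported in the text; everything else is illustrative


def match_labels_to_logs(core_samples, log_depths, shift=CORE_DEPTH_SHIFT_M):
    """Attach a lithology label to each log depth.

    core_samples: list of (core_depth_m, label) pairs from core description.
    log_depths:   list of log sample depths in meters.
    Each log depth receives the label of the nearest depth-corrected core
    sample (an assumed matching rule for this sketch).
    """
    corrected = [(depth + shift, label) for depth, label in core_samples]
    matched = []
    for log_depth in log_depths:
        _, label = min(corrected, key=lambda item: abs(item[0] - log_depth))
        matched.append((log_depth, label))
    return matched


# Hypothetical core depths and labels, chosen only to exercise the function.
toy_core = [
    (999.04, "mudstone"),
    (999.54, "pebbly sandstone"),
    (1000.04, "coarse sandstone"),
]
toy_log_depths = [1000.0, 1000.1, 1000.4, 1000.9]

for depth, label in match_labels_to_logs(toy_core, toy_log_depths):
    print(depth, label)
```

In practice the shift itself would first be estimated, for example by comparing the core GR with the logged GR as the text describes; that estimation step is not shown here.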
Finally, the lithology labels were matched to the well logging curves.

3.3 Logging data selection and pre-processing

(1) Logging data selection

From a geological point of view, the GR curve responds to argillaceous content, and curves reflecting the porosity and flow properties of rocks can also show differences among lithologies. The candidate logs were GR (natural gamma ray, gAPI), DT (DTCO, Delta-T Compressional, µs/ft), RHOB (density log, g/cm³), TNPH (thermal neutron porosity log, %), M2R1 (shallow high-resolution array induction resistivity, ohmm), M2R6 (middle high-resolution array induction resistivity, ohmm), and M2RX (deep high-resolution array induction resistivity, ohmm). It is therefore reasonable to start from these seven well logs. The cross plots of the seven logging curves for different lithologies are shown in Fig. 4, which illustrates how well pairs of logging curves differentiate lithology. Unfortunately, it is difficult to distinguish the six complex lithologies accurately with two-dimensional cross plots, although the response of each log to lithology can be observed. In regression or classification problems, however, too many parameters increase the dimensionality of the input data and make model interpretation difficult. Besides, when two or more variables are highly correlated, the model faces a greater risk of over-fitting. The logging series should therefore be further screened to reduce the risk of over-fitting and to allow better pattern visualization. The cross-correlation matrix of the seven well-logging parameters is shown in Fig. 6. Among the seven variables, M2R1, M2R6, and M2RX all reflect the resistivity of the formation. In addition, the correlation coefficients among them are over 0.8, which poses a risk of over-fitting.
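The correlation screening described here can be sketched as a simple filter. This is an illustration under stated assumptions: the 0.8 threshold is taken from the text, but the function names `pearson` and `drop_correlated`, the greedy keep-the-first rule, and the toy values are hypothetical and are not the authors' procedure or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.8):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far does not exceed the threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values: three strongly related "resistivity" columns and one
# unrelated column. They are not measurements from the study well.
toy_logs = {
    "GR": [3, 1, 4, 1, 5, 2],
    "M2R1": [1, 2, 3, 4, 5, 6],
    "M2R6": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "M2RX": [2, 4, 6, 8, 10, 12],
}
selected = drop_correlated(toy_logs)
print(selected)  # ['GR', 'M2R1']
```

Because the filter keeps the first member of a correlated group, the order of the columns decides which resistivity curve survives; here M2R1 is listed first among the three.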
Thus, M2R6 and M2RX were removed from the input data, and five logging curves were finally selected: GR, DT, RHOB, TNPH, and M2R1.

(2) Dataset construction and splitting

Combining the manual lithology labels with the five selected well logs, the full dataset was constructed according to depth.

Table 1 The presentation table of the complete data set (the first ten pieces of data).

| GR (gAPI) | DT (µs/ft) | RHOB (g/cm³) | TNPH (%) | M2R1 (ohmm) | Lithology type |
| --- | --- | --- | --- | --- | --- |
| 159.658 | 98.926 | 2.24 | 27.117 | 3.909 | coarse sandstone |
| 161.666 | 98.939 | 2.252 | 26.151 | 3.662 | coarse sandstone |
| 165.914 | 99.317 | 2.254 | 25.367 | 3.355 | coarse sandstone |
| 168.514 | 100.107 | 2.244 | 25.027 | 2.954 | coarse sandstone |
| 169.893 | 100.846 | 2.234 | 25.308 | 2.664 | coarse sandstone |
| 94.263 | 90.021 | 2.279 | 16.223 | 34 | coarse sandstone |
| 97.412 | 91.767 | 2.291 | 16.294 | 18.836 | coarse sandstone |
| 102.83 | 93.271 | 2.31 | 16.637 | 9.211 | coarse sandstone |
| 115.427 | 94.207 | 2.354 | 17.449 | 6.559 | coarse sandstone |
| 135.153 | 94.542 | 2.398 | 18.781 | 4.34 | coarse sandstone |

A total of 761 samples were randomly divided into training and testing data using the “train_test_split” function of scikit-learn in Python. 80% of the full data, 608 samples in total, were used as training data, consisting of x_train and y_train, and the other 20%, 153 samples in total, were set as testing data, consisting of x_test and y_test. The manually labeled lithologies were used as the target variable throughout, and the five well logs were used as inputs to the machine learning models. The characteristics of the inputs, x_train and x_test, are summarized as follows.

Table 2 Statistical summary of 608 input training data.
Count: total number of samples; mean: mean value; std: standard deviation; min: minimum value; 25%: quantile at 25%; 50%: quantile at 50%; 75%: quantile at 75%; max: maximum value.

| Features | GR (gAPI) | DT (µs/ft) | RHOB (g/cm³) | TNPH (%) | M2R1 (ohmm) |
| --- | --- | --- | --- | --- | --- |
| count | 608 | 608 | 608 | 608 | 608 |
| mean | 156.429 | 92.512 | 2.339 | 22.112 | 7.542 |
| std | 35.460 | 6.503 | 0.099 | 4.740 | 10.722 |
| min | 94.246 | 72.415 | 2.090 | 9.113 | 1.264 |
| 25% | 132.874 | 88.493 | 2.256 | 18.533 | 2.047 |
| 50% | 151.283 | 92.596 | 2.355 | 23.240 | 3.786 |
| 75% | 176.351 | 96.370 | 2.418 | 25.105 | 7.683 |
| max | 276.961 | 110.773 | 2.597 | 36.522 | 89.057 |

Table 3 Statistical summary of 153 input testing data. Count: total number of samples; mean: mean value; std: standard deviation; min: minimum value; 25%: quantile at 25%; 50%: quantile at 50%; 75%: quantile at 75%; max: maximum value.

| Features | GR (gAPI) | DT (µs/ft) | RHOB (g/cm³) | TNPH (%) | M2R1 (ohmm) |
| --- | --- | --- | --- | --- | --- |
| count | 153 | 153 | 153 | 153 | 153 |
| mean | 155.383 | 92.134 | 2.361 | 22.159 | 7.584 |
| std | 33.650 | 6.679 | 0.101 | 4.841 | 12.376 |
| min | 94.787 | 77.998 | 2.096 | 10.510 | 1.272 |
| 25% | 136.513 | 87.334 | 2.294 | 19.473 | 1.898 |
| 50% | 149.665 | 92.612 | 2.384 | 23.415 | 3.372 |
| 75% | 172.658 | 95.163 | 2.426 | 25.184 | 7.875 |
| max | 276.589 | 109.789 | 2.568 | 31.531 | 85.452 |

4 Results and discussion

4.1 Models construction and hyperparameters tuning

Using the split data set, three machine learning models based on the boosting method were constructed: AdaBoost, LightGBM, and XGBoost. Three hyperparameters of the boosting models, max_depth, learning_rate, and n_estimators, play important roles in model performance and robustness. To obtain a more effective model, these three hyperparameters were tuned with the grid-search method. Generally, the ideal learning_rate lies between 0.05 and 0.3 for different problems. In this study, the interval from 0.01 to 5 was searched with a step of 0.01. The AdaBoost model performed best when learning_rate was set to 0.4, and the learning_rate of LightGBM was set to 0.7.
The XGBoost model showed the best prediction accuracy with a learning_rate of 0.1. max_depth is also a key hyperparameter of boosting models; it is the maximum depth of each decision tree in the ensemble. AdaBoost does not have a max_depth parameter of its own, but it can be used together with decision trees, which do have one. In the scikit-learn library, when using AdaBoostClassifier, a decision tree can be set as the weak classifier and its max_depth specified to control tree depth. For the AdaBoost and XGBoost models, max_depth was set to 8, and it was set to 10 in the LightGBM model. In boosting models, n_estimators is the number of trees in the “forest”. n_estimators was set to 80, 40, and 100 in AdaBoost, LightGBM, and XGBoost, respectively.

Table 4 Hyperparameters search ranges and optimal values (each cell gives the search range followed by the optimal value)

| Models | learning_rate | max_depth | n_estimators |
| --- | --- | --- | --- |
| AdaBoost | [0.01, 5], 0.4 | [1, 50], 8 | [10, 300], 80 |
| LightGBM | [0.01, 5], 0.7 | [1, 50], 10 | [10, 300], 40 |
| XGBoost | [0.01, 5], 0.1 | [1, 50], 8 | [10, 300], 100 |

4.2 Predicted results of three ensemble models

The confusion matrices show the performance of the different models. Compared with the LightGBM and XGBoost models, the AdaBoost model performed worse, with more incorrect identifications, especially for argillaceous sandstone and conglomerate sandstones. According to the confusion matrices, the LightGBM and XGBoost models performed identically on four lithologies: mudstone, argillaceous sandstone, conglomerate sandstones, and coarse sandstone. The two models differed in the prediction of medium-fine sandstone and pebbly sandstone. LightGBM predicted 173 medium-fine sandstone samples correctly, compared with 174 for the XGBoost model.
In the pebbly sandstone prediction, LightGBM performed better than the XGBoost model, with one more correctly predicted sample. Using the tuned hyperparameters, the models were constructed and the predictions of the three ensemble models on the testing data were compared. The detailed accuracy verification parameters are listed in Tables 5 to 7 for the AdaBoost, LightGBM, and XGBoost models. The models performed differently on different lithology types. Accuracy is the overall proportion of correct predictions and shows model performance intuitively. The XGBoost model performed best, with an accuracy of 0.902, while the accuracies of the AdaBoost and LightGBM models on the testing data were 0.725 and 0.895, respectively. Judged by the other verification parameters, the XGBoost and LightGBM models also consistently outperformed the AdaBoost model. For example, the XGBoost model performed satisfactorily on mudstone prediction, with a recall of 0.978, a precision of 0.936, and an F1-score of 0.957, whereas the three parameters for mudstone prediction in the AdaBoost model were 0.933, 0.894, and 0.913, respectively. The XGBoost and LightGBM models performed similarly, with only subtle differences in the verification parameters. For example, the XGBoost model obtained a recall of 0.846 in medium-fine sandstone prediction, while the LightGBM model obtained a recall of 0.821. To make the comparison more intuitive, the average values of recall, precision, and F1-score over the six lithologies were calculated and compared. In the AdaBoost model, the average recall over all lithologies was 0.582, and it was 0.905 and 0.909 in the LightGBM and XGBoost models, respectively. Combining the three averaged parameters, the XGBoost model outperformed the LightGBM model by a small margin.

Table 5 Prediction results analysis table of AdaBoost model.
| Lithologies, AdaBoost (testing data) | Recall | Precision | F1-score |
| --- | --- | --- | --- |
| medium-fine sandstone | 0.590 | 0.590 | 0.590 |
| pebbly sandstone | 0.786 | 0.595 | 0.677 |
| mudstone | 0.933 | 0.894 | 0.913 |
| argillaceous sandstone | 0.529 | 0.692 | 0.600 |
| conglomerate sandstones | 0.652 | 0.882 | 0.750 |
| coarse sandstone | 0.000 | 0.000 | 0.000 |
| Average | 0.582 | 0.609 | 0.588 |
| Accuracy | 0.725 | | |

Table 6 Prediction results analysis table of LightGBM model.

| Lithologies, LightGBM (testing data) | Recall | Precision | F1-score |
| --- | --- | --- | --- |
| medium-fine sandstone | 0.821 | 0.914 | 0.865 |
| pebbly sandstone | 0.893 | 0.833 | 0.862 |
| mudstone | 0.978 | 0.957 | 0.967 |
| argillaceous sandstone | 0.824 | 0.824 | 0.824 |
| conglomerate sandstones | 0.913 | 0.875 | 0.894 |
| coarse sandstone | 1.000 | 0.000 | 0.000 |
| Average | 0.905 | 0.734 | 0.735 |
| Accuracy | 0.895 | | |

Table 7 Prediction results analysis table of XGBoost model.

| Lithologies, XGBoost (testing data) | Recall | Precision | F1-score |
| --- | --- | --- | --- |
| medium-fine sandstone | 0.846 | 0.917 | 0.880 |
| pebbly sandstone | 0.893 | 0.806 | 0.847 |
| mudstone | 0.978 | 0.936 | 0.957 |
| argillaceous sandstone | 0.824 | 0.933 | 0.875 |
| conglomerate sandstones | 0.913 | 0.913 | 0.913 |
| coarse sandstone | 1.000 | 0.000 | 0.000 |
| Average | 0.909 | 0.751 | 0.745 |
| Accuracy | 0.902 | | |

Finally, the ROC curves were used to compare the models’ performance quantitatively. The AUC value indicates how well each lithology type is predicted. For example, the AUC of medium-fine sandstone is 0.776 in the AdaBoost model, whereas it is 0.974 and 0.972 in the LightGBM and XGBoost models, respectively. This means that LightGBM performed best among the three models for medium-fine sandstone prediction. Combining the AUC values for all lithologies, XGBoost performed best in overall lithology prediction.

4.3 Visualization and interpretation of the best model

(1) Predicted lithology comparison on well profile

Based on the XGBoost model, a continuous lithology column along the well profile was obtained and compared with the logging lithology and the manually labeled lithology. The predicted lithology is clearly more accurate and finer than the logging lithology.
This also indicates that a complete lithology profile can be obtained from well logs through machine learning models.

(2) Hyperparameters of the XGBoost model

Reporting the model parameters is necessary for reproducing the experimental results. Using the “get_params” method of the XGBoost model, the hyperparameters of the best model are listed as follows: {'objective': 'multi:softprob', 'use_label_encoder': True, 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 0.8, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 8, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 1440, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': None, 'subsample': 0.5, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None, 'eval_metric': ['logloss', 'auc', 'error'], 'nthread': -1, 'seed': 1440}

(3) SHAP values of different variables

Machine learning models are black boxes for users. Because the prediction process of these models is difficult to visualize, interpretability remains a challenge when using them. In this manuscript, the SHAP (Shapley Additive Explanations) method was used to interpret the best model. SHAP is an interpretative framework derived from cooperative game theory principles, designed to elucidate prediction outcomes across various machine learning models (Antonini et al., n.d.; Lin P. et al., 2023; Y. Sun et al., n.d., n.d.; Wen et al., n.d.; J. Zhang et al., n.d.). This method quantifies the contribution of individual features through SHAP values, which are distinct numerical measures assigned to each input variable within a given data sample.
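To make the idea of a per-feature SHAP value concrete, the sketch below computes exact Shapley values by brute force for a tiny hand-written model. It is an illustration only and not the tree-based algorithm used in practice for XGBoost models: the function name `shapley_values`, the choice of replacing absent features by a baseline value, and the two-feature toy model are all assumptions introduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for one sample.

    model:    callable taking a list of feature values and returning a number.
    x:        feature values of the sample being explained.
    baseline: reference values used for features that are "absent" from a
              coalition (an assumed convention for this sketch).
    """
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their sample values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(z):
    # Hypothetical model with an interaction term; not a lithology classifier.
    return 2 * z[0] + z[0] * z[1]


sample = [1.0, 2.0]
reference = [0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
print(phi)
print(sum(phi), toy_model(sample) - toy_model(reference))
```

The two printed sums agree, which is the additive property the method relies on: the per-feature contributions add up to the difference between the prediction for the sample and the prediction for the baseline.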
Theoretically grounded in Shapley value calculations from game theory, SHAP establishes an additive explanation model where features are treated as collaborative participants in the prediction process. This approach ensures mathematically consistent attribution of feature importance while maintaining local interpretability for individual predictions and global insights into model behavior. Different well logs played different roles in predicting different lithologies, as shown in Fig. 9. For example, in Fig. 9 (a), GR has the widest range of SHAP values. This indicates that GR was most likely to make the greatest contribution to the predicted results when the model predicted medium-fine sandstone, whereas it showed the smallest contribution in mudstone prediction, as shown in Fig. 9 (c). By interpreting a particular lithology in this way, the contribution of each well log to the result can be quantified and the working process of the model can be understood.

(4) Global model interpretation of XGBoost model

After the local interpretation, a global interpretation is crucial to understanding the XGBoost model. As shown in Fig. 10, the global contributions of the different well logs to lithology prediction were interpreted. For example, the blue bars in Fig. 10 show that RHOB played the most important role in mudstone prediction, followed by M2R1, TNPH, DT, and GR. This result also confirms that GR does not correspond accurately and intuitively to lithology changes in the lithology prediction of conglomerate reservoirs.

5 Conclusion

In this study, rapid and accurate lithology identification of a conglomerate reservoir was achieved with machine learning methods using well logs as inputs. The main conclusions are as follows.
A method for automatically classifying the lithology of a sandstone-conglomerate reservoir is proposed, using five well logs as inputs: GR, DT, RHOB, TNPH, and M2R1.

Three machine learning models, AdaBoost, LightGBM, and XGBoost, were constructed to predict lithology rapidly from well-logging data. XGBoost performed best among the three, with an accuracy of 0.902.

Using the SHAP interpretation tool, the global contribution of the different well logs to lithology identification was quantified. Across all lithologies, TNPH played the most important role in lithology prediction, followed by M2R1, RHOB, DT, and GR. For any specific lithology type, the SHAP values of all variables provide a quantitative ranking of contributions.

Declarations

Author Contribution

Conceptualization, Methodology, Formal Analysis, Investigation, and Original Draft Writing: Jiming Liu; Validation, Resources, and Data Curation: Dongjin Xu.

Conflict of interest: The authors declare no competing interests.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (No. 51504038).

Data Availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

References

Alzubaidi, F., Mostaghimi, P., Swietojanski, P., Clark, S.R., Armstrong, R.T., n.d. Automated lithology classification from drill core images using convolutional neural networks 197, 107933.
Antonini, A.S., Tanzola, J., Asiain, L., Ferracutti, G.R., Castro, S.M., Bjerg, E.A., Ganuza, M.L., n.d. Machine Learning model interpretability using SHAP values: Application to Igneous Rock Classification task 23, 100178.
Ashraf, U., Zhang, H., Anees, A., Mangi, H.N., Ali, M., Zhang, X., Imraz, M., Abbasi, S.S., Abbas, A., Ullah, Z., Ullah, J., Tan, S., 2021. A Core Logging, Machine Learning and Geostatistical Modeling Interactive Approach for Subsurface Imaging of Lenticular Geobodies in a Clastic Depositional System, SE Pakistan 30, 2807-2830.
Brandmeier, M., Chen, Y., 2019. Lithological classification using multi-sensor data and convolutional neural networks XLII-2/W16, 55-59.
Cao, Y., Miao, Q.-G., Liu, J.-C., Gao, L., n.d. Advance and Prospects of AdaBoost Algorithm 39, 745-758.
Chen, T., Guestrin, C., 2016. XGBoost. ACM, pp. 785-794.
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. abs/1603.02754.
Deng, C., Pan, H., Fang, S., Konaté, A.A., Qin, R., 2017. Support vector machine as an alternative method for lithology classification of crystalline rocks 14, 341-349.
Dev, V.A., Eden, M.R., n.d. Evaluating the Boosting Approach to Machine Learning for Formation Lithology Classification. Elsevier.
Dhaliwal, S.S., Nahid, A.-A., Abbas, R., 2018. Effective Intrusion Detection System Using XGBoost 9, 149.
Dong, S.-Q., Sun, Y.-M., Xu, T., Zeng, L.-B., Du, X.-Y., Yang, X., Liang, Y., n.d. How to improve machine learning models for lithofacies identification by practical and novel ensemble strategy and principles 20, 733-752.
Ferreira, A.J., Figueiredo, M.A.T., n.d. Boosting Algorithms: A Review of Methods, Theory, and Applications. Springer New York.
Gu, Y., Zhang, D., Bao, Z., n.d. Lithological classification via an improved extreme gradient boosting: A demonstration of the Chang 4+5 member, Ordos Basin, Northern China 215, 104798.
Lin, N., Fu, J., Jiang, R., Li, G., Yang, Q., 2023. Lithological Classification by Hyperspectral Images Based on a Two-Layer XGBoost Model, Combined with a Greedy Algorithm 15, 3764.
Lin, P., Dong, X., Ji, Y., Xia, J., Zhai, Y., Hou, Q., 2023. Explainable Prediction Model of Logging Lithology Classification Based on XGBoost and SHAP. IEEE, pp. 307-312.
Liu, J.-J., Liu, J.-C., 2022. Integrating deep learning and logging data analytics for lithofacies classification and 3D modeling of tight sandstone reservoirs 13, 101311.
Liu, Y., Zhu, R., Zhai, S., Li, N., Li, C., n.d. Lithofacies identification of shale formation based on mineral content regression using LightGBM algorithm: A case study in the Luzhou block, South Sichuan Basin, China 11, 4256-4272.
Ogunleye, A., Wang, Q.-G., 2020. XGBoost Model for Chronic Kidney Disease Diagnosis 17, 2131-2140.
Qiu, Y., Zhou, J., Khandelwal, M., Yang, H., Yang, P., Li, C., n.d. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration 38, 4145-4162.
Ruiyi, H., Zhuwen, W., Wenhua, W., Fanghui, X., Xinghua, Q., Yitong, C., n.d. Lithology identification of igneous rocks based on XGboost and conventional logging curves, a case study of the eastern depression of Liaohe Basin 195, 104480.
Saporetti, C.M., da Fonseca, L.G., Pereira, E., n.d. A Lithology Identification Approach Based on Machine Learning With Evolutionary Parameter Tuning 16, 1819-1823.
Schapire, R.E., n.d. The Boosting Approach to Machine Learning: An Overview. Springer New York.
Sun, Y., Pang, S., Li, H., Qiao, S., Zhang, Y., n.d. Enhanced Lithology Classification Using an Interpretable SHAP Model Integrating Semi-Supervised Contrastive Learning and Transformer with Well Logging Data 34, 785-813.
Sun, Y., Pang, S., Zhang, Y., n.d. Application of Adaboost-Transformer Algorithm for Lithology Identification Based on Well Logging Data 21, 1-5.
Sun, Y., Pang, S., Zhao, Z., Zhang, Y., n.d. Interpretable SHAP Model Combining Meta-learning and Vision Transformer for Lithology Classification Using Limited and Unbalanced Drilling Data in Well Logging 33, 2545-2565.
Wang, D., Zhang, Y., Zhao, Y., 2017. LightGBM. ACM, pp. 7-11.
Wen, H., Liu, B., Di, M., Li, J., Zhou, X., n.d. A SHAP-enhanced XGBoost model for interpretable prediction of coseismic landslides 74, 3826-3854.
Xu, Z., Shi, H., Lin, P., Liu, T., n.d. Integrated lithology identification based on images and elemental data from rocks 205, 108853.
Zhang, J., Ma, X., Zhang, Jialan, Sun, D., Zhou, X., Mi, C., Wen, H., n.d. Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model 332, 117357.
Zhang, L., Zhan, C., 2017. Machine Learning in Rock Facies Classification: An Application of XGBoost. Society of Exploration Geophysicists and Chinese Petroleum Society.
Zhang, X., Sun, Q., He, K., Wang, Z., Wang, J., n.d. Lithology identification of logging data based on improved neighborhood rough set and AdaBoost 15, 1201-1213.
Zhao, B., Liao, W., 2025. Lithology Identification of Buried Hill Reservoir Based on XGBoost with Optimized Interpretation 13, 682.
Zhao, C., Jiang, Y., Wang, L., 2022. Data-driven diagenetic facies classification and well-logging identification based on machine learning methods: A case study on Xujiahe tight sandstone in Sichuan Basin 217, 110798.
Zhao, X., Chen, X., Huang, Q., Lan, Z., Wang, X., Yao, G., 2022. Logging-data-driven permeability prediction in low-permeable sandstones based on machine learning with pattern visualization: A case study in Wenchang A Sag, Pearl River Mouth Basin 214, 110517.
Zhong, R., Johnson, R., Chen, Z., 2020. Generating pseudo density log from drilling and logging-while-drilling data using extreme gradient boosting (XGBoost) 220, 103416.
Zou, Y., Chen, Y., Deng, H., n.d. Gradient Boosting Decision Tree for Lithology Identification with Well Logs: A Case Study of Zhaoxian Gold Deposit, Shandong Peninsula, China 30, 3197-3217.

Additional Declarations

No competing interests reported.
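
As a supplementary illustration of the additive Shapley attribution that underlies the SHAP interpretation in Section 4, the following is a minimal sketch and not the procedure used in this study. It computes exact Shapley values by enumerating feature coalitions for a small scoring function. The function `toy_score`, its coefficients, the sample values, and the all-zero baseline are hypothetical and stand in for a trained classifier's output for one lithology class; only the five log names are taken from the paper.

```python
from itertools import combinations
from math import factorial

# Log names from the paper; everything numeric below is made up for illustration.
FEATURES = ["GR", "DT", "RHOB", "TNPH", "M2R1"]


def toy_score(x):
    """Hypothetical stand-in for a model score of one lithology class.

    This is NOT the trained XGBoost model; the coefficients are assumptions.
    """
    return 0.5 * x["TNPH"] - 0.3 * x["RHOB"] + 0.2 * x["M2R1"] * x["DT"] + 0.1 * x["GR"]


def shapley_values(model, sample, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (sample[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


sample = {"GR": 1.0, "DT": 2.0, "RHOB": 1.5, "TNPH": 3.0, "M2R1": 0.5}
baseline = {f: 0.0 for f in FEATURES}
phi = shapley_values(toy_score, sample, baseline)

# Additivity: the attributions sum to score(sample) - score(baseline).
gap = toy_score(sample) - toy_score(baseline)

# Rank logs by the magnitude of their contribution to this single prediction.
ranking = sorted(FEATURES, key=lambda f: abs(phi[f]), reverse=True)
```

In this toy case the product term involving M2R1 and DT is shared equally between the two logs, which is how a Shapley attribution handles interactions. Exhaustive enumeration grows exponentially with the number of features, so it is practical only for a handful of inputs such as the five logs used here; tree-specific SHAP algorithms avoid that cost.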
Version 1 posted
Editorial decision: Revision requested 09 Oct, 2025
Reviews received at journal 19 Sep, 2025
Reviews received at journal 18 Sep, 2025
Reviewers agreed at journal 15 Sep, 2025
Reviewers agreed at journal 06 Sep, 2025
Reviewers agreed at journal 05 Sep, 2025
Reviewers agreed at journal 05 Sep, 2025
Reviewers invited by journal 05 Sep, 2025
Editor assigned by journal 01 Sep, 2025
Editor invited by journal 01 Sep, 2025
Submission checks completed at journal 29 Aug, 2025
First submitted to journal 29 Aug, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7429684","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":511210041,"identity":"e78ca8fc-3195-4163-b345-53c3ada4eb89","order_by":0,"name":"Jiming Liu","email":"","orcid":"","institution":"Yangtze University","correspondingAuthor":false,"prefix":"","firstName":"Jiming","middleName":"","lastName":"Liu","suffix":""},{"id":511210042,"identity":"fe2fd23a-a1a4-4060-a189-22b1e1173b73","order_by":1,"name":"Dongjin Xu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA0UlEQVRIiWNgGAWjYDCCA0AsYQAk2JsPHPjwgyQtPMcSD87sIVYLGEjkGB/mYCNCB9/x3sMvLAruJG64kfPhMAMPgzy/2AH8WiTPnEuzkDB4Zmxw5u2GwwUWDIYzZyfg12JwI8fMQMLgsJzB8dwNh2fwMCQY3CZSC4/BgZwHh3nYiNNi/ABsy4kcBuK0SJ45YwYM5MPGkmeOGQADWYKwX/iO9xh/lvhzOLHvePPjDx9+2MjzSxPQAgRs0hIIjgRudUiA+eMHotSNglEwCkbBiAUAVKBMgbbZRRAAAAAASUVORK5CYII=","orcid":"","institution":"Yangtze University","correspondingAuthor":true,"prefix":"","firstName":"Dongjin","middleName":"","lastName":"Xu","suffix":""}],"badges":[],"createdAt":"2025-08-22 00:53:18","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7429684/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7429684/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-27640-3","type":"published","date":"2025-11-23T15:56:59+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":91149659,"identity":"36b3aa4d-4941-4a13-840a-a8cf5d864efc","added_by":"auto","created_at":"2025-09-12 
06:51:54","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":832544,"visible":true,"origin":"","legend":"\u003cp\u003eThe classical confusion matrix for binary classification issues. TP: True Positives; TN: True Negatives; FP: False Positives; FN: False Negatives.\u003c/p\u003e","description":"","filename":"image1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/dfe8cf0924f3fb622c7a9675.jpeg"},{"id":91149674,"identity":"919982fe-8cc9-4ee3-9b3f-a1463b4221ee","added_by":"auto","created_at":"2025-09-12 06:51:56","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":428975,"visible":true,"origin":"","legend":"\u003cp\u003ethe classical schematic diagram of the ROC curve. Red curve is the characteristic of model 1\u003csup\u003est\u003c/sup\u003e; green curve indicates the model 2\u003csup\u003end\u003c/sup\u003e; and the bule one reflects the performance of model 3\u003csup\u003erd\u003c/sup\u003e. 
AUC refers to the area under the curve of model, filled by shadow.\u003c/p\u003e","description":"","filename":"image2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/39d1c8a5e1105c6f6f18e510.jpeg"},{"id":91150339,"identity":"eb30465c-2e2b-404d-96e8-d32cff42b27b","added_by":"auto","created_at":"2025-09-12 06:59:57","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":169764,"visible":true,"origin":"","legend":"\u003cp\u003eWorkflow of logging-data-driven lithofacies identification by the assistance of ensemble machine learning models\u003c/p\u003e","description":"","filename":"image3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/33eceeb09e3ac450f3c93125.jpeg"},{"id":91149693,"identity":"d3d08eee-6ca7-4caa-918f-179fc1cdeb3f","added_by":"auto","created_at":"2025-09-12 06:51:58","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":269220,"visible":true,"origin":"","legend":"\u003cp\u003eReal photographs and drawings of manually identified lithologies. 
(a) medium-fine sandstone with several muddy bandings; (b) pebbly sandstone with small size (less than 8mm) of argillaceous and metamorphic gravels; (c) gray mudstone; (d) argillaceous sandstone with several sandy bandings; (e) conglomerate sandstones with the gravel size over 3.5cm; coarse sandstone with few muddy bandings.\u003c/p\u003e","description":"","filename":"image4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/567a63bdd32512d223b0d243.jpeg"},{"id":91149655,"identity":"37c6a52b-4875-400b-b15c-99f58484855f","added_by":"auto","created_at":"2025-09-12 06:51:53","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":773895,"visible":true,"origin":"","legend":"\u003cp\u003eCore depth correction based on GR curves obtained in the well profile and laboratory\u003c/p\u003e","description":"","filename":"image5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/b5d50e4f38c5ceed7b138375.jpeg"},{"id":91149660,"identity":"71a2d3e2-e56b-4982-b47b-42a8dad61444","added_by":"auto","created_at":"2025-09-12 06:51:54","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":878089,"visible":true,"origin":"","legend":"\u003cp\u003eCross plot of seven logging curves under different lithologies\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/62d9ce0019d707600f65beea.png"},{"id":91149654,"identity":"f1ad314c-ebca-4ae2-9c41-d32ac7ab36a2","added_by":"auto","created_at":"2025-09-12 06:51:52","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":145917,"visible":true,"origin":"","legend":"\u003cp\u003eThe correlation coefficient heat map of seven logging curves, including GR, DT, RHOB, TNPH, M2R1, M2R6, and 
M2R\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/c34c5585ceaf07c1dd51a99f.png"},{"id":91149690,"identity":"36462804-d37e-49c6-bfe7-2e9c8286da1d","added_by":"auto","created_at":"2025-09-12 06:51:57","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":158665,"visible":true,"origin":"","legend":"\u003cp\u003eFigure 7. Confusion matrixes showing the performance of different models. (a) Adaboost model; (b) LightGBM model; (c) XGBoost model. litho-1: medium-fine sandstone; litho-2: pebbly sandstone, litho-3: mudstone, litho-4: argillaceous sandstone, litho-5: conglomerate sandstones, litho-6: and coarse sandstone.\u003c/p\u003e","description":"","filename":"image8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/cc190a00b5362fe6692ad912.jpeg"},{"id":91149672,"identity":"a99bf544-5c10-4b4b-bce3-185d48b4bbe2","added_by":"auto","created_at":"2025-09-12 06:51:56","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":136301,"visible":true,"origin":"","legend":"\u003cp\u003eFigure 7. ROC curves showing the performance of different models. (a) Adaboost model; (b) LightGBM model; (c) XGBoost model. litho-1: medium-fine sandstone; litho-2: pebbly sandstone, litho-3: mudstone, litho-4: argillaceous sandstone, litho-5: conglomerate sandstones, litho-6: and coarse sandstone.\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/bd4944ec8a4c859310ab0d0c.png"},{"id":91150334,"identity":"61396932-1b71-4d0e-83db-68e85c12c258","added_by":"auto","created_at":"2025-09-12 06:59:53","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":504453,"visible":true,"origin":"","legend":"\u003cp\u003eFigure 8. 
Comprehensive diagram showing the continuous lithology predictions in the well profile based on the trained XGBoost model\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/d6aaa66ca868f4a66292be92.png"},{"id":91149656,"identity":"34b67fb3-36d9-434c-a35e-1e4d9a2972e9","added_by":"auto","created_at":"2025-09-12 06:51:53","extension":"jpeg","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":815431,"visible":true,"origin":"","legend":"\u003cp\u003eFigure 9. Comprehensive analysis diagram of SHAP values for all variables in different lithologies perdition. SHAP values of all variables in (a) medium-fine sandstone perdition; (b) pebbly sandstone perdition; (c) mudstone perdition; (d) argillaceous sandstone perdition; (e) conglomerate sandstones perdition; coarse sandstone perdition. Blue dots indicate a small variable value and red data points indicate a large variable value.\u003c/p\u003e","description":"","filename":"image11.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/4a86d11bb6e455a91ebb8c07.jpeg"},{"id":91149684,"identity":"24ab2bd4-89f6-4d9d-840f-db15c1d4b3d9","added_by":"auto","created_at":"2025-09-12 06:51:57","extension":"jpeg","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":315592,"visible":true,"origin":"","legend":"\u003cp\u003eFigure 10. Global interpretation to the XGBoost model. 
The bars in different colors show the global contribution of a particular variable to a given lithology prediction.\u003c/p\u003e","description":"","filename":"image12.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/9b5c8f18382c3dd6daad0bf5.jpeg"},{"id":96650070,"identity":"857c96b8-8ef7-4fde-9218-c0b894f93528","added_by":"auto","created_at":"2025-11-24 16:06:19","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5473951,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7429684/v1/134b52c0-7392-41bb-835e-4536dc2f30e1.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Logging-data-driven lithology identification of conglomerate reservoir by the assistance of integrated machine learning methods","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eThe accurate lithology identification is a basic and significant issue in reservoir research of petroleum industry(Liu et al., n.d.; Ruiyi et al., n.d.; Saporetti et al., n.d.; X. Zhang et al., n.d.). Three types of data could be used to identity the lithology in different reservoirs, including rock cuttings produced during drilling, cores obtained from the stratum, and the logging data. Cuttings logging is considered as a conventional method to obtain lithologies. However, the lithology interval in the well profile is about 1 meter and the depth information is commonly inexact. And this method has a certain lag in the drilling process, so it is difficult to accurately reflect the fine change of lithology. Cores can provide accurate lithology identification based on rock core or thin sections observation. However, this method needs amount of cost during the core collection and thin sections preparation. 
Well logs can provide a lot of information about formation rocks, so the logging curves are commonly used to make lithology identification(Gu et al., n.d.; Y. Sun et al., n.d.). Conventional cutoff values of logging data can reflect the change of lithology in the vertical profile of a single well. Thus, the logging-data-based lithology identification is usually based on typical log curves that can reflect lithology changes, such as gamma (GR) curve, spontaneous potential curve, well diameter curve, etc., and then the change of shale content in sandstone reservoirs is judged manually or automatically to identify the lithology. Using the cutoff value, spider grams or cross plots, the lithology could be distinguished in conventional reservoirs which has low-GR sandstones characteristic. However, the accurate identification based on logging curve shows difficulties in fan-delta conglomerate reservoir. It is not feasible to use conventional cross-plot or cutoff value to determine the lithology in fan-delta sandstone reservoirs, because if the gravel is acidic igneous rock in the sandstone, its radioactive element content is high, which will lead to high value of the gamma logging curve, and it is easy to be interpreted as mudstone. In addition, the argillaceous component is a near-source deposit, and if its transport distance is close to the reservoir, then the surface of the argillaceous component fails to absorb a large number of radioactive materials, and the natural gamma ray is abnormally low, which shows the characteristic of conventional sandstones. And in some typical reservoirs, gravel and argillaceous components are mixed near the source origin, resulting in higher gamma ray value than sandstone.\u003c/p\u003e\u003cp\u003eBased on well logs, machine learning methods have been widely used for lithology prediction of different types of reservoirs in recent years(Ashraf et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Lin P. 
et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Liu and Liu, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ruiyi et al., n.d.; Y. Sun et al., n.d.; X. Zhang et al., n.d.; C. Zhao et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; X. Zhao et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Based on GR value and its seven derivatives, seven supervised machine learning algorithms are used for lithofacies prediction, and the stochastic deep forest method performs best, with a MAE of 0.25. Using seven logging curves, including GR and AC, as inputs, a single integrated model and composition of integration models were established to predict eight lithologies and compared in Daniudi gas field of Ordes Basin. Using 76500 images from tray image as input, ResNeXt-50 model performed best in lithology identification with an accuracy value of 93% (Dong et al., n.d.). In general, unsupervised and semi-supervised algorithms, supervised algorithms, and deep learning algorithms are used for lithology prediction (Alzubaidi et al., n.d.). However, few lithology identifications in conglomerate reservoirs have been reported, especially fan delta reservoirs which show frequent vertical lithology changes and uneven gravel distribution.\u003c/p\u003e\u003cp\u003eUsing raw data from the fan delta conglomerate reservoir, including over 70 m of core data and well logging data, three machine learning models were developed to identify the formation lithology in this study. 
Accurate lithology identification will help the subsequent fine reservoir description and efficient development.\u003c/p\u003e"},{"header":"2 Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Models using boosting algorithms based on regression tree\u003c/h2\u003e\u003cp\u003eBoosting is an ensemble learning method that builds strong learners by iteratively adding weak learners (such as decision trees). This method optimizes model performance by fitting residuals continuously, and has high prediction accuracy and robustness. However, the traditional gradient lifting algorithm has the problem of slow training speed and large memory consumption when dealing with large-scale data sets. Boosting can be randomly surmised slightly higher prediction accuracy than weak learning enhancement study for high prediction precision of apparatus, which provides effective new ideas and new methods for machine learning methods (Dev and Eden, n.d.; Ferreira and Figueiredo, n.d.; Schapire, n.d.; Zou et al., n.d.). Three types boosting algorithms were adopted to identify lithology in this study, including Adaptive Boosting, Light Gradient Boosting Machine, and Extreme Gradient Boosting.\u003c/p\u003e\u003cp\u003e(1) Adaptive Boosting (Adaboost)\u003c/p\u003e\u003cp\u003eAdaptive Boosting (AdaBoost) regression is a machine learning technique that employs the boosting principle to iteratively combine multiple weak predictors into a robust ensemble model. During training, the algorithm dynamically adjusts the weights assigned to base learners, prioritizing those with smaller prediction errors while diminishing the contribution of high-error models. Through successive iterations, this adaptive weighting mechanism optimizes the ensemble's performance, ultimately converging to a highly accurate predictor(CAO et al., n.d.; Sun Y. et al., n.d.; X. 
Zhang et al., n.d.).\u003c/p\u003e\u003cp\u003e(2) Light Gradient Boosting Machine (LightGBM)\u003c/p\u003e\u003cp\u003eThe decision trees construction process of LightGBM is similar to the traditional decision tree algorithm, but there are some special features. It uses gradient-based historical information to build a decision tree and selects the optimal split point by calculating the information gain (or similar metric) of each feature. During the construction process, LightGBM also considers factors such as the sparsity of features and unbalance of data to further improve the performance of the model(Liu et al., n.d.; Wang et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e(3) Extreme Gradient Boosting (XGBoost)\u003c/p\u003e\u003cp\u003eXGBoost (eXtreme Gradient Boosting) is a scalable, distributed gradient-boosting framework built on classification and regression tree (CART) principles, extending traditional decision tree methodologies (Chen T. and Guestrin, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; T. Chen and Guestrin, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; N. Lin et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Lin P. et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Ruiyi et al., n.d.; Wen et al., n.d.; J. Zhang et al., n.d.; Zhao and Liao, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Zhong et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). By integrating base learners such as \u0026ldquo;GBTree\u0026rdquo; (CART-based regression trees) and \u0026ldquo;GBLinear\u0026rdquo; (linear regressors), it constructs a strong ensemble learner with high predictive accuracy and computational efficiency. 
Unlike conventional Gradient Boosting Decision Trees (GBDT), XGBoost enhances model robustness through \u0026ldquo;regularization terms\u0026rdquo; in its loss function, effectively mitigating overfitting and controlling complexity. A key innovation of XGBoost lies in its optimization strategy: it approximates the objective function using a \u0026ldquo;second-order Taylor expansion\u0026rdquo;, incorporating both first and second derivatives to accelerate convergence and improve precision. This approach enables flexibility in defining custom loss functions, provided they are \u0026ldquo;twice continuously differentiable\u0026rdquo;. While the framework supports user-defined objectives, it most frequently employs standard loss functions such as mean squared error (MSE) for regression and logistic loss for classification tasks(Dhaliwal et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Ogunleye and Wang, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Qiu et al., n.d.; Zhang and Zhan, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Verification methods for classification issues\u003c/h2\u003e\u003cp\u003e(1) Confusion matrix and accuracy verification parameters\u003c/p\u003e\u003cp\u003eConfusion matrix is a commonly used visualization tool to evaluate the performance of the classification model, particularly in multi-class issues. It provides a detailed breakdown of predictions by comparing the model's outputs against the true labels. And by organizing errors and correct predictions per class, the confusion matrix offers a granular view of model behavior, enabling targeted improvements beyond aggregate metrics like overall accuracy. 
From the confusion matrix, parameters of model evaluation could be obtained, including precision, recall, F1-score, and accuracy(Brandmeier and Chen, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Deng et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Xu et al., n.d.).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003ePrecision quantifies the proportion of correctly predicted positive instances among all instances predicted as positive. It emphasizes the reliability of positive predictions. And high precision indicates minimal false positives, making it critical in scenarios where FP costs are high.\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:Precision=\\frac{TP}{TP\\:+\\:FP}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eRecall measures the proportion of correctly identified positive instances relative to all actual positive instances. It evaluates the model's ability to detect relevant cases:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:Recall\\:=\\frac{TP}{TP\\:+\\:FN}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eAnd high recall is essential when minimizing false negatives is paramount.\u003c/p\u003e\u003cp\u003eThe F1-score harmonizes precision and recall via their harmonic mean. It is particularly useful for imbalanced datasets where optimizing one metric alone may degrade the other. And this metric provides a balanced assessment of classifier performance. 
The method of F1-score calculation is as follows:\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:\\text{F}1-\\text{s}\\text{c}\\text{o}\\text{r}\\text{e}\\:=2\\times\\:\\frac{Precision\\times\\:Recall\\:}{Precision+\\:Recall\\:}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eAnd accuracy represents the overall proportion of correct predictions (both positive and negative):\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\:\\text{A}\\text{c}\\text{c}\\text{u}\\text{r}\\text{a}\\text{c}\\text{y}\\:=\\frac{TP+TN\\:}{TP+TN+FP+FN}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e(2) Receiver Operating Characteristic (ROC) curve\u003c/p\u003e\u003cp\u003eThe Receiver Operating Characteristic (ROC) curve, a widely used evaluation tool in binary classification, can be extended to multiclass problems through adaptation strategies. It visualizes the trade-off between the \u0026ldquo;True Positive Rate (TPR) and False Positive Rate (FPR) across classification thresholds. The ROC curve uses the size of the area under the curve (AUC) to evaluate the model, which was shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. The value of AUC ranges from 0.5 to 1. The larger the area under the curve, the closer it is to 1, the better the diagnosis or prediction effect of the model: when the AUC ranges from 0.5 to 0.7, the accuracy is lower. When 0.7\u0026thinsp;~\u0026thinsp;0.9, there is a certain accuracy; When the AUC is above 0.9, the accuracy is higher. 
AUC\u0026thinsp;=\u0026thinsp;0.5 indicates that the diagnostic method is completely ineffective and has no diagnostic value.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"3 Workflow and data processing","content":"\u003cp\u003eThe complete process of lithology identification in conglomerate reservoir based on logging data was established in this study, and the specific steps were show in the following flow chart, including lithology identification of cores, depth matching of the cores, and logging data pre-processing.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Lithology identification of cores\u003c/h2\u003e\u003cp\u003eThe study area is located in the South China Sea, and the reservoir belongs to fan delta front deposits. A total of 71.4 m cores from the formation were observed in detail and six types of lithologies were classified manually, including medium-fine sandstone (litho-1), pebbly sandstone (litho-2), mudstone (litho-3), argillaceous sandstone (litho-4), conglomerate sandstones (litho-5), and coarse sandstone (litho-6).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e3.2 The depth matching of the cores to well logs\u003c/h2\u003e\u003cp\u003eThe core observation is done in the laboratory, and the logging curves were derived from the well profile, and the error in depth takes difficulties in the data set construction. Thus, it is significant to make the depth of manual lithology labels and logging curves unified. The depth of collected cores commonly does not match the depth of logging curves because of the measurement error. The measurement error could be obtained through the comparison of GR data obtained using a handheld gamma detector on the ground and the GR in well profile. And the depth of cores should be added 0.96 m to match logging curves. 
Finally, the lithology labels were matched to the well logging curves.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Logging data selection and pre-processing\u003c/h2\u003e\u003cp\u003e(1) Logging data selection\u003c/p\u003e\u003cp\u003eFrom a geological point of view, the GR curve responds to argillaceous content, and curves reflecting the porosity and flow properties of rocks can also reveal differences among lithologies. Seven candidate logs were therefore considered: GR (natural gamma ray, gAPI), DT (DTCO, Delta-T Compressional, \u0026micro;s/ft), RHOB (density log, g/cm\u003csup\u003e3\u003c/sup\u003e), TNPH (thermal neutron porosity log, %), M2R1 (shallow high-resolution array induction resistivity, ohmm), M2R6 (medium high-resolution array induction resistivity, ohmm), and M2RX (deep high-resolution array induction resistivity, ohmm). The cross plots of the seven logging curves for the different lithologies are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, which illustrates how well pairs of logging curves separate the lithologies. The six lithologies cannot be distinguished accurately from two-dimensional cross plots alone, but the response of each log to lithology can be observed.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eHowever, in regression or classification problems in machine learning, too many input parameters increase the dimensionality of the input data and make the model harder to interpret. In addition, when two or more variables are highly correlated, the model faces a greater risk of over-fitting. Thus, the logging curves should be screened further to reduce the risk of over-fitting and to allow clearer pattern visualization. 
The cross-correlation matrix of the seven well-logging parameters is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. Among the seven variables, M2R1, M2R6, and M2RX all reflect the resistivity of the formation, and the correlation coefficients among them exceed 0.8, which raises the risk of over-fitting. Thus, M2R6 and M2RX were removed from the input data, and five logging curves were finally selected: GR, DT, RHOB, TNPH, and M2R1.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e(2) Dataset construction and splitting\u003c/p\u003e\u003cp\u003eCombining the manual lithology labels with the five selected well logs, the full dataset was constructed by depth.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSample of the complete dataset (the first ten records).\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGR\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eDT\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRHOB\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003eTNPH\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eM2R1\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003elithology types\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003e(gAPI)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e(\u0026micro;s/ft)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e(g/cm\u0026sup3;)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e(%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003e(ohmm)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e159.658\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e98.926\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.24\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e27.117\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e3.909\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e161.666\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e98.939\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.252\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e26.151\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e3.662\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e165.914\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e99.317\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.254\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e25.367\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e3.355\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e168.514\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e100.107\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.244\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e25.027\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e2.954\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e169.893\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e100.846\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.234\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e25.308\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c6\"\u003e\u003cp\u003e2.664\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e94.263\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e90.021\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.279\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e16.223\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e34\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e97.412\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e91.767\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.291\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e16.294\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e18.836\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e102.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e93.271\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.31\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e16.637\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e9.211\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse 
sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e115.427\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e94.207\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.354\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e\u003cp\u003e17.449\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e6.559\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e135.153\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e94.542\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.398\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e18.781\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e\u003cp\u003e4.34\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eA total of 761 samples were randomly divided into training and testing data using the \u0026ldquo;train_test_split\u0026rdquo; function of the scikit-learn library in Python. Of the full dataset, 80% (608 samples) were used as training data, consisting of x_train and y_train, and the remaining 20% (153 samples) were used as testing data, consisting of x_test and y_test. The manually labeled lithologies served as the target variable throughout, and the five well logs served as the inputs to the machine learning models. 
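\u003c/p\u003e\u003cp\u003eThe arithmetic of this split can be checked with a small sketch that uses only the Python standard library. It is not a substitute for train_test_split and not the code used in this study: the helper split_indices and the seed are hypothetical, and rounding the test share up is an assumption that happens to reproduce the 608 and 153 sample counts given in the text.\u003c/p\u003e

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Randomly split row indices into training and testing parts."""
    n_test = math.ceil(n_samples * test_fraction)  # test share rounded up (assumption)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(761)
print(len(train_idx), len(test_idx))
```

\u003cp\u003eThe two index lists are disjoint and together cover all 761 rows, so every sample is used exactly once, either for training or for testing.\u003c/p\u003e\u003cp\u003e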
The statistical characteristics of the inputs, x_train and x_test, are summarized in Tables 2 and 3.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eStatistical summary of 608 input training data. Count: total number of samples; mean: mean value; std: standard deviation; min: minimum value; 25%: quantile at 25%; 50%: quantile at 50%; 75%: quantile at 75%; max: maximum value\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eFeatures\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGR\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDT\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eRHOB\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eTNPH\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eM2R1\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e(gAPI)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e(\u0026micro;s/ft)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e(g/cm\u0026sup3;)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e(%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003e(ohmm)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ecount\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e608\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emean\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e156.429\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e92.512\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.339\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e22.112\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e7.542\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003estd\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e35.460\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e6.503\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.099\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e4.740\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e10.722\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emin\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e94.246\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e72.415\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.090\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e9.113\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e1.264\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e25%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e132.874\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e88.493\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.256\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e18.533\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e2.047\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e50%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e151.283\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e92.596\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.355\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e23.240\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e3.786\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003e75%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e176.351\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e96.370\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.418\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e25.105\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e7.683\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emax\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e276.961\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e110.773\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.597\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e36.522\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e89.057\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eStatistical summary of 153 input testing data. 
Count: total number of samples; mean: mean value; std: standard deviation; min: minimum value; 25%: quantile at 25%; 50%: quantile at 50%; 75%: quantile at 75%; max: maximum value\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eFeatures\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGR\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDT\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eRHOB\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eTNPH\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eM2R1\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e(gAPI)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e(\u0026micro;s/ft)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e(g/cm\u0026sup3;)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003e(%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003e(ohmm)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003ecount\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e153\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e153\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e153\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e153\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e153\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emean\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e155.383\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e92.134\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.361\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e22.159\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e7.584\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003estd\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e33.650\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e6.679\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.101\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e4.841\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e12.376\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emin\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e94.787\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e77.998\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.096\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e10.510\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e1.272\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e25%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e136.513\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e87.334\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.294\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e19.473\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e1.898\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e50%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e149.665\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e92.612\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.384\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e23.415\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e3.372\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e75%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e172.658\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e95.163\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.426\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e25.184\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c6\"\u003e\u003cp\u003e7.875\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emax\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e276.589\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e109.789\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.568\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e31.531\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e85.452\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"4 Results and discussion","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Model construction and hyperparameter tuning\u003c/h2\u003e\u003cp\u003eUsing the split dataset, three boosting-based machine learning models were constructed: Adaboost, LightGBM, and XGBoost.\u003c/p\u003e\u003cp\u003eThree hyperparameters of the boosting models, max_depth, learning_rate, and n_estimators, play important roles in model performance and robustness. To obtain a more effective model, these three hyperparameters were tuned with the grid-search method.\u003c/p\u003e\u003cp\u003eGenerally, the ideal learning_rate lies between 0.05 and 0.3, depending on the problem. In this study, the interval from 0.01 to 5 was searched with a step of 0.01. The Adaboost model performed best with a learning_rate of 0.4, the LightGBM model with a learning_rate of 0.7, and the XGBoost model with a learning_rate of 0.1.\u003c/p\u003e\u003cp\u003eThe max_depth is also a key hyperparameter of boosting models; it sets the maximum depth of each decision tree in the ensemble. 
The Adaboost model has no max_depth parameter of its own, but it can be combined with decision trees, which do have a max_depth parameter. In the scikit-learn library, when AdaBoostClassifier is used, a decision tree can be set as the weak classifier and its max_depth specified, which provides the equivalent depth control. For the Adaboost and XGBoost models, max_depth was set to 8, and for the LightGBM model it was set to 10.\u003c/p\u003e\u003cp\u003eIn boosting models, n_estimators is the number of trees in the \u0026ldquo;forest\u0026rdquo;. It was set to 80, 40, and 100 for Adaboost, LightGBM, and XGBoost, respectively.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eHyperparameter search ranges and optimal values\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003eHyperparameters\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLearning_rate\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003eMax_depth\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eN_estimators\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaboost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e[0.01,5], 0.4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e[1,50], 8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e[10,300], 80\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e[0.01,5], 0.7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e[1,50], 10\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e[10,300], 40\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e[0.01,5], 0.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e[1,50], 8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e[10,300], 100\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Predicted results of three ensemble models\u003c/h2\u003e\u003cp\u003eThe confusion matrices show the performance of the different models. Compared with the LightGBM and XGBoost models, the Adaboost model performed worse, with more incorrect identifications, especially for argillaceous sandstone and conglomerate sandstones. 
According to the confusion matrices, the LightGBM and XGBoost models performed identically on four lithologies: mudstone, argillaceous sandstone, conglomerate sandstones, and coarse sandstone. The two models differed only in the prediction of medium-fine sandstone and pebbly sandstone. LightGBM correctly predicted 173 medium-fine sandstone samples, whereas XGBoost correctly predicted 174. For pebbly sandstone, LightGBM outperformed XGBoost by one correctly predicted sample.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eUsing the tuned hyperparameters, the models were constructed, and the predictions of the three ensemble models on the testing data were compared. The detailed evaluation metrics of the Adaboost, LightGBM, and XGBoost models are listed in Tables \u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e\u0026thinsp;~\u0026thinsp;7.\u003c/p\u003e\u003cp\u003eThe models performed differently on different lithologies. Accuracy, the overall proportion of correct predictions, reflects model performance intuitively. The XGBoost model performed best, with an accuracy of 0.902, while the Adaboost and LightGBM models reached accuracies of 0.725 and 0.895 on the testing data, respectively.\u003c/p\u003e\u003cp\u003eJudged by the other evaluation metrics, the XGBoost and LightGBM models also consistently outperformed the Adaboost model. For example, the XGBoost model performed satisfactorily on mudstone, with a recall of 0.978, a precision of 0.936, and an F1-score of 0.957, whereas the corresponding values for the Adaboost model were 0.933, 0.894, and 0.913. The XGBoost and LightGBM models performed similarly across lithologies, with only subtle differences in the evaluation metrics. 
For example, the XGBoost model obtained a recall of 0.846 for medium-fine sandstone, whereas the LightGBM model obtained 0.821. To make the comparison more intuitive, the average recall, precision, and F1-score over the six lithologies were calculated and compared. For the Adaboost model, the average recall over all lithologies was 0.582, compared with 0.905 and 0.909 for the LightGBM and XGBoost models, respectively. Taking the three averaged parameters together, the XGBoost model outperformed the LightGBM model by a small margin.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePrediction results analysis table of Adaboost model.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eLithologies\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003eAdaboost (testing data)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003erecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eprecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emedium-fine sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.590\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.590\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.590\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003epebbly sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.786\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.595\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.677\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emudstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.933\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.894\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eargillaceous sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.529\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.692\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.600\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003econglomerate sandstones\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.652\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e0.882\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.750\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.582\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.609\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.588\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e0.725\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePrediction results analysis table of LightGBM model.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eLithologies\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003eLightGBM (testing data)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003erecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eprecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emedium-fine sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.821\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.914\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.865\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003epebbly sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.893\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.833\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.862\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emudstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.978\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.957\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.967\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eargillaceous sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.824\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.824\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.824\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003econglomerate sandstones\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.875\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.894\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.905\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.734\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.735\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e0.895\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" 
border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePrediction results analysis table of XGBoost model.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eLithologies\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003eXGBoost (testing data)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003erecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eprecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emedium-fine sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.846\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.917\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.880\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003epebbly sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.893\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e0.806\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.847\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emudstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.978\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.936\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.957\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eargillaceous sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.824\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.933\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.875\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003econglomerate sandstones\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.913\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ecoarse sandstone\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.000\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAverage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003e0.909\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.751\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.745\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e\u003cp\u003e0.902\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eFinally, the ROC curves were used to compare the models\u0026rsquo; performance quantitatively. The AUC value indicates how well each lithology type is predicted. For example, the AUC value for medium-fine sandstone is 0.776 for the Adaboost model, compared with 0.974 and 0.972 for the LightGBM and XGBoost models, respectively. Thus, for medium-fine sandstone, LightGBM performed best of the three models. Considering the AUC values for all lithologies, XGBoost performed best in overall lithology prediction.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Visualization and interpretation of the best model\u003c/h2\u003e\u003cp\u003e(1) Predicted lithology comparison on well profile\u003c/p\u003e\u003cp\u003eBased on the XGBoost model, a continuous lithology profile along the well was obtained and compared with the logging lithology and the manually labeled lithology. The predicted lithology is clearly more accurate and finer than the logging lithology. 
It also indicates that a complete lithology profile can be obtained from well logs through machine learning models.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e(2) Hyperparameters of the XGBoost model\u003c/p\u003e\u003cp\u003eReporting the model parameters is necessary for reproducing the experimental results. Using the \u0026ldquo;get_ parameters\u0026rdquo; command of the XGBoost model, the hyperparameters of the best model are listed as follows: {'objective': 'multi:softprob', 'use_label_encoder': True, 'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 0.8, 'enable_categorical': False, 'gamma': 0, 'gpu_id': -1, 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 8, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 8, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 1440, 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': None, 'subsample': 0.5, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None, 'eval_metric': ['logloss', 'auc', 'error'], 'nthread': -1, 'seed': 1440}\u003c/p\u003e\u003cp\u003e(3) SHAP values of different variables\u003c/p\u003e\u003cp\u003eMachine learning models are black boxes for users. Because their prediction process is difficult to visualize, model interpretability remains a challenge in practice. In this manuscript, the SHAP (Shapley Additive Explanations) method was used to interpret the best model. SHAP is an interpretative framework derived from cooperative game theory principles, designed to elucidate prediction outcomes across various machine learning models (Antonini et al., n.d.; Lin P. et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Y. Sun et al., n.d., n.d.; Wen et al., n.d.; J. Zhang et al., n.d.). 
This method quantifies the contribution of individual features through SHAP values, distinct numerical measures assigned to each input variable within a given data sample. Theoretically grounded in Shapley value calculations from game theory, SHAP establishes an additive explanation model where features are treated as collaborative participants in the prediction process. This approach ensures mathematically consistent attribution of feature importance while maintaining local interpretability for individual predictions and global insights into model behavior.\u003c/p\u003e\u003cp\u003eDifferent well logs played different roles in the prediction of different lithologies, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e9\u003c/span\u003e. For example, in Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e9\u003c/span\u003e(a), GR has the widest range of SHAP values. This indicates that, when the model predicts medium-fine sandstone, GR is the log most likely to make the greatest contribution to the result, whereas it makes the smallest contribution to mudstone prediction, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e9\u003c/span\u003e(c). By interpreting a particular lithology in this way, the contribution of each well log to the result can be quantified and the working process of the model can be understood.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e(4) Global model interpretation of XGBoost model\u003c/p\u003e\u003cp\u003eAfter the local interpretation is complete, the global interpretation is crucial to understanding the XGBoost model.\u003c/p\u003e\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e10\u003c/span\u003e, the global contributions of the different well logs to lithology prediction were interpreted. 
For example, the blue bars in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e10\u003c/span\u003e show that RHOB played the most important role in mudstone prediction, followed by M2R1, TNPH, DT, and GR. This result also confirms that it is difficult for GR to reflect lithology changes accurately and intuitively in the lithology prediction of conglomerate reservoirs.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"5 Conclusion","content":"\u003cp\u003eIn this study, rapid and accurate lithology identification of a conglomerate reservoir was achieved using machine learning methods with well logs as inputs. The main conclusions are as follows.\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eA method for automatically classifying the lithology of the sandstone conglomerate reservoir is proposed, using the GR, DT, RHOB, TNPH, and M2R1 well logs as inputs.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eThree machine learning models, Adaboost, LightGBM, and XGBoost, were constructed to predict lithology rapidly from well-logging data. XGBoost performed best among the three models, with an accuracy of 0.902.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eBased on the SHAP interpretation tool, the global contributions of the different well logs to lithology identification were quantified. Across all lithologies, TNPH played the most important role in lithology prediction, followed by M2R1, RHOB, DT, and GR. 
For a specific lithology type, the SHAP values of all variables allow the contributions to be ranked quantitatively.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eConceptualization, Methodology, Formal Analysis, Investigation, and Original Draft Writing: Jiming Liu; Validation, Resources, and Data Curation: Dongjin Xu. Conflict of interest: The authors declare no competing interests.\u003c/p\u003e\u003ch2\u003eAcknowledgements\u003c/h2\u003e\u003cp\u003eThis work was financially supported by the National Natural Science Foundation of China (No. 51504038).\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets used and analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAlzubaidi, F., Mostaghimi, P., Swietojanski, P., Clark, S.R., Armstrong, R.T., n.d. Automated lithology classification from drill core images using convolutional neural networks 197, 107933.\u003c/li\u003e\n\u003cli\u003eAntonini, A.S., Tanzola, J., Asiain, L., Ferracutti, G.R., Castro, S.M., Bjerg, E.A., Ganuza, M.L., n.d. Machine Learning model interpretability using SHAP values: Application to Igneous Rock Classification task 23, 100178.\u003c/li\u003e\n\u003cli\u003eAshraf, U., Zhang, H., Anees, A., Mangi, H.N., Ali, M., Zhang, X., Imraz, M., Abbasi, S.S., Abbas, A., Ullah, Z., Ullah, J., Tan, S., 2021. A Core Logging, Machine Learning and Geostatistical Modeling Interactive Approach for Subsurface Imaging of Lenticular Geobodies in a Clastic Depositional System, SE Pakistan 30, 2807-2830.\u003c/li\u003e\n\u003cli\u003eBrandmeier, M., Chen, Y., 2019. 
Lithological classification using multi-sensor data and convolutional neural networks xlii-2/w16, 55-59.\u003c/li\u003e\n\u003cli\u003eCAO, Y., MIAO, Q.-G., LIU, J.-C., GAO, L., n.d. Advance and Prospects of AdaBoost Algorithm 39, 745-758.\u003c/li\u003e\n\u003cli\u003eChen T., Guestrin C., 2016. XGBoost. ACM, pp. 785-794.\u003c/li\u003e\n\u003cli\u003eChen, T., Guestrin, C., 2016. XGBoost: A Scalable Tree Boosting System. abs/1603.02754.\u003c/li\u003e\n\u003cli\u003eDeng C., Pan H., Fang S., Konat\u0026eacute; A.A., Qin R., 2017. Support vector machine as an alternative method for lithology classification of crystalline rocks 14, 341-349.\u003c/li\u003e\n\u003cli\u003eDev V.A., Eden M.R., n.d. Evaluating the Boosting Approach to Machine Learning for Formation Lithology Classification. Elsevier.\u003c/li\u003e\n\u003cli\u003eDhaliwal, S.S., Nahid, A.-A., Abbas, R., 2018. Effective Intrusion Detection System Using XGBoost 9, 149.\u003c/li\u003e\n\u003cli\u003eDong, S.-Q., Sun, Y.-M., Xu, T., Zeng, L.-B., Du, X.-Y., Yang, X., Liang, Y., n.d. How to improve machine learning models for lithofacies identification by practical and novel ensemble strategy and principles 20, 733-752.\u003c/li\u003e\n\u003cli\u003eFerreira, A.J., Figueiredo, M.A.T., n.d. Boosting Algorithms: A Review of Methods, Theory, and Applications. Springer New York.\u003c/li\u003e\n\u003cli\u003eGu, Y., Zhang, D., Bao, Z., n.d. Lithological classification via an improved extreme gradient boosting: A demonstration of the Chang 4+5 member, Ordos Basin, Northern China 215, 104798.\u003c/li\u003e\n\u003cli\u003eLin, N., Fu, J., Jiang, R., Li, G., Yang, Q., 2023. Lithological Classification by Hyperspectral Images Based on a Two-Layer XGBoost Model, Combined with a Greedy Algorithm 15, 3764.\u003c/li\u003e\n\u003cli\u003eLin P., Dong X., Ji Y., Xia J., Zhai Y., Hou Q., 2023. Explainable Prediction Model of Logging Lithology Classification Based on XGBoost and SHAP. IEEE, pp. 
307-312.\u003c/li\u003e\n\u003cli\u003eLiu, J.-J., Liu, J.-C., 2022. Integrating deep learning and logging data analytics for lithofacies classification and 3D modeling of tight sandstone reservoirs 13, 101311.\u003c/li\u003e\n\u003cli\u003eLiu, Y., Zhu, R., Zhai, S., Li, N., Li, C., n.d. Lithofacies identification of shale formation based on mineral content regression using LightGBM algorithm: A case study in the Luzhou block, South Sichuan Basin, China 11, 4256-4272.\u003c/li\u003e\n\u003cli\u003eOgunleye A., Wang Q.-G., 2020. XGBoost Model for Chronic Kidney Disease Diagnosis 17, 2131-2140.\u003c/li\u003e\n\u003cli\u003eQiu, Y., Zhou, J., Khandelwal, M., Yang, H., Yang, P., Li, C., n.d. Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration 38, 4145-4162.\u003c/li\u003e\n\u003cli\u003eRuiyi, H., Zhuwen, W., Wenhua, W., Fanghui, X., Xinghua, Q., Yitong, C., n.d. Lithology identification of igneous rocks based on XGboost and conventional logging curves, a case study of the eastern depression of Liaohe Basin 195, 104480.\u003c/li\u003e\n\u003cli\u003eSaporetti C.M., da Fonseca L.G., Pereira E., n.d. A Lithology Identification Approach Based on Machine Learning With Evolutionary Parameter Tuning 16, 1819-1823.\u003c/li\u003e\n\u003cli\u003eSchapire R.E., n.d. The Boosting Approach to Machine Learning: An Overview. Springer New York.\u003c/li\u003e\n\u003cli\u003eSun, Y., Pang, S., Li, H., Qiao, S., Zhang, Y., n.d. Enhanced Lithology Classification Using an Interpretable SHAP Model Integrating Semi-Supervised Contrastive Learning and Transformer with Well Logging Data 34, 785-813.\u003c/li\u003e\n\u003cli\u003eSun Y., Pang S., Zhang Y., n.d. Application of Adaboost-Transformer Algorithm for Lithology Identification Based on Well Logging Data 21, 1-5.\u003c/li\u003e\n\u003cli\u003eSun, Y., Pang, S., Zhao, Z., Zhang, Y., n.d. 
Interpretable SHAP Model Combining Meta-learning and Vision Transformer for Lithology Classification Using Limited and Unbalanced Drilling Data in Well Logging 33, 2545-2565.\u003c/li\u003e\n\u003cli\u003eWang D., Zhang Y., Zhao Y., 2017. LightGBM. ACM, pp. 7-11.\u003c/li\u003e\n\u003cli\u003eWen, H., Liu, B., Di, M., Li, J., Zhou, X., n.d. A SHAP-enhanced XGBoost model for interpretable prediction of coseismic landslides 74, 3826-3854.\u003c/li\u003e\n\u003cli\u003eXu, Z., Shi, H., Lin, P., Liu, T., n.d. Integrated lithology identification based on images and elemental data from rocks 205, 108853.\u003c/li\u003e\n\u003cli\u003eZhang, J., Ma, X., Zhang, Jialan, Sun, D., Zhou, X., Mi, C., Wen, H., n.d. Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model 332, 117357.\u003c/li\u003e\n\u003cli\u003eZhang L., Zhan C., 2017. Machine Learning in Rock Facies Classification: An Application of XGBoost. Society of Exploration Geophysicists and Chinese Petroleum Society.\u003c/li\u003e\n\u003cli\u003eZhang, X., Sun, Q., He, K., Wang, Z., Wang, J., n.d. Lithology identification of logging data based on improved neighborhood rough set and AdaBoost 15, 1201-1213.\u003c/li\u003e\n\u003cli\u003eZhao, B., Liao, W., 2025. Lithology Identification of Buried Hill Reservoir Based on XGBoost with Optimized Interpretation 13, 682.\u003c/li\u003e\n\u003cli\u003eZhao, C., Jiang, Y., Wang, L., 2022. Data-driven diagenetic facies classification and well-logging identification based on machine learning methods: A case study on Xujiahe tight sandstone in Sichuan Basin 217, 110798.\u003c/li\u003e\n\u003cli\u003eZhao, X., Chen, X., Huang, Q., Lan, Z., Wang, X., Yao, G., 2022. Logging-data-driven permeability prediction in low-permeable sandstones based on machine learning with pattern visualization: A case study in Wenchang A Sag, Pearl River Mouth Basin 214, 110517.\u003c/li\u003e\n\u003cli\u003eZhong, R., Johnson, R., Chen, Z., 2020. 
Generating pseudo density log from drilling and logging-while-drilling data using extreme gradient boosting (XGBoost) 220, 103416.\u003c/li\u003e\n\u003cli\u003eZou, Y., Chen, Y., Deng, H., n.d. Gradient Boosting Decision Tree for Lithology Identification with Well Logs: A Case Study of Zhaoxian Gold Deposit, Shandong Peninsula, China 30, 3197-3217.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"lithology identification, machine learning, logging data driven, conglomerate reservoir","lastPublishedDoi":"10.21203/rs.3.rs-7429684/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7429684/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eLithology is a key parameter in reservoir fine description and evaluation. It is difficult to identify reservoir lithology directly by single curve or conventional cross plot method, because the mud-gravel mixing in complex reservoirs. The accurate identification of conglomerate reservoir lithology has always been a high-profile issue in reservoir characterization. In this study, over 70 meters of cores were observed in detail. The conglomerate lithology after depth correction is matched with the log curves, including five log curves such as GR, DT, RHOB, TNPH, and M2R1. With the logging data as input, three machine learning models were built separately, and the prediction results were compared using a variety of methods, including accuracy analysis parameters and ROC curves. The results show that the machine learning model based on logging data has excellent performance in the lithology prediction of conglomerate reservoir, and the XGBoost model shows the best prediction results with the highest prediction accuracy of 0.902. In addition, the optimal model is interpreted by SHAP method. In different lithology prediction, the contribution of different log curves is different. 
On the whole, TNPH curve plays the most important role in lithology prediction. This study provides insights for lithology prediction of complex reservoirs.\u003c/p\u003e","manuscriptTitle":"Logging-data-driven lithology identification of conglomerate reservoir by the assistance of integrated machine learning methods","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-12 06:50:25","doi":"10.21203/rs.3.rs-7429684/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-10-09T04:16:01+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-19T04:09:13+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-19T03:57:34+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"332950642740123981712102859932887231566","date":"2025-09-15T07:15:24+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"113252216518603451313717187496769581772","date":"2025-09-06T15:11:08+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"280025421120970460530170590910992610627","date":"2025-09-05T10:02:33+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"119490297962806367297462093888544906632","date":"2025-09-05T08:28:45+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-05T08:12:00+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-09-02T00:47:41+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-09-01T17:34:39+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-08-29T09:10:42+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-08-29T09:07:20+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3c941b46-c1a2-405c-8c8b-586b25aa915a","owner":[],"postedDate":"September 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":54303178,"name":"Physical sciences/Energy science and technology"},{"id":54303179,"name":"Earth and environmental sciences/Solid earth sciences"}],"tags":[],"updatedAt":"2025-11-24T16:00:27+00:00","versionOfRecord":{"articleIdentity":"rs-7429684","link":"https://doi.org/10.1038/s41598-025-27640-3","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-11-23 15:56:59","publishedOnDateReadable":"November 23rd, 2025"},"versionCreatedAt":"2025-09-12 06:50:25","video":"","vorDoi":"10.1038/s41598-025-27640-3","vorDoiUrl":"https://doi.org/10.1038/s41598-025-27640-3","workflowStages":[]},"version":"v1","identity":"rs-7429684","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7429684","identity":"rs-7429684","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}