Lithology identification with wireline logs based on stacked-driven ensemble method for complex carbonate reservoirs

doi:10.21203/rs.3.rs-8931314/v1

2026 · doi:10.21203/rs.3.rs-8931314/v1

Research Article

Yan Zhang, Yang Chen, Weiwei Xie, Kai Xing, Shaobo Cheng

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8931314/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted 10

Abstract

Carbonate reservoirs are influenced by structure, deposition and diagenesis, which makes their lithology complex and diverse. Lithology identification is therefore essential for reservoir evaluation. Building on the development of intelligent methods and ensemble strategies, a stacked-driven ensemble method (SDEM) is proposed. First, the lithologies were determined from logging while drilling (LWD) data and core observations.
Six sensitive wireline logs were then selected as the input of the SDEM, namely GR (natural gamma ray logging), DEN (density logging), AC (compensated acoustic logging), RLLD (deep lateral resistivity logging), PE (photoelectric absorption cross-section logging) and CNL (neutron logging). A two-level SDEM was then constructed. The first level uses three base models, BPNN (back-propagation neural network), SVM (support vector machine) and DT (decision tree), while a DT serves as the meta-model in the second level. In addition, grid search combined with 10-fold cross-validation was adopted to find the optimal hyperparameters of the SDEM. The average classification accuracy under 10-fold cross-validation reached 95.1%, approximately 2.6% higher than any individual method. Finally, two cases from different regions of the Sulige gas field in the Ordos Basin were examined, and the proposed SDEM outperformed all individual approaches and traditional ensemble learning methods (ELMs) in accuracy. The approach is therefore applicable to predictive work in other oil and gas exploration fields, where it can improve exploration precision and raise hydrocarbon production.

Keywords: Carbonate reservoirs, Lithology identification, Stacked-driven ensemble method, Wireline logs

Introduction

Lithology identification is the basis of reservoir description, formation evaluation and reservoir evaluation, especially for carbonate reservoirs, whose strong heterogeneity and complex lithology make identification more difficult (Bai et al. 2016; Zhao et al. 2017; Zhang et al. 2020).
Therefore, a good understanding of the spatial distribution of lithology is of great significance to the evaluation of carbonate reservoirs (Jia et al. 2025). At present, three methods are commonly used to obtain lithology information: well logging, observation of well cores, and wireline log analysis (Hughes & Thomas 2011; Liu et al. 2024; Wang et al. 2025). Because well logging is subjective and well cores are expensive, wireline log analysis has become an important means of lithology identification. Methods for lithology identification based on wireline logs are relatively mature and mainly comprise qualitative analysis, crossplot analysis and mathematical methods (Li & Li 2013; Horrocks et al. 2015; Insua et al. 2015; Konate et al. 2017; Abbey et al. 2018; Ren et al. 2019). Qualitative analysis relies on the morphological characteristics of the wireline logs and is rarely used (Horrocks et al. 2015; Insua et al. 2015; Ren et al. 2019). Crossplots, mainly those of GR against the three porosity curves (AC, DEN, CNL) and of PE against DEN, are used for their simplicity. However, a crossplot uses only two logs and cannot capture the relationship between lithology and a larger set of logs, which matters especially for complex carbonate reservoirs (Abbey et al. 2018; Ren et al. 2019). Given the complexity and diversity of carbonate reservoirs, different lithologies overlap considerably on any pair of logs, which reduces the accuracy of crossplot-based identification (Corina & Hovda 2018; Ren et al. 2019). In recent years, with the development of machine learning (ML) methods, the mathematical methods have become a research focus for the lithology identification of carbonate reservoirs (Li & Li 2013; Konate et al. 2017; Xie et al. 2018; Ren et al. 2019; Bressan et al. 2020).
Numerous ML-based lithology identification studies have been carried out, using K-Nearest Neighbors (KNN) (Wang et al. 2018), Naïve Bayes (NB) (Wang & Carr 2012), Support Vector Machine (SVM) (Li & Li 2013; Wang et al. 2014; Xie et al. 2018), Artificial Neural Network (ANN) (Bressan et al. 2020; Wang et al. 2025), Fuzzy Logic System (FLS) (Lopes & Andrade 2019), Decision Tree (DT) (Xie et al. 2018; Bressan et al. 2020) and Deep Learning (DL) (Xiang et al. 2020). Although each method has shown high performance, each also has shortcomings: for example, DL cannot estimate the distribution characteristics of data without bias, and ANN is prone to converging to local optima and faces an inherent trade-off between generalization performance and learning capacity. These drawbacks lead to poor generalization and weak robustness of the prediction model and prevent it from handling misclassified samples effectively. Lithology identification of carbonate reservoirs based on a single machine learning method therefore has certain limitations (Tan et al. 2020; Shuvo & Joy 2024).

To overcome the drawbacks of single methods, many researchers have studied a different identification mechanism, the ensemble learning method (ELM), which combines multiple learners (also called weak learners or base models), each with its own bias and variance, into a better learner (also called a strong learner or meta-model) (Xie et al. 2018; Tewari & Dwivedi 2019; Bressan et al. 2020). Sun et al. (2019) compared three popular machine learning algorithms (OVO SVMs, OVR SVMs and RF) for lithology identification and found that the RF classifier outperformed the other two, with an accuracy above 90%. Dev et al. (2019) applied recently developed gradient boosted decision tree (GBDT) systems and achieved higher performance.
The various ELMs differ mainly in the ensemble strategy used to create the strong learner. Commonly used ensemble strategies include voting, averaging, boosting and bagging. These strategies essentially combine weak learners through deterministic rules and cannot significantly reduce the bias and variance of the prediction model (Dev & Eden 2018; Dev & Eden 2019; Sun et al. 2019; Tewari & Dwivedi 2019; Tan et al. 2020). Another commonly used ELM is based on stacked generalization (also called the stacked-driven ensemble method, SDEM). SDEM is a two-level model: the first level inherits and integrates the merits of the base models, and the second level uses a meta-model to reduce the bias and variance of the final model, thereby improving model performance while limiting the risk of overfitting (Fang et al. 2024; Wang et al. 2025). This method has gained much ground for classification and regression problems in many fields (Zhai & Chen 2018; Agarwal & Chowdary 2020; Sun & Li 2020). However, it remains a new technology for oil and gas exploration, particularly for lithology identification.

In this study, we focus on complex lithology identification with wireline logs based on a two-level SDEM, which provides a feasible method for complex carbonate reservoirs. To this end, the theory and methodology of the SDEM and its associated methods are first reviewed. In the ensemble model, the base models are BPNN, SVM and DT, and the meta-model is a DT. Grid search combined with 10-fold cross-validation was adopted to find the optimal hyperparameters of the SDEM. To analyze the generalization of the method, two cases, Well1 and Well2, within the Ma 5 Member of the Majiagou Formation in the eastern and western regions of the Sulige Gas Field were tested.
The predictions were all consistent with the logging while drilling (LWD) results.

Theory and Methodology

This section presents the modeling structure of a conventional ELM, followed by the individual models used in this paper and the ensemble strategy.

Modeling structure

An ELM is a machine learning framework that addresses the target problem by jointly training multiple expert models and combining the opinions of those experts to obtain a better decision. ELMs help to reduce the chance of error while increasing the overall reliability and confidence of the model (Dev & Eden 2018; Fang et al. 2024; Wang et al. 2025). Figure 1 shows the structure of an ELM, which consists of two parts. The first is the training of multiple weak learner models, each of which produces a decision result from the training sets. The second is the ensemble strategy. Because the base models have high bias and variance, the ensemble strategy takes the individual results produced by the base learners as the input of the ELM. In this way misclassified samples are corrected and a comprehensive result is obtained, which provides the best overall solution (Zhai & Chen 2018; Sun & Li 2020). This way of solving problems is common in daily life: an article is reviewed by multiple experts before it is accepted or rejected, a company calls in relevant experts to evaluate a project before investing in it, and the oil or gas potential of a well is evaluated by experts using various data before drilling. In all these situations a committee of experts reduces the risk of making a wrong decision. Therefore, both in theory and in practical application, ensemble learning methods are expected to give better, less risky and less biased results than an individual method (Agarwal & Chowdary 2020; Tan et al. 2020).
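The committee-of-experts idea can be illustrated with a toy sketch. It assumes plain majority voting, one of the conventional ensemble strategies named in the Introduction, and not the stacking strategy adopted in this paper. The three label lists are hypothetical expert outputs, not data from the study.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        # most_common(1) returns the label with the most votes;
        # ties are resolved in favour of the label seen first
        combined.append(counts.most_common(1)[0][0])
    return combined

# Hypothetical outputs of three "experts" for four samples
expert_a = ["L", "D", "CD", "AL"]
expert_b = ["L", "DL", "CD", "AD"]
expert_c = ["DL", "D", "CD", "AL"]

committee = majority_vote([expert_a, expert_b, expert_c])
print(committee)  # ['L', 'D', 'CD', 'AL']
```

Each expert is wrong on at least one sample here, yet the committee recovers a label that two of the three agree on for every sample.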
ELMs are effective for both classification and regression, and the base models differ depending on the problem. Classification problems output categories (Dev & Eden 2018; Dev & Eden 2019), whereas regression problems output specific values (Zhai & Chen 2018; Agarwal & Chowdary 2020; Sun & Li 2020). The lithology identification problem based on wireline logs addressed in this paper is a classification problem.

Base models

Because lithology identification is a classification problem, methods suited to classification were used. Considering the complex nonlinear relationship between wireline logs and lithology, three methods, DT, BPNN and SVM, were chosen as the base models, and the DT method was also adopted as the meta-model. Each model has its own advantages: BPNN has a strong learning ability and can process noisy data (Bressan et al. 2020), DT fully considers the mutual relationships between different features (Xie et al. 2018; Bressan et al. 2020), and SVM has excellent generalization ability (Bressan et al. 2020). The three methods are described as follows.

The DT is a supervised classification method with a tree structure. The root node is the split that best divides the training data. Below the root, each internal (non-leaf) node represents a splitting boundary on a selected feature, each branch represents the outcome of the test at the node above it, and the leaf nodes represent the final classification results (Xie et al. 2018; Bressan et al. 2020). The DT involves two processes, the construction of the tree and classification with the tree. The former builds the decision tree from the training sets. The latter classifies samples with the constructed tree (Xie et al. 2018; Bressan et al. 2020; Tan et al. 2020).
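To make "the split that best divides the training data" concrete, the following minimal sketch searches one feature for the threshold with the lowest weighted Gini impurity. The Gini criterion is an assumption made for illustration (the paper does not state which splitting criterion was used), and the GR values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical GR readings (API): clean limestone low, argillaceous limestone high
gr = [12.0, 15.0, 18.0, 41.0, 45.0, 52.0]
lith = ["L", "L", "L", "AL", "AL", "AL"]

threshold, impurity = best_split(gr, lith)
print(threshold, impurity)  # 29.5 0.0
```

A full tree would repeat this search over every feature at every node and recurse on the two resulting subsets until a stopping rule is met.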
The BPNN is a multi-layer feed-forward neural network based on the gradient descent method, which was proposed by Rumelhart and McClelland in 1986 (Rumelhart et al. 1986). It consists of two stages, signal forward propagation and error back-propagation. In the forward propagation stage, signals are transmitted from the input layer through the hidden layer to the output layer. In the error back-propagation stage, the gradient descent algorithm is used to update the weights and biases between the output and hidden layers, as well as those between the hidden and input layers. The objective is to minimize the sum of squared errors. Training terminates when the error converges to a minimum or the predefined number of iterations is reached (Bisoyi et al. 2019).

The SVM is a machine learning method developed on the basis of statistical learning theory and proposed by Vapnik (1995). It aims to minimize the structural risk, thereby enhancing the generalization capability of the learning model and enabling satisfactory statistical inference even with limited training samples. The method has been widely used in petroleum exploration, for example in reservoir lithology identification, fluid identification and reservoir prediction (Wang et al. 2014; Tan et al. 2020; Zhang et al. 2024). The fundamental principle of SVM is to establish an optimal separating hyperplane that distinguishes different types of samples. The hyperplane is optimized by maximizing its distance to the closest samples of the two categories. In practice, most problems are not linearly separable, so such a hyperplane is difficult to find in the original feature space. A kernel function is therefore used to convert the nonlinear problem into a linear one in a higher-dimensional feature space (Zhang et al. 2024).
In addition, to account for noisy samples in the training sets, a penalty factor is introduced when building the SVM model. It penalizes misclassified sample points, reduces the sensitivity of the model to noise and thereby improves its generalization performance (Vapnik 1995; Zhang et al. 2024).

Ensemble strategy

The ensemble strategy is central to an ELM. Unlike other ensemble strategies (Dev & Eden 2018; Dev & Eden 2019; Sun et al. 2019; Tewari & Dwivedi 2019; Tan et al. 2020), the SDEM is based on stacked generalization, which was first put forward by Wolpert (1992). By integrating different base learners, the SDEM achieves a more accurate result, owing to the fact that different base learners provide information from different perspectives and compensate for the errors of individual models (Wolpert 1992). The method is a hierarchical ensemble model consisting of base models and a meta-model. To enhance the generalization performance of the proposed method, K-fold cross-validation is employed during the training of the base learners. The outputs of the base models are used as the features for the meta-model, and the final output is then obtained with the meta-model (Zhai & Chen 2018; Agarwal & Chowdary 2020; Sun & Li 2020). As described in Fig. 2, the construction of the proposed SDEM has two levels (Wolpert 1992; Sun & Li 2020):

(1) Level0

Here, a generic base model with 5-fold cross-validation is used to describe the construction process. In practice, other values of K can be selected, and the base models can be the same or different; the training process is identical for each base model. Level0 can be divided into three steps.

Step1: The sample set is divided into a training set and a test set, of sizes M and N, respectively.

Step2: 5-fold cross-validation is applied to the base model.
The training set was randomly divided into 5 subsets, denoted T1, T2, T3, T4 and T5. In each fold, four subsets with a total size of 4M/5 were used as training data, and the remaining subset of size M/5 served as the held-out set. Repeating this process five times with the base model gave five prediction results for the held-out subsets, named P5, P4, P3, P2 and P1. Each of the five trained models was also used to predict the test data, and the results were labeled T-P1, T-P2, T-P3, T-P4 and T-P5.

Step3: The prediction outputs for the five held-out subsets were stacked to form an M×1 matrix denoted A1. The five prediction results for the test data were averaged to form an N×1 matrix denoted B1. A1 and B1 are shown in the last column of Level0 in Fig. 2. The same process was applied to the other two base models to obtain the matrices A2, B2 and A3, B3.

(2) Level1

Step1: The outputs of the base models, i.e., matrices A1, A2, A3 and B1, B2, B3, were concatenated to form new matrices that serve as input features for the meta-model. Specifically, A1, A2 and A3 were used as the new training set, while B1, B2 and B3 constituted the new test set. The sample labels were not changed during this construction.

Step2: The meta-model was trained on the newly constructed training set and applied to the newly constructed test set to obtain the final prediction result.

Workflow

The research workflow for the lithology identification of carbonate reservoirs in this paper is shown in Fig. 3. It is carried out in four parts.

Part1: Wireline log parameter analysis. Based on the LWD data and the core observations, the lithologies developed in the study area were analyzed. The characteristics of the wireline logs for different lithologies were also analyzed with boxplots, which indicate the sensitive parameters for the different lithologies.
According to the analysis results, the training, validation and test sets required for modeling were obtained.

Part2: Model optimization. The optimal hyperparameters for each base model were determined using grid search integrated with K-fold cross-validation. For the BPNN, the key hyperparameter optimized was the number of hidden layer nodes (num_hidden_nodes). For the DT, two key parameters were optimized: the maximum depth (max_depth) and the minimum number of samples per leaf (min_samples_leaf). For the SVM, the hyperparameters optimized were the error penalty parameter (C) and the kernel coefficient.

Part3: Ensemble method. The three base models were established with the optimized hyperparameters, and their outputs were obtained from the training sets following the steps shown in Fig. 2. The DT was then used as the meta-model to make the final judgment from the three base-model outputs.

Part4: Result analysis. The identification results were assessed with evaluation standards such as the confusion matrix, accuracy, precision, recall and F1-score. To further verify the generalization performance and robustness of the predictive model, two cases in the Ma 5 Member of the Majiagou Formation in the east and west of the Sulige Gas Field were tested.

Result

In this section, the SDEM is applied to wireline logs from the Sulige gas field in the Ordos Basin. Four parts are presented: data background, wireline log parameter analysis, model optimization and the result analysis of lithology identification.

Data background

The data studied come from the eastern part of the Sulige Gas Field in the Ordos Basin, where the Ma 5 Member of the Majiagou Formation, an important gas-bearing reservoir, is the target formation (Bai et al. 2016; Gu et al. 2017).
The porosity in this area mainly ranges from 2% to 6% and the permeability is less than 0.02 × 10⁻³ μm², so the reservoir is characterized by low porosity and low permeability (Jia et al. 2025). According to the core observations and LWD data analysis, the lithologies of the Ma 5 Member predominantly consist of limestone (L), dolomitic limestone (DL), argillaceous limestone (AL), dolomite (D), calcite dolomite (CD) and argillaceous dolomite (AD) (Zhang et al. 2022). The gas-bearing reservoirs are generally distributed in two lithologies, calcite dolomite and dolomitic limestone (Gu et al. 2017). Lithology identification is therefore of great significance for the evaluation of oil and gas reservoirs. In this work, 2934 data samples from 3 wells in the Ma 5 Member of the study area were used as the dataset for the lithology identification model. The samples were partitioned into three subsets: 1642 samples (training sets) for model training, 705 samples (validation sets) for model validation and hyperparameter optimization, and 587 samples (test sets) for independent testing to quantitatively assess the lithology identification performance.

Wireline logs parameter analysis

To identify lithology with wireline logs, the response of the logs to different lithologies should be analyzed and the sensitive parameters selected (Zhang et al. 2020). Previous studies on carbonate reservoirs and the analysis of logging data in the study area show that different wireline logs carry different information for lithology identification. For example, GR (natural gamma ray logging) and PE (photoelectric absorption cross-section logging) are primarily sensitive to reservoir lithology.
AC (compensated acoustic logging), CNL (neutron logging) and DEN (density logging) predominantly characterize the physical properties of the reservoirs, while RLLD (deep lateral resistivity logging) is an effective indicator of gas-bearing properties. Different wireline logs thus reflect different aspects of the reservoir (Zhang et al. 2020), and these six wireline logs are adopted as the input parameters of the SDEM.

Based on the 2934 data samples in the study area, a boxplot of each wireline log was obtained for the different lithologies. The boxplot shows the minimum, maximum, median, 25th percentile, 75th percentile and outliers of the samples (Li et al. 2016). The box size represents the interval between the 75th and 25th percentiles. A larger box corresponds to a more dispersed distribution and stronger fluctuation of the data; a smaller box indicates a more concentrated distribution. Compared with other plots, the boxplot clearly shows the distribution characteristics of each variable and allows a direct comparison between variables across lithology types (Li et al. 2016). Figure 4 shows the boxplots of the wireline logs for the different lithologies. The detailed characteristics of each log are as follows.

(1) The AC log: As shown in Fig. 4a, the AC values of L, DL and AL are higher than those of D, CD and AD. The values for L mainly lie between 156.1 μs/m and 158.4 μs/m and the box is small, indicating that the data are comparatively concentrated, around 157 μs/m. DL overlaps strongly with L, mainly between 155 μs/m and 161.1 μs/m. Among all lithologies, AL has the highest values, 159.4–171.3 μs/m, and D has the lowest, 146.4–152.9 μs/m.
CD and AD overlap with other lithologies; the former mainly lies at 149.6–157.4 μs/m and the latter at 155–165.5 μs/m.

(2) The GR log: As shown in Fig. 4b, the GR value is higher when the carbonate reservoir contains clay minerals, and it increases with the clay mineral content. The GR log can therefore clearly distinguish AL and AD from the other four lithologies. AL mainly lies between 32.74 API and 59.17 API, with a maximum of 97.2 API. AD mostly ranges from 30.8 API to 57.6 API and is concentrated around 40.6 API. The other four lithologies are primarily below 20 API, with L and D the lowest.

(3) The CNL log: As shown in Fig. 4c, the CNL values of L, mostly below 1.28%, are clearly distinguishable from those of the other lithologies. The values of AD and AL are much higher than those of DL, D and CD. AD mainly lies at 5.0–8.6%, and AL ranges from 2.76% to 7.08%. The other lithologies, DL, D and CD, overlap strongly and are mostly below 6.5%.

(4) The DEN log: As shown in Fig. 4d, the DEN values of L and D are clearly distinguishable from those of the other four lithologies. D is above 2.82 g/cm³, L is below 2.72 g/cm³, and the other four lithologies range from 2.72 g/cm³ to 2.82 g/cm³. The values of AL and DL are generally smaller than those of AD and CD. AL, DL, AD and CD lie at 2.71–2.76 g/cm³, 2.72–2.76 g/cm³, 2.78–2.83 g/cm³ and 2.77–2.82 g/cm³, respectively.

(5) The PE log: As shown in Fig. 4e, the PE log is relatively sensitive to lithology. Among the six lithologies, L has the highest PE value, mostly above 5 b/e, and D has the lowest, mostly below 3.1 b/e.
The PE values of the other four lithologies lie between those of L and D, and the values of AL and DL are much higher than those of AD and CD. AL and DL overlap strongly, mainly lying at 3.89–4.43 b/e and 3.82–4.37 b/e, respectively. AD mostly lies at 3.10–3.52 b/e and CD at 3.25–3.57 b/e.

(6) The RLLD log: As shown in Fig. 4f, the boxes of all six lithologies are larger on RLLD than on the other five wireline logs, which shows that the RLLD samples are more scattered and cover a wider range. Overall, the RLLD value of L is much higher, and those of AL and AD are lower, than the other lithologies. The distributions of AL and AD largely overlap. D, CD and DL also overlap strongly: D mainly ranges from 2.42 Ω·m to 3.77 Ω·m, CD from 2.50 Ω·m to 3.62 Ω·m, and DL from 2.45 Ω·m to 3.75 Ω·m.

The boxplots show that each wireline log gives some indication of lithology, but the strong overlaps make identification with any single log uncertain. Multiple wireline logs should therefore be considered together for lithology identification.

Model optimization

The base models in the SDEM involve multiple hyperparameters, whose values strongly influence model performance. Model optimization uses an evaluation standard to assess the classifier under different hyperparameters and to obtain the optimal hyperparameters for each base model (Xie et al. 2018; Tan et al. 2020). Here, grid search was employed to traverse all hyperparameter combinations, and 10-fold cross-validation was applied to each combination, with classification accuracy as the evaluation metric.
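The search procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TinyKNN` is a hypothetical stand-in classifier (not one of the paper's base models), the two-feature data set is synthetic, and the helper names `cv_scores` and `grid_search` are introduced here only for the example. The mean and variance of the fold accuracies are both recorded for every grid point, so that accuracy and its spread across folds can be weighed together.

```python
import itertools
import random
import statistics
from collections import Counter

class TinyKNN:
    """Hypothetical stand-in classifier with one hyperparameter (n_neighbors)."""
    def __init__(self, n_neighbors=1):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for q in X:
            # indices of training samples sorted by squared distance to q
            order = sorted(range(len(self.X)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], q)))
            votes = Counter(self.y[i] for i in order[:self.n_neighbors])
            out.append(votes.most_common(1)[0][0])
        return out

def cv_scores(make_model, params, X, y, k=10, seed=0):
    """Accuracy on each of k held-out folds for one hyperparameter setting."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for held_out in (idx[i::k] for i in range(k)):
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = make_model(**params).fit([X[i] for i in train], [y[i] for i in train])
        pred = model.predict([X[i] for i in held_out])
        scores.append(sum(p == y[i] for p, i in zip(pred, held_out)) / len(held_out))
    return scores

def grid_search(make_model, grid, X, y, k=10):
    """Evaluate every combination in grid; return best params and all results."""
    names = list(grid)
    summary = {}
    for combo in itertools.product(*(grid[name] for name in names)):
        scores = cv_scores(make_model, dict(zip(names, combo)), X, y, k)
        summary[combo] = (statistics.mean(scores), statistics.pvariance(scores))
    # highest mean accuracy wins; lower variance breaks ties
    best = max(summary, key=lambda c: (summary[c][0], -summary[c][1]))
    return dict(zip(names, best)), summary

# Synthetic, well-separated two-class data (hypothetical values)
X = [(0.1 * i, 0.0) for i in range(20)] + [(5.0 + 0.1 * i, 1.0) for i in range(20)]
y = ["L"] * 20 + ["D"] * 20
best_params, summary = grid_search(TinyKNN, {"n_neighbors": [1, 3, 5]}, X, y, k=10)

# A logarithmic grid of the kind used for the SVM: C = 1e-5..1e5, gamma = 1e-7..1e3
svm_grid = {"C": [10.0 ** e for e in range(-5, 6)],
            "gamma": [10.0 ** e for e in range(-7, 4)]}
n_points = len(svm_grid["C"]) * len(svm_grid["gamma"])  # 11 x 11 = 121
```

With eleven values per axis, the logarithmic grid contains 121 combinations, each of which is scored by the same 10-fold procedure.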
The combination achieving the highest accuracy was selected as the optimal hyperparameter configuration (Xie et al. 2018; Tan et al. 2020).

The classification performance of the DT depends strongly on the hyperparameters max_depth and min_samples_leaf. A DT with too large a max_depth tends to overfit. The hyperparameter min_samples_leaf is used to prune the decision tree: a split is discarded if it would leave fewer than min_samples_leaf samples in a leaf (Xie et al. 2018; Bressan et al. 2020; Tan et al. 2020). Figure 5a shows the average cross-validation accuracy and its variance on the validation sets as functions of max_depth. The accuracy plot shows the accuracy under each hyperparameter value, and the variance plot shows the deviation of the accuracy across folds, so both plots must be considered together when choosing the optimal hyperparameters. The cross-validation accuracy was highest (approximately 0.95) and the variance lowest when max_depth was 9, 33 or 37. Because training time increases with the maximum depth, max_depth = 9 was selected as the optimal value, combining high accuracy with low variance. Figure 5b shows the average cross-validation accuracy and its variance on the validation sets as functions of min_samples_leaf. The cross-validation accuracy was highest (approximately 0.95) when min_samples_leaf was 5 or 7. Comparing the variance at these two values, min_samples_leaf = 7 was chosen as the optimal value. The optimal DT classifier was therefore obtained with max_depth = 9 and min_samples_leaf = 7.

The BPNN is typically configured as a three-layer network, consisting of one input layer, one hidden layer and one output layer.
The number of nodes in the input layer equals the dimension of the input data, and the number of nodes in the output layer equals the number of classification categories. Because the numbers of input and output nodes are fixed, the classification performance of the BPNN depends strongly on the hyperparameter num_hidden_nodes (Rumelhart et al. 1986; Bisoyi et al. 2019). Increasing num_hidden_nodes can reduce network errors and improve model accuracy. However, it also prolongs training, increases the tendency to overfit and makes the network more complex. Tuning num_hidden_nodes can therefore improve the generalization ability of the network (Bisoyi et al. 2019; Tan et al. 2020). Figure 6 shows the average cross-validation accuracy and its variance on the validation sets for the BPNN model with different values of num_hidden_nodes. The cross-validation accuracy was highest (approximately 0.72) when num_hidden_nodes was 216, 256 or 296, and the variance was relatively small (approximately 0.031) when num_hidden_nodes was 296. The optimal BPNN classifier was therefore obtained with num_hidden_nodes = 296. For the BPNN model, the sigmoid function was adopted as the activation function and an optimized stochastic gradient descent algorithm served as the weight optimizer.

The classification performance of the SVM depends strongly on the hyperparameter C and the kernel function coefficient (Wang et al. 2014; Tan et al. 2020; Zhang et al. 2024). As C increases, the penalty for misclassified samples becomes more severe. This raises the accuracy on the training sets but leads to overfitting and weak generalization of the constructed model. Conversely, the smaller C is, the milder the penalty.
A milder penalty enhances the fault tolerance and the generalization ability, but underfitting may occur (Wang et al. 2014). In the SVM, the radial basis function (RBF) is used as the kernel function and gamma is the kernel function coefficient (Wang et al. 2014), which implicitly determines the distribution of the data mapped to the new feature space. A larger gamma value corresponds to fewer support vectors, whereas a smaller gamma value results in more support vectors. The number of support vectors affects the time and speed of model training and prediction (Wang et al. 2014). The search ranges of the two hyperparameters are given in Table 2. C ranged from 10⁻⁵ to 10⁵ and gamma ranged from 10⁻⁷ to 10³, each increasing by a factor of 10 per step. For each pair of C and gamma, the average accuracy of 10-fold cross-validation was obtained. Based on the resulting 121 points (11 values of C by 11 values of gamma), the contour map of accuracy was plotted (Fig. 7). On the whole, the average accuracy varied greatly with different pairs of C and gamma. The white region in Fig. 7 represents an average accuracy below 0.6 and the filled region an average accuracy above 0.6. Average accuracies greater than 0.9 were mainly distributed in the right region, where C was between 10⁰ and 10⁵ and gamma was between 10⁻³ and 10¹. Among all the results, the optimal hyperparameters for the SVM classifier were C = 10³ and gamma = 10⁻². Table 2 shows the hyperparameters of each model, the search ranges and the optimal values obtained.

Table 2.
Model hyperparameters of each base model and corresponding optimal values

| Base model | Hyperparameter | Search range | Optimal hyperparameter |
|---|---|---|---|
| BPNN | num_hidden_nodes | 100–300 | 296 |
| SVM | C | 10⁻⁵–10⁵ | 1000 |
| SVM | gamma | 10⁻⁷–10³ | 0.01 |
| DT | max_depth | 5–100 | 9 |
| DT | min_samples_leaf | 5–100 | 7 |

Result analysis

On the basis of 1642 training samples and 705 validation samples, the optimal hyperparameters of the three base models were obtained, yielding the optimal SVM, BPNN and DT models. Based on these three base models, the SDEM was constructed as illustrated in Fig. 2. To evaluate its practical performance, 587 test samples from the study area were used for analysis. Figure 8 shows the prediction results of the proposed method and the three individual methods for the six lithologies. The main observations are as follows. For lithology L, there were 49 samples in the test set. BPNN misclassified one sample, whereas the other two individual methods and the SDEM reached 100% accuracy. For lithology DL, the SDEM had a clear advantage over the other methods, with a prediction accuracy of approximately 96.5%. For lithologies AL and CD, the SDEM and DT achieved higher accuracy than the other two methods, both exceeding 97%. For lithology D, the SDEM and SVM had the higher prediction accuracy, up to 97.1%, whereas DT and BPNN reached approximately 95.6%. For lithology AD, 176 of the 177 samples were predicted consistently with the true labels, and the SVM had more misclassified samples. Comparing the results across all lithologies, each single method had its own strengths for particular lithologies, but the SDEM was at least as accurate as every single method for every lithology. To further assess the prediction performance and robustness of the SDEM, 10-fold cross-validation was used to evaluate its performance.
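A two-level stacked ensemble of this kind can be sketched with scikit-learn as below. The paper does not state its implementation, so everything here is an assumption for illustration: the use of `StackingClassifier`, the synthetic six-feature, six-class data standing in for the non-public well logs, the feature scaling, the 5-fold internal split, and the mapping of "sigmoid activation" and "stochastic gradient descent" to `activation="logistic"` and `solver="sgd"`. Only the hyperparameter values come from Table 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for six wireline logs and six lithology classes
# (hypothetical data; the field data are not publicly available).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=5, n_redundant=0,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Level 1: base models with the optimal hyperparameters listed in Table 2.
# Scaling the inputs for BPNN and SVM is an assumption of this sketch.
base_models = [
    ("bpnn", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(296,), activation="logistic",
                      solver="sgd", max_iter=500, random_state=0),
    )),
    ("svm", make_pipeline(StandardScaler(),
                          SVC(kernel="rbf", C=1000, gamma=0.01))),
    ("dt", DecisionTreeClassifier(max_depth=9, min_samples_leaf=7,
                                  random_state=0)),
]

# Level 2: a decision tree meta-model trained on out-of-fold outputs of the
# base models (cv=5 keeps the sketch fast; the paper uses 10-fold CV).
sdem = StackingClassifier(
    estimators=base_models,
    final_estimator=DecisionTreeClassifier(random_state=0),
    cv=5,
)
sdem.fit(X_train, y_train)
y_pred = sdem.predict(X_test)
accuracy = (y_pred == y_test).mean()
print(f"hold-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here says nothing about the field data. The sketch only shows how the three base models feed a decision-tree meta-model; the BPNN may emit a convergence warning on such a small synthetic set.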
Figure 9 shows the cross-validation accuracy of lithology identification in each fold and the average cross-validation accuracy (grey box in Fig. 9). Overall, the SDEM had higher accuracy than the other methods in seven of the ten folds, and its average cross-validation accuracy was 95.1%. The average cross-validation accuracies of SVM, DT and BPNN were 93.7%, 92.7% and 91.0%, respectively. The variance across the 10 folds was also lowest for the SDEM, approximately 0.02, which shows that the SDEM is stable. In conclusion, the SDEM has higher prediction accuracy than any of the three single methods and can effectively improve lithology identification for carbonate reservoirs.

Case study

The implementation above showed that the SDEM performs well. To further demonstrate the generalization capability and effectiveness of the proposed model, two cases, covering two wells, were selected. From the confusion matrix, the evaluation metrics of accuracy, precision, recall and f1-score were obtained (Xie et al. 2018; Bressan et al. 2020).

(1) Case 1

Well 1 is located in the carbonate reservoir of the eastern Sulige Gas Field, Ordos Basin, in the same area as the data used in this study. The target formation is the Ma 5 Member, in which six lithologies (L, DL, AL, D, CD and AD) are developed. The selected sensitive wireline logs comprise AC, GR, CNL, PE, DEN and RLLD. Figure 10 presents a comprehensive histogram of the lithology predicted by the SDEM. Panels 3 to 6 show the lithology-sensitive wireline logs (GR, PE, AC, DEN, CNL and RLLD), arranged from left to right and top to bottom. Panel 7 is the LWD data and the last panel is the lithology identification result of the SDEM. Comparison shows that the SDEM identification result agrees closely with the LWD data.
Panel 7 also shows that Well 1 contains many thin layers, with thicknesses of 0.2–3.2 m. Based on the thickness statistics, the layers were divided into three groups: thicker than 1.2 m, 0.6–1.2 m, and thinner than 0.6 m. For layers thicker than 1.2 m, 752 of 767 samples were identified correctly. For layers between 0.6 m and 1.2 m, 359 of 376 samples were identified correctly. For layers thinner than 0.6 m, 306 of 330 samples were identified correctly. The accuracies of the three groups were 98.0%, 95.5% and 92.7%, respectively, so the identification accuracy decreases as the layer thickness decreases. Layers thinner than 0.4 m were also analyzed, and their accuracy was only 86.6%. The SDEM therefore cannot give a satisfactory identification result when the layer thickness is less than 0.4 m. Overall, the SDEM performed well in identifying thin layers, and the minimum thickness that can be identified lies between 0.4 m and 0.6 m.

Figure 11 shows the confusion matrix obtained by the SDEM. The vertical direction represents the predicted lithology type and the horizontal direction represents the true lithology type. According to the LWD data, the numbers of samples of lithologies L, DL, AL, D, CD and AD were 70, 219, 118, 69, 393 and 604, respectively. The numbers identified correctly were 67, 216, 110, 67, 367 and 590, and the overall accuracy was approximately 96.2%. Among the individual lithologies, L, DL, AL and D were identified well, whereas CD and AD were more likely to be misclassified. In addition to the confusion matrix and accuracy, f1-score, precision and recall were also used as evaluation metrics.
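The metrics named above can be checked directly from the sample counts reported for Well 1. The sketch below recomputes the per-lithology recall and the overall accuracy from those counts. Precision and f1-score need the full confusion matrix of Fig. 11, which is not reproduced in the text, so the helper `precision_recall_f1` is exercised only on a hypothetical 2-class matrix, and its row/column orientation is an assumption.

```python
# Sample counts reported in the text for Well 1.
true_counts = {"L": 70, "DL": 219, "AL": 118, "D": 69, "CD": 393, "AD": 604}
correct_counts = {"L": 67, "DL": 216, "AL": 110, "D": 67, "CD": 367, "AD": 590}

# Recall per lithology: correct predictions divided by the true class size.
recall = {name: correct_counts[name] / true_counts[name] for name in true_counts}

# Overall accuracy: all correct predictions divided by all samples.
overall_accuracy = sum(correct_counts.values()) / sum(true_counts.values())


def precision_recall_f1(confusion, k):
    """Metrics for class k, assuming confusion[i][j] counts samples of
    true class i that were predicted as class j."""
    true_positive = confusion[k][k]
    predicted_as_k = sum(row[k] for row in confusion)
    truly_k = sum(confusion[k])
    precision = true_positive / predicted_as_k
    recall_k = true_positive / truly_k
    f1 = 2 * precision * recall_k / (precision + recall_k)
    return precision, recall_k, f1


# Hypothetical 2-class matrix, used only to exercise the helper.
toy_confusion = [[8, 2], [1, 9]]
p, r, f1 = precision_recall_f1(toy_confusion, 0)

print(f"overall accuracy: {overall_accuracy:.3f}")
for name, value in recall.items():
    print(f"recall {name}: {value:.3f}")
```

The overall accuracy evaluates to 1417/1473, which rounds to the reported 96.2%.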
In terms of recall, the values for the six lithologies were basically above 90%, indicating that a high proportion of the samples of each true lithology type was identified correctly. In terms of precision, the values were also above 90% for all lithologies except DL, indicating that a high proportion of the samples predicted as each lithology type was correct. The f1-score, which combines recall and precision as their harmonic mean, was also high, indicating good performance of the SDEM.

(2) Case 2

Well 2 is located in the carbonate reservoir of the western Sulige Gas Field, Ordos Basin. The target formation is also the Ma 5 Member, with the same six lithologies and six wireline logs described above. Well 2 has a total of 840 samples with lithology labels. The numbers of samples of lithologies L, DL, AL, D, CD and AD are 109, 59, 193, 3, 99 and 377, respectively, which shows a clear imbalance among the lithologies. Figure 12 shows the comprehensive histogram with the lithology predicted by the SDEM, in which panel 7 is the LWD data and the last panel is the SDEM identification result. Qualitatively, these two panels are basically consistent. Compared with the other lithologies, D and DL are relatively undeveloped. Even for these poorly developed lithologies, the SDEM can effectively identify the corresponding lithology, for example lithology D at a depth of 3451.5–3505 and lithology DL at a depth of 3532.875–3535.375 m. Thus, the SDEM mitigates the “underestimation of few samples” problem. Thin layers of less than 0.5 m, such as lithology DL at 3520.125–3520.375 m and lithology CD at 3510.25–3510.625 m, can also be identified, indicating the advantage of the SDEM for thin layers in Well 2.
Quantitatively, the numbers of correctly predicted samples for L, DL, AL, D, CD and AD were 106, 58, 190, 3, 94 and 353, respectively. The identification accuracy of each lithology was high (basically above 95%), except for lithology AD, which had the most misclassified samples. The advantage of the SDEM is especially evident for lithology D, which had only three samples, all of which were identified correctly.

Figure 13 compares the results of the two cases with the single methods and the traditional ELMs, where green represents Well 1 and cyan represents Well 2. Figure 13a compares the SDEM with its individual methods for the two wells. As shown in Fig. 13a, the accuracies of the four methods for Well 1 are all above 93%, and the SDEM achieves the highest accuracy, approximately 96.2%. For Well 2, the SDEM also performs well, with an accuracy of 95.4%, which is 1.4% higher than the other three methods. Figure 13b compares the SDEM with traditional ELMs for the two wells. Random forest (RF), a representative bagging method, and gradient boosting decision tree (GBDT), a representative boosting method, were selected as the traditional ELMs. The SDEM achieved the highest accuracy for Well 1, followed by RF, with GBDT the lowest. For Well 2, the SDEM again achieved the highest accuracy, followed by GBDT, with RF the lowest. Comparing the two cases, an individual method or a traditional ELM may perform comparably to the SDEM in one case but worse in the other. That is, the individual methods and traditional ELMs perform differently in different cases, whereas the SDEM achieves high performance in both cases and can be extended to other oil or gas fields with characteristics similar to these two cases.
Discussion

In the present study, BPNN, SVM and DT served as the base models of the SDEM, although other machine learning methods could also be used. These three methods were chosen mainly for their respective advantages, including the ability to learn from the training sets, the prediction performance on unbalanced data, the generalization ability, and the ability to handle multiple variables. In the near future, other classification methods, such as KNN, NB, FL and the perceptron neural network (PNN), can also be explored for lithology identification. The base models used in this paper are relatively simple methods. As the comparison results show, traditional ELMs such as RF and GBDT also perform well. These ELMs could therefore be used as base models for an in-depth analysis of SDEM performance.

For level 2 of the SDEM, the logistic regression (LR) method is usually recommended as the meta-model. We tested the accuracy with the LR method as the meta-model, but its accuracy was lower (Sun & Li 2020). By contrast, the DT model achieved higher accuracy than the LR model. We will continue to investigate other classifiers as the meta-model in the near future and compare their performance.

Conclusions

This study focuses on lithology identification for complex carbonate reservoirs based on the SDEM combined with wireline logs. The theory of the lithology identification methods was introduced and a systematic workflow was established. Based on this comprehensive study, several important conclusions are presented.

(1) Lithology identification is a classification problem. The base models selected for the SDEM are classifiers. Three models, BPNN, DT and SVM, were used as the base models in this paper.
The ensemble strategy is important for the SDEM, and the DT performed better as the meta-model than the commonly used LR model.

(2) Based on the LWD data and core observations, six lithologies (L, DL, AL, D, CD and AD) are developed, and six lithology-sensitive wireline logs (AC, GR, CNL, PE, DEN and RLLD) were chosen. Boxplots were also used to analyze the wireline log characteristics of the different lithologies. Multiple wireline logs are taken into account in constructing the identification model, and the nonlinear relationships between the wireline logs and lithology are built.

(3) The hyperparameters of BPNN, SVM and DT are determined through grid search combined with 10-fold cross-validation, followed by the establishment of the SDEM. The results showed that the proposed method achieved an average cross-validation accuracy of approximately 95.1%, higher than those of its individual methods.

(4) Based on the same training sets, the established model is applied to two carbonate reservoirs with similar characteristics. All the evaluation metrics (accuracy, precision, recall and f1-score) are not lower than those of the individual methods or the traditional ELMs. The proposed method also improves the identification of lithologies with few samples (the “underestimation of few samples” problem) and of thin layers.

In summary, it is reasonable to conclude that the SDEM exhibits superior predictive performance to all its individual methods and the traditional ELMs. Accordingly, the proposed method is expected to be applicable to lithology identification in other analogous gas or oil fields, and can be further extended to other exploration tasks for carbonate reservoirs, such as fluid typing and petrophysical analysis.

Declarations

Acknowledgments

The authors wish to express sincere gratitude to all individuals and institutions that contributed to the completion of this work.
Author contributions

Yan Zhang: Writing of the original draft; Writing, review and editing; Final approval of the version to be submitted. Yang Chen: Investigation; The conception and design of the study; Project administration. Weiwei Xie: Interpretation of data; Analysis of data. Kai Xing: Acquisition of data. Shaobo Cheng: Analysis of data.

Funding

This work was supported and funded by the China Geological Survey China Mining News.

Data Availability

The data used in this research come from oilfield engineering projects. Given the sensitivity of the data involved in the project and the strict constraints of scientific research confidentiality regulations and unit intellectual property protection agreements, the original data and processed datasets cannot be publicly shared, in order to prevent potential risks arising from data leakage.

Conflict of interest

The authors declare no conflict of interest.

References

Abbey CP, Okpogo EU, Atueyi IO (2018) Application of rock physics parameters for lithology and fluid prediction of ‘TN’ field of Niger Delta basin, Nigeria. Egyptian Journal of Petroleum 27: 853-866. https://doi.org/10.1016/j.ejpe.2018.01.001.
Agarwal S, Chowdary CR (2020) A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection. Expert Systems with Applications 146: 113160. https://doi.org/10.1016/j.eswa.2019.113160.
Bai XL, Zhang SN, Huang QY, et al. (2016) Origin of dolomite in the middle Ordovician peritidal platform carbonates in the northern Ordos basin, western China. Petroleum Science 13: 434-449. https://doi.org/10.1007/s12182-016-0114-5.
Bisoyi N, Gupta H, Padhy NP, et al. (2019) Prediction of daily sediment discharge using a back propagation neural network training algorithm: A case study of the Narmada River, India. International Journal of Sediment Research 34: 125-135. https://doi.org/10.1016/j.ijsrc.2018.10.010.
Bressan TS, Souza MKD, Girelli TJ, et al.
(2020) Evaluation of machine learning methods for lithology classification using geophysical data. Computers & Geosciences 139: 104475. https://doi.org/10.1016/j.cageo.2020.104475.
Corina AN, Hovda S (2018) Automatic lithology prediction from well logging using kernel density estimation. Journal of Petroleum Science and Engineering 170: 664-674. https://doi.org/10.1016/j.petrol.2018.06.012.
Dev VA, Eden MR (2018) Evaluating the Boosting Approach to Machine Learning for Formation Lithology Classification. Computer Aided Chemical Engineering 44: 1465-1470. https://doi.org/10.1016/B978-0-444-64241-7.50239-1.
Dev VA, Eden MR (2019) Formation lithology classification using scalable gradient boosted decision trees. Computers & Chemical Engineering 128: 392-404. https://doi.org/10.1016/j.compchemeng.2019.06.001.
Fang JY, Yan Z, Lu XY, et al. (2024) An oil production prediction approach based on variational mode decomposition and ensemble learning model. Computers & Geosciences 193: 105734. https://doi.org/10.1016/j.cageo.2024.105734.
Gu YF, Bao ZD, Lin YB, et al. (2017) The porosity and permeability prediction methods for carbonate reservoirs with extremely limited logging data: Stepwise regression vs. N-way analysis of variance. Journal of Natural Gas Science and Engineering 42: 99-119. https://doi.org/10.1016/j.jngse.2017.03.010.
Hughes HE, Thomas AT (2011) Trilobite associations, taphonomy, lithofacies and environments of the Silurian reefs of North Greenland. Palaeogeography, Palaeoclimatology, Palaeoecology 302: 142-155. https://doi.org/10.1016/j.palaeo.2010.12.009.
Horrocks T, Holden EJ, Wedge D, et al. (2015) Evaluation of automated lithology classification architectures using highly-sampled wireline logs for coal exploration. Computers & Geosciences 83: 209-218. https://doi.org/10.1016/j.cageo.2015.07.013.
Insua TL, Hamel L, Moran K, et al. (2015) Advanced classification of carbonate sediments based on physical properties. Sedimentology 62: 590-606.
https://doi.org/10.1111/sed.12168.
Jia AL, Meng DW, Wang GT, et al. (2025) Development technologies and models of different types of gas reservoirs in Ordos Basin, NW China. Petroleum Exploration and Development 52(3): 779-794. https://doi.org/10.1016/S1876-3804(25)60602-1.
Konaté AA, Ma HL, Pan HP, et al. (2017) Lithology and mineralogy recognition from geochemical logging tool data using multivariate statistical analysis. Applied Radiation and Isotopes 128: 55-67. https://doi.org/10.1016/j.apradiso.2017.06.041.
Li AH, Feng MY, Li YR, et al. (2016) Application of Outlier Mining in Insider Identification Based on Boxplot Method. Procedia Computer Science 91: 245-251. https://doi.org/10.1016/j.procs.2016.07.069.
Li XY, Li HQ (2013) A new method of identification of complex lithologies and reservoirs: task-driven data mining. Journal of Petroleum Science and Engineering 109: 241-249. https://doi.org/10.1016/j.petrol.2013.08.049.
Liu J, Min XL, Qi ZL, et al. (2024) Lithology identification using electrical imaging logging image: A case study in Jiyang Depression, China. Journal of Applied Geophysics 230: 105536. https://doi.org/10.1016/j.jappgeo.2024.105536.
Lopes DMR, Andrade AJN (2019) Lithology identification on well logs by fuzzy inference. Journal of Petroleum Science and Engineering 180: 357-368. https://doi.org/10.1016/j.petrol.2019.05.044.
Ren XX, Hou JG, Song SH, et al. (2019) Lithology identification using well logs: A method by integrating artificial neural networks and sedimentary patterns. Journal of Petroleum Science and Engineering 182: 106336. https://doi.org/10.1016/j.petrol.2019.106336.
Rumelhart D, Hinton G, Williams R (1986) Learning representations by back-propagating errors. Nature 323: 533-536. https://doi.org/10.1038/323533a0.
Shuvo MdAI, Joy SMH (2024) A data driven approach to assess the petrophysical parametric sensitivity for lithology identification based on ensemble learning. Journal of Applied Geophysics 222: 105330.
https://doi.org/10.1016/j.jappgeo.2024.105330.
Sun J, Li Q, Chen MQ, et al. (2019) Optimization of models for a rapid identification of lithology while drilling - A win-win strategy based on machine learning. Journal of Petroleum Science and Engineering 176: 321-341. https://doi.org/10.1016/j.petrol.2019.01.006.
Sun W, Li ZQ (2020) Hourly PM2.5 concentration forecasting based on feature extraction and stacking-driven ensemble model for the winter of the Beijing-Tianjin-Hebei area. Atmospheric Pollution Research 11(6): 110-121. https://doi.org/10.1016/j.apr.2020.02.022.
Tan MJ, Bai Y, Zhang HT, et al. (2020) Fluid typing in tight sandstone from wireline logs using classification committee machine. Fuel 271: 117601. https://doi.org/10.1016/j.fuel.2020.117601.
Tewari S, Dwivedi UD (2019) Ensemble-based big data analytics of lithofacies for automatic development of petroleum reservoirs. Computers & Industrial Engineering 128: 937-947. https://doi.org/10.1016/j.cie.2018.08.018.
Vapnik V (1995) The Nature of Statistical Learning Theory. New York: Springer.
Wolpert DH (1992) Stacked generalization. Neural Networks 5(2): 241-259. https://doi.org/10.1016/S0893-6080(05)80023-1.
Wang GC, Carr TR (2012) Methodology of organic-rich shale lithofacies identification and prediction: A case study from Marcellus Shale in the Appalachian basin. Computers & Geosciences 49: 151-163. https://doi.org/10.1016/j.cageo.2012.07.011.
Wang GC, Carr TR, Ju YW, et al. (2014) Identifying organic-rich Marcellus Shale lithofacies by support vector machine classifier in the Appalachian basin. Computers & Geosciences 64: 52-60. https://doi.org/10.1016/j.cageo.2013.12.002.
Wang XD, Yang SC, Zhao YF, et al. (2018) Lithology identification using an optimized KNN clustering method based on entropy-weighed cosine distance in Mesozoic strata of Gaoqing field, Jiyang depression. Journal of Petroleum Science and Engineering 166: 157-174. https://doi.org/10.1016/j.petrol.2018.03.034.
Wang H, Bi DM, He ZS, et al.
(2025) Machine learning-based stacked ensemble model for predicting and regulating oxygen-containing compounds in nitrogen-rich pyrolysis bio-oil. Renewable Energy 241: 122330. https://doi.org/10.1016/j.renene.2024.122330.
Wang YJ, Wang XX, Wang KY, et al. (2025) Lithology recognition and porosity prediction from well logs based on Convolutional Neural Networks and sliding window. Journal of Applied Geophysics 242: 105905. https://doi.org/10.1016/j.jappgeo.2025.105905.
Xiang M, Qin PB, Zhang FW (2020) Research and application of logging lithology identification for igneous reservoirs based on deep learning. Journal of Applied Geophysics 173: 103929. https://doi.org/10.1016/j.jappgeo.2019.103929.
Xie YX, Zhu CY, Zhou W, et al. (2018) Evaluation of machine learning methods for formation lithology identification: A comparison of tuning processes and model performances. Journal of Petroleum Science and Engineering 160: 182-193. https://doi.org/10.1016/j.petrol.2017.10.028.
Zhang HR, Hu YT, Li XS, et al. (2024) Application of support vector machines and genetic algorithms to fluid identification in Offshore Granitic subduction hill reservoirs. Geoenergy Science and Engineering 240: 213013. https://doi.org/10.1016/j.geoen.2024.213013.
Zhai BX, Chen JG (2018) Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China. Science of The Total Environment 635: 644-658. https://doi.org/10.1016/j.scitotenv.2018.04.040.
Zhao XZ, Pu XG, Han WZ, et al. (2017) A new method for lithology identification of fine grained deposits and reservoir sweet spot analysis: A case study of Kong 2 Member in Cangdong sag, Bohai Bay Basin, China. Petroleum Exploration and Development 44: 524-534. https://doi.org/10.1016/S1876-3804(17)30061-7.
Zhang Y, Zhong HR, Wu ZY, et al. (2020) Improvement of petrophysical workflow for shear wave velocity prediction based on machine learning methods for complex carbonate reservoirs.
Journal of Petroleum Science and Engineering 192: 107234. https://doi.org/10.1016/j.petrol.2020.107234.
Zhang Y, Zhang CL, Ma QY, et al. (2022) Automatic prediction of shear wave velocity using convolutional neural networks for different reservoirs in Ordos Basin. Journal of Petroleum Science and Engineering 208: 109252. https://doi.org/10.1016/j.petrol.2021.109252.

Additional Declarations

No competing interests reported.

Status: Under Review

Reviews received at journal: 06 May, 2026
Reviews received at journal: 20 Apr, 2026
Reviews received at journal: 17 Apr, 2026
Reviewers agreed at journal: 02 Apr, 2026
Reviewers agreed at journal: 30 Mar, 2026
Reviewers agreed at journal: 30 Mar, 2026
Reviewers invited by journal: 30 Mar, 2026
Editor assigned by journal: 28 Feb, 2026
Submission checks completed at journal: 21 Feb, 2026
First submitted to journal: 21 Feb, 2026
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8931314","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":614816016,"identity":"06b43b4f-823d-42ca-aae8-d8765d93f2f6","order_by":0,"name":"Yan Zhang","email":"","orcid":"","institution":"International Mining Research Center, China Geological Survey","correspondingAuthor":false,"prefix":"","firstName":"Yan","middleName":"","lastName":"Zhang","suffix":""},{"id":614816017,"identity":"6eb8a4a2-cb53-4121-938f-b7185cd886fb","order_by":1,"name":"Yang Chen","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA8UlEQVRIiWNgGAWjYDACCWQOY4MNDz97AylaDjakyUj2HCBNy2EbgxsO+HXwz24+9pin5o5dA/sZ080fd5znYbjBwPjhYw4eS+4cSzfmOfYsuYEnx+zGwTO3eRhnNzBLztyGW4uBRI6ZNA/b4WQGCR6glrbbPMwyB9iYefFqyf8mzfMPruUcD5tEAiEtOWzSvG2H7aBaDvDwENIicSPNTHJu3+EEBp60shtnzyTzSPAcbMbrF/4Zyc8k3nw7bM/Afnjbjcoddvb2x5sPfviIRwsIMPEwMCTuPwDnMzbgVw9S8oOBwZ6gqlEwCkbBKBi5AABjg1MF2wgbUgAAAABJRU5ErkJggg==","orcid":"","institution":"International Mining Research Center, China Geological Survey","correspondingAuthor":true,"prefix":"","firstName":"Yang","middleName":"","lastName":"Chen","suffix":""},{"id":614816019,"identity":"c07e1b49-ba0a-40c7-a5f1-9c31d33fb4ec","order_by":2,"name":"Weiwei Xie","email":"","orcid":"","institution":"SINOPEC Petroleum Exploration \u0026 Development Institute","correspondingAuthor":false,"prefix":"","firstName":"Weiwei","middleName":"","lastName":"Xie","suffix":""},{"id":614816021,"identity":"7a5630d3-d423-471e-bb89-2c5a81d80e4f","order_by":3,"name":"Kai 
Xing","email":"","orcid":"","institution":"International Mining Research Center, China Geological Survey","correspondingAuthor":false,"prefix":"","firstName":"Kai","middleName":"","lastName":"Xing","suffix":""},{"id":614816022,"identity":"bece8c12-b7b6-4bec-9f1f-07c5c2e2ba4d","order_by":4,"name":"Shaobo Cheng","email":"","orcid":"","institution":"International Mining Research Center, China Geological Survey","correspondingAuthor":false,"prefix":"","firstName":"Shaobo","middleName":"","lastName":"Cheng","suffix":""}],"badges":[],"createdAt":"2026-02-21 07:08:19","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8931314/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8931314/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105984732,"identity":"b6064132-88c3-45e0-b84e-69d485c4a415","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":160844,"visible":true,"origin":"","legend":"\u003cp\u003eThe structure of ELMs\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/6e548f8f78b91ca485da58cc.jpeg"},{"id":106094018,"identity":"b5e549ca-e171-406d-80dc-621f6f62905b","added_by":"auto","created_at":"2026-04-03 11:40:43","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":417294,"visible":true,"origin":"","legend":"\u003cp\u003eThe specific process of ensemble strategy for SDEM\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/8c87cf6e7fc59f9ebb4a23ba.jpeg"},{"id":106093723,"identity":"a5fb05f7-ef33-4136-86f9-0061584893b5","added_by":"auto","created_at":"2026-04-03 11:38:48","extension":"jpeg","order_by":3,"title":"Figure 
3","display":"","copyAsset":false,"role":"figure","size":417891,"visible":true,"origin":"","legend":"\u003cp\u003eWorkflow for lithology identification\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/fe9b0d07d8d6a5f015e1f40b.jpeg"},{"id":106093871,"identity":"e9b8c70b-2f9c-4a00-8325-be2a02ecc3d0","added_by":"auto","created_at":"2026-04-03 11:39:44","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":168336,"visible":true,"origin":"","legend":"\u003cp\u003eThe Boxplot of wireline logs for different lithologies. The arrows in Fig.4a indicate the outlier, maximum, 75th percentile, median, 25th percentile and minimum from the top to the bottom, respectively. a AC. b GR. c CNL. d \u0026nbsp;DEN. e PE. f RLLD.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/2c0090082250f06a71aa0273.jpeg"},{"id":105984734,"identity":"7a680345-eb54-4a03-8a1e-faaf21ca7eeb","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":109177,"visible":true,"origin":"","legend":"\u003cp\u003eThe plot of average accuracy of 10-fold cross-validation and variance for DT with different hyperparameters: a max_depth. 
b min_sample_leaf.\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/6adbf9beec32aedc5e7843aa.jpeg"},{"id":106094161,"identity":"71be068f-d4c4-4505-8c5c-6aee65749225","added_by":"auto","created_at":"2026-04-03 11:41:19","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":95952,"visible":true,"origin":"","legend":"\u003cp\u003eThe plot of average accuracy of 10-fold cross-validation and variance for BPNN with different hyperparameter \u003cem\u003enum_hidden_nodes\u003c/em\u003e.\u003c/p\u003e","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/c76972a0a7cfdcc1e2506c18.jpeg"},{"id":106093314,"identity":"52b3b4ba-230e-45c6-ab7f-4648d68c72f8","added_by":"auto","created_at":"2026-04-03 11:36:41","extension":"jpeg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":63534,"visible":true,"origin":"","legend":"\u003cp\u003eThe contour map of average accuracy of 10-fold cross-validation for SVM with different \u003cem\u003eC\u003c/em\u003e and gamma.\u003c/p\u003e","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/c04e9920cfa7dc0d2830b638.jpeg"},{"id":105984736,"identity":"86a0ca66-4c16-40b1-aa2b-fd62f79c47bf","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":422914,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of prediction results of SDEM classifier and its individual classifier on different lithology: a L. b DL. c AL. d D. e CD. 
f AD.\u003c/p\u003e","description":"","filename":"floatimage8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/c9ad6d9e23ae16e67a777c64.jpeg"},{"id":105984741,"identity":"a396d4af-13c6-43ca-b494-8e73f09f3674","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":224779,"visible":true,"origin":"","legend":"\u003cp\u003eThe comparison of the 10-fold cross-validation accuracy and average accuracy between SDEM and three individual methods. The grey box indicates the average accuracy of 10-fold cross-validation.\u003c/p\u003e","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/21e052f33eecfad2582b0e84.jpeg"},{"id":105984742,"identity":"d86e6164-da4c-4702-94ca-1f2fe19d3013","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":1073569,"visible":true,"origin":"","legend":"\u003cp\u003eLithology identification result based on SDEM in Well1\u003c/p\u003e","description":"","filename":"floatimage10.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/9bca23c9aad739ade6b2cca8.jpeg"},{"id":106093689,"identity":"43234321-1386-4b1e-a102-bbf8ab3f1c6e","added_by":"auto","created_at":"2026-04-03 11:38:37","extension":"jpeg","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":297043,"visible":true,"origin":"","legend":"\u003cp\u003eThe confusion matrix obtained with SDEM in Well1\u003c/p\u003e","description":"","filename":"floatimage11.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/7c0d3b9ca92caf3741e18aa1.jpeg"},{"id":105984738,"identity":"92b34e88-d46e-4db7-9f2f-ea6b9c655cff","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"png","order_by":12,"title":"Figure 
12","display":"","copyAsset":false,"role":"figure","size":168569,"visible":true,"origin":"","legend":"\u003cp\u003eLithology identification result based on SDEM in Well2. The legend for lithology is shown in Fig.11.\u003c/p\u003e","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/d1e480e7011dbae103bcbcb9.png"},{"id":105984739,"identity":"8b65ab11-44c7-4390-89a4-ade4967088e4","added_by":"auto","created_at":"2026-04-02 07:16:59","extension":"jpeg","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":100471,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of results for the individual methods, traditional ELM and SDEM for two cases. The y axis represents the accuracy relative to 0.9. The text in the histogram represents the accuracy of each method. a Individual methods and SDEM. b Traditional ELM and SDEM.\u003c/p\u003e","description":"","filename":"floatimage13.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/a3975650cb45cbe00f824af5.jpeg"},{"id":106096298,"identity":"cc0a4ba3-e70d-4f60-92df-825828ed07e8","added_by":"auto","created_at":"2026-04-03 11:54:01","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4286354,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8931314/v1/a2558515-5aea-4fac-84ad-8d0ee8c93931.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Lithology identification with wireline logs based on stacked- driven ensemble method for complex carbonate reservoirs","fulltext":[{"header":"Introduction","content":"\u003cp\u003eLithology identification is the basis of reservoir description, formation evaluation and reservoir evaluation, especially for carbonate reservoirs, which have the characteristics of strong heterogeneity and complex lithology, making 
the lithology identification more difficult (Bai et al. 2016; Zhao et al. 2017; Zhang et al. 2020). Therefore, a good understanding of the spatial distribution of lithology is of great significance to the evaluation of carbonate reservoirs (Jia et al. 2025). At present, three methods, namely well logging, core observation and wireline log analysis, are commonly used to obtain lithology information (Hughes \u0026amp; Thomas 2011; Liu et al. 2024; Wang et al. 2025). Due to the subjectivity of the well logging method and the high cost of well cores, the last method has become an important means of lithology identification.\u003c/p\u003e \u003cp\u003ePresently, the methods for lithology identification based on wireline logs are relatively mature, mainly including qualitative analysis, crossplot analysis and mathematical methods (Li \u0026amp; Li 2013; Horrocks et al. 2015; Insua et al. 2015; Konate et al. 2017; Abbey et al. 2018; Ren et al. 2019). Qualitative analysis is commonly based on the morphological characteristics of the wireline logs and is rarely used (Horrocks et al. 2015; Insua et al. 2015; Ren et al. 2019). The crossplots, mainly including the crossplots of GR with the three porosity curves (AC, DEN, CNL) and the crossplot of PE and DEN, are used for their simplicity. However, only two logs can be used at a time, so the method cannot analyze the relationships between lithology and a larger number of logs, especially for complex carbonate reservoirs (Abbey et al. 2018; Ren et al. 2019). Considering the complexity and diversity of carbonate reservoirs, there is a large overlap between different lithologies on any two logs, which reduces the accuracy of lithology identification with this method (Corina \u0026amp; Hovda 2018; Ren et al. 2019). 
In recent years, with the development of machine learning (ML) methods, the third approach has become a hot spot for the lithology identification of carbonate reservoirs (Li \u0026amp; Li 2013; Konate et al. 2017; Xie et al. 2018; Ren et al. 2019; Bressan et al. 2020). Numerous studies of lithology identification with ML methods, such as K-nearest Neighbors (KNN) (Wang et al. 2018), Na\u0026iuml;ve Bayes (NB) (Wang \u0026amp; Carr 2012), Support Vector Machine (SVM) (Li \u0026amp; Li 2013; Wang et al. 2014; Xie et al. 2018), Artificial Neural Network (ANN) (Bressan et al. 2020; Wang et al. 2025), Fuzzy Logic System (FLS) (Lopes \u0026amp; Andrade 2019), Decision Tree (DT) (Xie et al. 2018; Bressan et al. 2020) and Deep Learning (DL) (Xiang et al. 2020), have been carried out. Although each method has shown high performance, certain shortcomings inevitably exist. For example, DL cannot estimate the distribution characteristics of data without bias, and ANN is prone to converging to local optima and faces an inherent contradiction between generalization performance and learning capacity. These drawbacks collectively result in inferior generalization and weak robustness of the prediction model, and additionally prevent it from effectively processing misclassified samples. Therefore, the lithology identification of carbonate reservoirs based on a single machine learning method has certain limitations (Tan et al. 2020; Shuvo \u0026amp; Joy 2024).\u003c/p\u003e \u003cp\u003eTo overcome the drawbacks of a single method in lithology identification, many scholars have studied another identification mechanism, the ensemble learning method (ELM), which combines multiple learners (also called weak learners or base models), each with its own bias and variance, into a better learner (also called a strong learner or meta-model) (Xie et al. 2018; Tewari \u0026amp; Dwivedi 2019; Bressan et al. 2020). 
Sun et al. (2019) analyzed three popular machine learning algorithms (OVO SVMs, OVR SVMs and RF) for lithology identification and found that the RF classifier outperformed the other two, with an accuracy greater than 90%. Dev et al. (2019) applied recently developed gradient boosted decision tree (GBDT) systems and achieved higher performance. The difference between various ELMs lies mainly in the ensemble strategy used to create the strong learner. Commonly used ensemble strategies include voting, averaging, boosting and bagging. All these ensemble strategies basically combine weak learners based on deterministic algorithms and cannot significantly reduce the bias and variance of the prediction model (Dev \u0026amp; Eden 2018; Dev \u0026amp; Eden 2019; Sun et al. 2019; Tewari \u0026amp; Dwivedi 2019; Tan et al. 2020). Another commonly used ELM is based on the stacked generalization method (here called the stacked-driven ensemble method, SDEM). SDEM is a two-level model, in which the first level inherits and integrates the merits of the base models and the second level uses the meta-model to reduce the bias and variance of the final model; thus, it improves the performance of the model while limiting the risk of overfitting (Fang et al. 2024; Wang et al. 2025). At present, this method has gained much ground in classification and regression problems in many fields (Zhai \u0026amp; Chen 2018; Agarwal \u0026amp; Chowdary 2020; Sun \u0026amp; Li 2020). However, it remains an innovative technology for oil and gas exploration, particularly for lithology identification.\u003c/p\u003e \u003cp\u003eIn this study, we focus on complex lithology identification based on a two-level SDEM with wireline logs, which provides a feasible method for complex carbonate reservoirs. To accomplish this technical objective, the theory and methodology of SDEM and associated methods were investigated initially. 
For the ensemble model, the base models consisted of BPNN, SVM and DT, and the meta-model was specified as DT. The grid search method combined with 10-fold cross-validation was adopted to search for the optimal hyperparameters of SDEM. In order to analyze the generalization of the method, two cases, Well1 and Well2, within the Ma 5 Member of the Majiagou Formation, located in the eastern and western regions of the Sulige Gas Field, were tested. The predictions were consistent with the lithologies determined from logging while drilling (LWD) data.\u003c/p\u003e"},{"header":"Theory and Methodology","content":"\u003cp\u003eIn this section, the modeling structure of conventional ELM is presented. Moreover, each model used in this paper and the ensemble strategies are also introduced.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eModeling structure\u003c/h2\u003e \u003cp\u003eELM represents a machine learning framework that addresses the target problem via the joint training of multiple expert models, and combines the opinions of multiple experts to obtain a better decision result. ELMs help to reduce the chance of error while increasing the overall reliability and confidence of the model (Dev \u0026amp; Eden 2018; Fang et al. 2024; Wang et al. 2025). Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the structure of ELM, which consists of two parts. The first is the training process of multiple weak learner models, each of which produces a decision result based on the training sets. The second part is the ensemble strategy. Considering the high bias and variance of the base models, the ensemble strategy uses the individual results produced by the base learners as the input of the ELM. Thus, misclassified samples are corrected and a comprehensive result is obtained, which provides the best overall solution (Zhai \u0026amp; Chen 2018; Sun \u0026amp; Li 2020). 
This way of solving problems often occurs in daily life: an article is reviewed by multiple experts before it is accepted or rejected, a company calls on relevant experts to evaluate projects before investing in them, and the oil or gas potential of a well is evaluated by experts using various data before drilling. In all these situations, a committee of experts reduces the risk of making a wrong decision. Therefore, in terms of both practical application and theory, ensemble learning methods tend to achieve better, less risky and less biased results than individual methods (Agarwal \u0026amp; Chowdary 2020; Tan et al. 2020).\u003c/p\u003e \u003cp\u003eELMs can serve as an effective approach for both classification and regression. Depending on the problem solved, the base models are also different. Classification problems output categories (Dev \u0026amp; Eden 2018; Dev \u0026amp; Eden 2019), and regression problems output specific values (Zhai \u0026amp; Chen 2018; Agarwal \u0026amp; Chowdary 2020; Sun \u0026amp; Li 2020). The lithology identification problem based on wireline logs addressed in this paper is a classification problem.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eBase models\u003c/h3\u003e\n\u003cp\u003eLithology identification is a classification problem, so methods suited to classification were used. Considering the complex nonlinear relationship between wireline logs and lithology, three methods, DT, BPNN and SVM, were chosen as the base models, while the DT method was adopted as the meta-model. The three models have their own advantages: BPNN has strong learning ability and can process noisy data (Bressan et al. 2020), DT fully considers the mutual relationships between different features (Xie et al. 2018; Bressan et al. 
2020), and SVM has excellent generalization ability (Bressan et al. 2020). The three methods are described as follows.\u003c/p\u003e \u003cp\u003eThe DT is a supervised classification method with a tree structure. The root node of the tree is the split that best divides the training data. From the root node, each internal (non-leaf) node represents the splitting boundary of a selected feature, each branch represents the outcome of the test at the internal node above it, and the leaf nodes represent the final classification results (Xie et al. 2018; Bressan et al. 2020). The DT mainly includes two processes, namely the construction of the tree and classification with the tree. The former builds the decision tree from the training sets. The latter uses the tree to classify new samples (Xie et al. 2018; Bressan et al. 2020; Tan et al. 2020).\u003c/p\u003e \u003cp\u003eThe BPNN is a multi-layer feed-forward neural network trained with the gradient descent method, which was proposed by Rumelhart and McClelland in 1986 (Rumelhart et al. 1986). It consists of two stages, namely forward signal propagation and error back-propagation. In the forward propagation stage, signals are transmitted from the input layer through the hidden layer and finally to the output layer.\u003c/p\u003e \u003cp\u003eThe second stage corresponds to error back-propagation, where the gradient descent algorithm is adopted to update the weights and biases between the output and hidden layers, as well as those between the hidden and input layers. The ultimate objective is to minimize the sum of squared errors. The training process terminates when the error converges to a minimum value or the predefined number of iterations is reached (Bisoyi et al. 2019).\u003c/p\u003e \u003cp\u003eThe SVM is a machine learning method developed on the basis of statistical learning theory, which was proposed by Vapnik (1995). 
The method aims to minimize the structural risk, thereby enhancing the generalization capability of the learning model and enabling satisfactory statistical inference even with limited training samples. At present, this method has been widely used in petroleum exploration fields such as reservoir lithology identification, fluid identification, and reservoir prediction (Wang et al. 2014; Tan et al. 2020; Zhang et al. 2024). The fundamental principle of SVM lies in establishing an optimal separating hyperplane to distinguish different types of samples. The hyperplane is optimized by maximizing the distance to the closest samples of the two categories. In practice, most problems are not linearly separable, so a separating hyperplane is difficult to find in the original feature space. Therefore, a kernel function is used to convert the nonlinear problem into a linear one in a higher-dimensional feature space (Zhang et al. 2024). Simultaneously, considering the noisy samples in the training sets, a penalty factor is added to the SVM model, which penalizes misclassified sample points, reduces the sensitivity of the model to noise and thereby improves its generalization performance (Vapnik 1995; Zhang et al. 2024).\u003c/p\u003e\n\u003ch3\u003eEnsemble strategy\u003c/h3\u003e\n\u003cp\u003eThe ensemble strategy is important for ELMs. Different from other ensemble strategies (Dev \u0026amp; Eden 2018; Dev \u0026amp; Eden 2019; Sun et al. 2019; Tewari \u0026amp; Dwivedi 2019; Tan et al. 2020), the SDEM is based on stacked generalization, which was first put forward by Wolpert (1992). By integrating different base learners, the SDEM achieves a more accurate result, because different base learners provide information from different perspectives and compensate for the errors of individual models (Wolpert 1992). This method is a hierarchical ensemble model, mainly including base models and a meta-model. 
To enhance the generalization performance of the proposed method, K-fold cross-validation is employed during the training of the base learners. The outputs of the base models are used as the features for the meta-model. The final output is then obtained with the meta-model (Zhai \u0026amp; Chen 2018; Agarwal \u0026amp; Chowdary 2020; Sun \u0026amp; Li 2020).\u003c/p\u003e \u003cp\u003eAs described in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the model construction of the proposed SDEM included two levels (Wolpert 1992; Sun \u0026amp; Li 2020):\u003c/p\u003e \u003cp\u003e(1) Level0\u003c/p\u003e \u003cp\u003eHere, a generic base model was used to describe the construction process, with 5-fold cross-validation. In practice, a different number of folds K can be selected, and the base models can be of the same or different types. The training process is the same for each base model. Level0 can be divided into three steps.\u003c/p\u003e \u003cp\u003eStep1: The sample set was divided into a training set and a test set, with sizes of M and N, respectively.\u003c/p\u003e \u003cp\u003eStep2: 5-fold cross-validation was employed for the base model. The training set was randomly divided into 5 subsets, denoted as T1, T2, T3, T4 and T5. In each fold, four subsets with a total size of 4*M/5 were used as training data, and the remaining subset of size M/5 served as the held-out validation subset. Repeating the process five times with the base model yielded five held-out prediction results, named P5, P4, P3, P2 and P1. The test set was also predicted with each of the five trained models, and the results were correspondingly labeled as T-P1, T-P2, T-P3, T-P4 and T-P5.\u003c/p\u003e \u003cp\u003eStep3: The prediction outputs for the five held-out subsets (P1 to P5) were stacked to form an M*1 matrix denoted as A1. The five test-set prediction results (T-P1 to T-P5) were averaged to form an N*1 matrix denoted as B1. 
A1 and B1 are shown in the last column of Level0 in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eFollowing the same process, the other two base models were used and the matrices A2, B2 and A3, B3 were obtained.\u003c/p\u003e \u003cp\u003e(2) Level1\u003c/p\u003e \u003cp\u003eStep1: The outputs of the base models, i.e., matrices A1, A2, A3 and B1, B2, B3, were concatenated to form new matrices that serve as input features for the meta-model. Specifically, A1, A2 and A3 were used as the new training set, while B1, B2 and B3 constituted the test set. During this construction, the sample labels remained unchanged.\u003c/p\u003e \u003cp\u003eStep2: The meta-model was trained on the newly constructed training set and then applied to the new test set to obtain the final prediction result.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eWorkflow\u003c/h3\u003e\n\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, the research workflow for the lithology identification of carbonate reservoirs in this paper is carried out in four parts.\u003c/p\u003e \u003cp\u003ePart1: Wireline log parameter analysis. Based on the LWD data and the core observations, the lithologies developed in the study area were analyzed. The characteristics of the wireline logs for different lithologies were also analyzed with the Boxplot tool, indicating the sensitive parameters for different lithologies. According to the analysis results, the training, validation, and test sets required for modeling were obtained.\u003c/p\u003e \u003cp\u003ePart2: Model optimization. The optimal hyperparameters for each base model were determined using grid search integrated with K-fold cross-validation. For BPNN, the key hyperparameter optimized was the number of hidden layer nodes (\u003cem\u003enum_hidden_nodes\u003c/em\u003e). 
For the DT, the focus was on two key parameters: maximum depth (\u003cem\u003emax_depth\u003c/em\u003e) and minimum number of samples per leaf (\u003cem\u003emin_samples_leaf\u003c/em\u003e). As for the SVM, the hyperparameters optimized were the error penalty parameter (\u003cem\u003eC\u003c/em\u003e) and the kernel coefficient (gamma).\u003c/p\u003e \u003cp\u003ePart3: Ensemble method. Three base models were established with the optimized hyperparameters, and their outputs were obtained from the training sets following the steps shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Then the DT was used as the meta-model and made the final judgment based on the three outputs of the base models.\u003c/p\u003e \u003cp\u003ePart4: Result analysis. According to the identification results, evaluation metrics such as the confusion matrix, accuracy, precision, recall and f1-score were adopted to assess the prediction performance. To further verify the generalization performance and robustness of the predictive model, two cases in the Ma 5 Member of the Majiagou Formation in the east and west of the Sulige Gas Field were tested.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Result","content":"\u003cp\u003eIn this section, the SDEM was applied using wireline logs in the Sulige Gas Field in the Ordos Basin. Four subsections are presented, namely data background, wireline log parameter analysis, model optimization and the result analysis of lithology identification.\u003c/p\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003eData background\u003c/h2\u003e\n \u003cp\u003eThe study data come from the eastern part of the Sulige Gas Field in the Ordos Basin, where the Ma 5 Member of the Majiagou Formation, an important gas-bearing reservoir, is the target formation (Gu et al. 2017; Bai et al. 2016). 
The porosity in this area is mainly distributed in 2\u0026ndash;6% and the permeability is less than 0.02 \u0026times; 10\u003csup\u003e-3\u003c/sup\u003e um\u003csup\u003e2\u003c/sup\u003e, with the characteristics of low porosity and low permeability (Jia et al. 2025). According to the core observations and LWD data analysis, the lithologies of the Ma 5 Member predominantly consist of limestone (L), dolomitic limestone (DL), argillaceous limestone (AL), dolomite (D), calcite dolomite (CD) and argillaceous dolomite (AD) (Zhang et al. 2022). The gas-bearing reservoirs are generally distributed in two types of lithology, calcite dolomite and dolomitic limestone (Gu et al. 2017). Therefore, the research of lithology identification is of great significance for the reservoir evaluation of oil and gas.\u003c/p\u003e\n \u003cp\u003eIn this work, 2934 data samples from 3 wells in the Ma 5 Member of the study area were employed as the dataset for the lithology identification model. These samples were partitioned into three subsets: 1642 samples (training sets) for model training, 705 samples (validation sets) for model validation and hyperparameter optimization, and 587 samples (test sets) for independent testing to quantitatively assess the lithology identification performance.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eWireline logs parameter analysis\u003c/h3\u003e\n\u003cp\u003eIn order to identify lithology with wireline logs, the wireline log responses of the different lithologies should be analyzed and the sensitive parameters selected (Zhang et al. 2020). Previous studies on carbonate reservoirs and the analysis of logging data in the study area show that different wireline logs exhibit distinct indicative significance for lithology identification. For example, GR (natural gamma ray logging) and PE (photoelectric absorption cross-section logging) are primarily sensitive to reservoir lithology. 
The AC (compensated acoustic logging), CNL (neutron logging) and DEN (density logging) predominantly characterize the physical properties of the reservoirs, while RLLD (deep lateral resistivity logging) serves as an effective indicator of gas-bearing properties. Different wireline logs reflect different characteristics of reservoirs in various aspects (Zhang et al. 2020). Thus, these six wireline logs are adopted as the input parameters of the SDEM.\u003c/p\u003e\n\u003cp\u003eBased on the 2934 data samples in the study area, Boxplots of each wireline log were obtained for the different lithologies. The Boxplot shows the minimum, maximum, median, 25th percentile, 75th percentile and the outliers of the samples (Li et al. 2016). The box size represents the interval between the 75th and 25th percentiles. A larger box corresponds to a more dispersed data distribution and stronger data fluctuation; a smaller box indicates a more concentrated distribution. Compared with other plots, the Boxplot has the advantage of clearly showing the distribution characteristics of multiple variables and allowing comparison between different lithology types (Li et al. 2016).\u003c/p\u003e\n\u003cp\u003eFigure\u0026nbsp;4 shows the Boxplots of the wireline logs for different lithologies. The detailed lithologic characteristics of the different wireline logs are as follows.\u003c/p\u003e\n\u003cp\u003e(1) The AC log: As shown in Fig.\u0026nbsp;4a, the AC values of L, DL and AL are higher than those of D, CD and AD. The values for L are mainly distributed between 156.1 us/m and 158.4 us/m and the box is small, indicating that the data are comparatively concentrated, around 157 us/m. DL overlaps strongly with L, with values mainly between 155 us/m and 161.1 us/m. 
Compared with other lithologies, AL has the highest values, 159.4-171.3 us/m, and D has the lowest, 146.4-152.9 us/m. CD and AD overlap with other lithologies: the former is mainly distributed at 149.6-157.4 us/m and the latter mainly at 155-165.5 us/m.\u003c/p\u003e\n\u003cp\u003e(2) The GR log: As shown in Fig.\u0026nbsp;4b, the GR value is higher when the carbonate reservoirs contain clay minerals, and it increases with the clay mineral content. Therefore, the GR log can be used to clearly distinguish AL and AD from the other four lithologies. AL is mainly distributed between 32.74 API and 59.17 API, with a maximum of up to 97.2 API. AD mostly ranges from 30.8 API to 57.6 API and is concentrated around 40.6 API. The other four lithologies are primarily below 20 API, and L and D are the lowest.\u003c/p\u003e\n\u003cp\u003e(3) The CNL log: As shown in Fig.\u0026nbsp;4c, the CNL values for L can be clearly distinguished from those of the other lithologies, being mostly below 1.28%. The values of AD and AL are much higher than those of DL, D and CD. AD is chiefly distributed at 5.0-8.6%, and AL ranges from 2.76% to 7.08%. The other lithologies, DL, D and CD, have a high degree of overlap, with values mostly less than 6.5%.\u003c/p\u003e\n\u003cp\u003e(4) The DEN log: As shown in Fig.\u0026nbsp;4d, the DEN values for L and D can be clearly distinguished from those of the other four lithologies. D is above 2.82 g/cm\u003csup\u003e3\u003c/sup\u003e, L is below 2.72 g/cm\u003csup\u003e3\u003c/sup\u003e and the other four lithologies range from 2.72 g/cm\u003csup\u003e3\u003c/sup\u003e to 2.82 g/cm\u003csup\u003e3\u003c/sup\u003e. The DEN values of AL and DL are generally smaller than those of AD and CD. 
AL, DL, AD and CD are distributed at 2.71\u0026ndash;2.76 g/cm\u003csup\u003e3\u003c/sup\u003e, 2.72\u0026ndash;2.76 g/cm\u003csup\u003e3\u003c/sup\u003e, 2.78\u0026ndash;2.83 g/cm\u003csup\u003e3\u003c/sup\u003e and 2.77\u0026ndash;2.82 g/cm\u003csup\u003e3\u003c/sup\u003e, respectively.\u003c/p\u003e\n\u003cp\u003e(5) The PE log: As shown in Fig.\u0026nbsp;4e, the PE log is relatively sensitive to lithology. Among the six lithologies, L has the highest PE value, mostly higher than 5 b/e, and D has the lowest, mostly lower than 3.1 b/e. The PE values of the other four lithologies lie between those of L and D, and the values of AL and DL are much higher than those of AD and CD. The PE values of AL and DL overlap strongly, being mainly distributed between 3.89 b/e and 4.43 b/e and between 3.82 b/e and 4.37 b/e, respectively. AD is mostly distributed at 3.10\u0026ndash;3.52 b/e and CD at 3.25\u0026ndash;3.57 b/e.\u003c/p\u003e\n\u003cp\u003e(6) The RLLD log: As shown in Fig.\u0026nbsp;4f, the six lithologies have larger boxes on RLLD than on the other five wireline logs. This shows that the RLLD samples are more scattered and the distribution range is wider. On the whole, the RLLD value of L is the highest and those of AL and AD are the lowest among the six lithologies. The distributions of AL and AD mostly overlap. D, CD and DL also have a high degree of overlap: D mainly ranges from 2.42 Ω.m to 3.77 Ω.m, CD is between 2.50 Ω.m and 3.62 Ω.m, and DL is between 2.45 Ω.m and 3.75 Ω.m.\u003c/p\u003e\n\u003cp\u003eIt can be seen from the Boxplots of the wireline logs for different lithologies that each wireline log gives a certain indication of lithology, while the high overlaps make lithology identification uncertain with a single wireline log. 
Therefore, multiple wireline logs should be considered for the lithology identification.\u003c/p\u003e\n\u003ch3\u003eModel optimization\u003c/h3\u003e\n\u003cp\u003eThe base models in the SDEM usually involve multiple hyperparameters, and the values of these hyperparameters exert a significant influence on model performance. Model optimization uses an evaluation standard to assess the performance of the classifier with different hyperparameters and to obtain the optimized hyperparameters for each base model (Xie et al. 2018; Tan et al. 2020). Here, the grid search method was employed to traverse all hyperparameter combinations, and a 10-fold cross-validation strategy was adopted for each combination, with classification accuracy serving as the evaluation metric. The combination achieving the highest accuracy was selected as the optimal hyperparameter configuration (Xie et al. 2018; Tan et al. 2020). The classification performance of DT is highly dependent on the hyperparameters \u003cem\u003emax_depth\u003c/em\u003e and \u003cem\u003emin_samples_leaf\u003c/em\u003e. A DT with too large a \u003cem\u003emax_depth\u003c/em\u003e tends to overfit. The hyperparameter \u003cem\u003emin_samples_leaf\u003c/em\u003e serves to prune the decision tree: a split is not kept if it would leave a leaf node with fewer samples than this threshold (Xie et al. 2018; Bressan et al. 2020; Tan et al. 2020). Figure\u0026nbsp;5a shows the plots of average cross-validation accuracy and the variance based on the validation sets for the hyperparameter \u003cem\u003emax_depth\u003c/em\u003e. The plot of average cross-validation accuracy shows the accuracy under each hyperparameter value and the plot of variance shows the deviation of accuracy across folds. It is therefore necessary to consider the two plots simultaneously when choosing the optimal hyperparameters. 
The cross-validation accuracy was highest (approximately 95%) and the variance was lowest when \u003cem\u003emax_depth\u003c/em\u003e was 9, 33 and 37. Considering that the training time increases as the maximum depth increases, a \u003cem\u003emax_depth\u003c/em\u003e of 9 was selected as the optimal hyperparameter, combining high accuracy with low variance. Figure\u0026nbsp;5b shows the plots of average cross-validation accuracy and variance on the validation sets for the hyperparameter \u003cem\u003emin_samples_leaf\u003c/em\u003e. The cross-validation accuracy was highest (approximately 0.95) when \u003cem\u003emin_samples_leaf\u003c/em\u003e was 5 and 7. After comparing the variance at these two values, a \u003cem\u003emin_samples_leaf\u003c/em\u003e of 7 was chosen as the optimal hyperparameter. Finally, the optimal classifier was obtained with a \u003cem\u003emax_depth\u003c/em\u003e of 9 and a \u003cem\u003emin_samples_leaf\u003c/em\u003e of 7.\u003c/p\u003e\n\u003cp\u003eThe BPNN is typically configured as a three-layer network, consisting of one input layer, one hidden layer and one output layer. The number of nodes in the input layer is the same as the dimension of the input data, and the number of nodes in the output layer matches the number of classification categories. Because the numbers of nodes in the input and output layers are fixed, the classification performance of the BPNN is highly dependent on the hyperparameter \u003cem\u003enum_hidden_nodes\u003c/em\u003e (Rumelhart et al. 1986; Bisoyi et al. 2019). Increasing \u003cem\u003enum_hidden_nodes\u003c/em\u003e can reduce network errors and improve model accuracy. However, it also prolongs training time, increases the tendency to overfit, and increases network complexity. 
By tuning \u003cem\u003enum_hidden_nodes\u003c/em\u003e, the generalization ability of the network can be improved (Bisoyi et al. 2019; Tan et al. 2020). Figure\u0026nbsp;6 shows the plots of average cross-validation accuracy and variance on the validation sets for the BPNN model with different \u003cem\u003enum_hidden_nodes\u003c/em\u003e. The cross-validation accuracy was highest (approximately 0.72) when \u003cem\u003enum_hidden_nodes\u003c/em\u003e was 216, 256 and 296, and the plot of variance shows that the variance was relatively small (approximately 0.031) when \u003cem\u003enum_hidden_nodes\u003c/em\u003e was 296. Therefore, the optimal BPNN classifier was obtained with a \u003cem\u003enum_hidden_nodes\u003c/em\u003e of 296. For the BPNN model, the sigmoid function was adopted as the activation function and the optimized stochastic gradient descent algorithm served as the weight optimizer.\u003c/p\u003e\n\u003cp\u003eThe classification performance of SVM is highly dependent on the hyperparameter \u003cem\u003eC\u003c/em\u003e and the kernel function coefficient (Wang et al. 2014; Tan et al. 2020; Zhang et al. 2024). As the value of \u003cem\u003eC\u003c/em\u003e increases, the penalty for misclassified samples becomes more severe. This results in higher accuracy on the training set but can lead to overfitting and weak generalization ability of the constructed model. Conversely, the smaller the value of \u003cem\u003eC\u003c/em\u003e, the milder the penalty. This enhances fault tolerance and generalization ability, but underfitting may occur (Wang et al. 2014). In the SVM, the radial basis function (RBF) is used as the kernel function and gamma is the kernel function coefficient (Wang et al. 2014), which implicitly determines the distribution of the data mapped to the new feature space. 
A larger gamma value corresponds to fewer support vectors, whereas a smaller gamma value results in more support vectors. The number of support vectors affects the time and speed of model training and prediction (Wang et al. 2014). The ranges of the two hyperparameters that require tuning are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The hyperparameter C ranged from 10\u003csup\u003e\u0026minus;\u0026thinsp;5\u003c/sup\u003e to 10\u003csup\u003e5\u003c/sup\u003e and gamma ranged from 10\u003csup\u003e\u0026minus;\u0026thinsp;7\u003c/sup\u003e to 10\u003csup\u003e3\u003c/sup\u003e, each increasing by a factor of 10 per step, which gives 11 values for each hyperparameter. For each pair of C and gamma, the average accuracy of 10-fold cross-validation was obtained. Based on the resulting 121 points, the contour map of accuracy was plotted (Fig.\u0026nbsp;7). On the whole, the average accuracy varied greatly with different pairs of C and gamma. The white region in Fig.\u0026nbsp;7 represents an average accuracy below 0.6 and the filled region an average accuracy above 0.6. Average accuracies greater than 0.9 were primarily distributed in the right region, where C was between 10\u003csup\u003e0\u003c/sup\u003e and 10\u003csup\u003e5\u003c/sup\u003e and gamma was between 10\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e and 10\u003csup\u003e1\u003c/sup\u003e. Across all results, the optimal hyperparameters for the SVM classifier were C = 10\u003csup\u003e3\u003c/sup\u003e and gamma = 10\u003csup\u003e\u0026minus;\u0026thinsp;2\u003c/sup\u003e. 
Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the hyperparameters of each model, search range and the optimal hyperparameters achieved.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n \u003cdiv class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003cstrong\u003eTable 2.\u0026nbsp;\u003c/strong\u003eModel hyperparameters of each base model and corresponding optimal values\u003ctable float=\"No\" id=\"Tabb\" border=\"1\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eBase model\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eHyperparameters\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eSearch range\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eOptimal hyperparameter\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eBPNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e\u003cem\u003enum_hidden_nodes\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e100\u0026ndash;300\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e296\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eSVM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e\u003cem\u003eC\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e10\u003csup\u003e\u0026minus;\u0026thinsp;5\u003c/sup\u003e-10\u003csup\u003e5\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n 
\u003cp\u003e1000\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003egamma\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e10\u003csup\u003e\u0026minus;\u0026thinsp;7\u003c/sup\u003e\u0026ndash;10\u003csup\u003e3\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e0.01\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eDT\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e\u003cem\u003emax_depth\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e5\u0026ndash;100\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003e\u003cem\u003emin_samples_leaf\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e5\u0026ndash;100\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003eResult analysis\u003c/h2\u003e\n \u003cp\u003eOn the basis of 1642 training samples and 705 validation samples, the optimal hyperparameters of the three base models were obtained, and the optimal SVM, BPNN and DT models were thereby established. 
Based on the three base models, the SDEM was constructed as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. To evaluate the practical application performance, 587 test samples from the study area were used for analysis.\u003c/p\u003e\n \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the prediction results of the proposed method and the three individual methods for the six lithologies. Several observations can be made. For lithology L, there were 49 samples in the test set. The BPNN misclassified one sample, while the other two individual methods and the SDEM reached 100% accuracy. For lithology DL, the SDEM had obvious advantages over the other methods, with a prediction accuracy of approximately 96.5%. For lithologies AL and CD, the SDEM and DT achieved higher accuracy than the other two methods, both of which exceeded 97%. For lithology D, the SDEM and SVM had the higher prediction accuracy, up to 97.1%, whereas the DT and BPNN reached approximately 95.6%. For lithology AD, 176 of the 177 samples were predicted consistently with the true label, and the SVM had more misclassified samples. Across all lithologies, each individual method had its own advantages for particular lithologies; however, the accuracy of the proposed SDEM was no lower than that of any individual method for every lithology.\u003c/p\u003e\n \u003cp\u003eTo further discuss the prediction effect and robustness of the SDEM, the 10-fold cross-validation method was used to evaluate its performance. Figure \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows the cross-validation accuracy of lithology identification in each fold and the average cross-validation accuracy (grey box in Fig. \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e9\u003c/span\u003e). 
On the whole, the SDEM had higher accuracy than the other methods in seven of the ten folds, and its average cross-validation accuracy was 95.1%. The average cross-validation accuracies of SVM, DT and BPNN were 93.7%, 92.7% and 91.0%, respectively. The variance of the SDEM over the 10 folds was also the lowest, approximately 0.02, which shows that the SDEM is stable.\u003c/p\u003e\n \u003cp\u003eIn conclusion, the prediction accuracies of the three individual methods and the SDEM show that the SDEM is more accurate than any individual method and can effectively improve the performance of lithology identification for carbonate reservoirs.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003eCase study\u003c/h2\u003e\n \u003cp\u003eThe implementation above showed that the SDEM performs well. To further demonstrate the generalization capability and effectiveness of the proposed model, two cases, each based on one well, were selected. By calculating the confusion matrix, the evaluation standards of accuracy, precision, recall and f1-score were obtained (Xie et al. 2018; Bressan et al. 2020).\u003c/p\u003e\n \u003cp\u003e(1) Case 1\u003c/p\u003e\n \u003cp\u003eWell 1 is located in the carbonate reservoir of the eastern Sulige Gas Field, Ordos Basin, and belongs to the same area as the data used in this study. The target formation is the Ma 5 Member, in which six lithologies (L, DL, AL, D, CD and AD) are developed. The selected sensitive wireline logs comprise AC, GR, CNL, PE, DEN, and RLLD.\u003c/p\u003e\n \u003cp\u003eAs illustrated in Fig. \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e10\u003c/span\u003e, a comprehensive histogram of SDEM-based predicted lithology is presented. Panels 3 to 6 exhibit the lithology-sensitive wireline logs (GR, PE, AC, DEN, CNL, and RLLD) arranged from left to right and top to bottom. 
Panel 7 shows the LWD data and the last panel shows the lithology identification result of the SDEM. Comparative analysis shows that the SDEM identification result is highly consistent with the LWD data. Panel 7 also shows that Well 1 contains many thin layers, with thicknesses of 0.2\u0026ndash;3.2 m. Based on the thickness statistics, the layers were divided into three classes: thicker than 1.2 m, 0.6\u0026ndash;1.2 m and thinner than 0.6 m. There were 767 samples in layers thicker than 1.2 m, of which 752 were identified correctly; 376 samples in layers of 0.6\u0026ndash;1.2 m, of which 359 were correct; and 330 samples in layers thinner than 0.6 m, of which 306 were correct. The accuracies of the three classes were 98.0%, 95.5% and 92.7%, respectively. The identification accuracy decreased as the layer thickness decreased. Layers thinner than 0.4 m were also analyzed, and the accuracy was only 86.6%, so the SDEM cannot obtain a satisfactory identification result for layers thinner than 0.4 m. Therefore, the SDEM performs well in identifying thin layers, and the minimum thickness that can be identified is between 0.4 m and 0.6 m.\u003c/p\u003e\n \u003cp\u003eFigure \u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e11\u003c/span\u003e shows the confusion matrix obtained by the SDEM. The vertical direction represents the predicted lithology type and the horizontal direction represents the true lithology type. According to the LWD data, the numbers of samples of lithologies L, DL, AL, D, CD and AD were 70, 219, 118, 69, 393 and 604, respectively. The numbers identified correctly were 67, 216, 110, 67, 367 and 590, respectively, and the overall accuracy was approximately 96.2%. 
Among the individual lithologies, L, DL, AL and D were identified well, whereas CD and AD had the most misclassified samples. In addition to the confusion matrix and accuracy, the f1-score, precision and recall were also used as evaluation metrics. In terms of recall, the values for the six lithologies were basically above 90%, indicating that a high proportion of the samples of each true lithology was identified correctly. In terms of precision, the values were also above 90% for all lithologies except DL, indicating that a high proportion of the samples predicted as each lithology was correct. The f1-score, which is the harmonic mean of precision and recall, was also high, indicating the good performance of the SDEM.\u003c/p\u003e\n \u003cp\u003e(2) Case 2\u003c/p\u003e\n \u003cp\u003eWell 2 is located in the carbonate reservoir of the western Sulige Gas Field, Ordos Basin. The target formation is also the Ma 5 Member, with the same six lithologies and six wireline logs described above. Well 2 has a total of 840 labeled samples. The numbers of samples of lithologies L, DL, AL, D, CD and AD are 109, 59, 193, 3, 99 and 377, respectively, which shows an obvious class imbalance. Figure\u0026nbsp;12 shows the comprehensive histogram with the lithology predicted by the SDEM, in which panel 7 is the LWD data and the last panel is the lithology identification result of the SDEM.\u003c/p\u003e\n \u003cp\u003eQualitatively, these two panels are basically consistent. Compared with the other lithologies, D and DL are relatively undeveloped. 
However, for these undeveloped lithologies, the SDEM can also effectively identify the corresponding lithology, such as lithology D at the depth of 3451.5\u0026ndash;3505 and lithology DL at the depth of 3532.875\u0026ndash;3535.375 m. Thus, the SDEM alleviates the phenomenon of \u0026ldquo;underestimation of few samples\u0026rdquo;. Layers thinner than 0.5 m, such as lithology DL at 3520.125\u0026ndash;3520.375 m and lithology CD at 3510.25\u0026ndash;3510.625 m, can also be identified, indicating the advantage of the SDEM for thin layers in Well 2. Quantitatively, the numbers of correctly identified samples of lithologies L, DL, AL, D, CD and AD were 106, 58, 190, 3, 94, and 353, respectively. The identification accuracy of each lithology was high (basically above 95%), except for lithology AD, which had more misclassified samples. The advantage of the SDEM is especially evident for lithology D, which had only three samples, all of which were identified correctly.\u003c/p\u003e\n \u003cp\u003eFigure\u0026nbsp;13 compares the SDEM with the individual methods and with traditional ELMs for the two cases above, where the green color represents Well 1 and the cyan color represents Well 2. Figure\u0026nbsp;13a compares the SDEM with its individual methods for the two wells. As can be seen in Fig.\u0026nbsp;13a, the accuracies of the four methods for Well 1 are all above 93% and the SDEM achieves the highest accuracy, approximately 96.2%. For Well 2, the SDEM also performs well, with an accuracy of 95.4%, which is 1.4% higher than the other three methods.\u003c/p\u003e\n \u003cp\u003eFigure 13b compares the SDEM with traditional ELMs for the two wells. Here the random forest (RF) method, representing the bagging methods, and the gradient boosting decision tree (GBDT) method, representing the boosting methods, were selected as the traditional ELMs. 
The SDEM achieved the highest accuracy for Well 1, followed by RF, with GBDT the lowest. For Well 2, the SDEM again achieved the highest accuracy, followed by GBDT, with RF the lowest.\u003c/p\u003e\n \u003cp\u003eComparing the two cases, an individual method or a traditional ELM may perform comparably to the SDEM in one case but worse in the other. That is, the individual methods and traditional ELMs have different application effects in different cases. The SDEM, however, achieves high performance in both cases and can be extended to other oil or gas fields with characteristics similar to these two cases.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn the present study, BPNN, SVM and DT served as the base models to establish the SDEM, although other machine learning methods can also be used. These three methods were chosen mainly for their respective advantages, including the ability to learn from the training set, prediction performance on unbalanced data, generalization ability and the ability to process multiple variables. In the near future, other classification methods, such as KNN, NB, FL and perceptron neural network (PNN), can also be explored for lithology identification. The base models used in this paper are relatively simple methods. As can be seen from the comparison results, traditional ELMs such as RF and GBDT also perform well. Therefore, these ELMs can also be considered as base models for an in-depth analysis of the performance of the SDEM.\u003c/p\u003e \u003cp\u003eFor level 2 of the SDEM, the logistic regression (LR) method is usually recommended as the meta-model. In fact, we tested the LR method as the meta-model; however, its accuracy was lower (Sun \u0026amp; Li 2020). By contrast, the DT model achieved higher accuracy than the LR model. 
We will continue to investigate more classifiers as the meta-model in the near future and compare their performance.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis study focuses on lithology identification for complex carbonate reservoirs based on the SDEM combined with wireline logs. The theory of the lithology identification methods was introduced and a systematic workflow was established. Based on the comprehensive study, several important conclusions are presented.\u003c/p\u003e \u003cp\u003e(1) Lithology identification is a classification problem, so the base models selected in the SDEM must be classifiers. Three models, BPNN, DT and SVM, were used as the base models in this paper. The ensemble strategy is important for the SDEM, and the DT performed better as the meta-model than the commonly used LR model.\u003c/p\u003e \u003cp\u003e(2) Based on the LWD data and core observations, six lithologies (L, DL, AL, D, CD and AD) are developed and six lithology-sensitive wireline logs (AC, GR, CNL, PE, DEN, and RLLD) are chosen. Boxplots are used to analyze the response characteristics of the wireline logs for different lithologies. Multiple wireline logs are taken into account in constructing the identification model, and the nonlinear relationships between wireline logs and lithology are built.\u003c/p\u003e \u003cp\u003e(3) The hyperparameters of BPNN, SVM and DT are determined through grid search combined with 10-fold cross-validation, followed by the establishment of the SDEM. The test results showed that the proposed method, with an accuracy of approximately 95.1%, performed better than its individual methods.\u003c/p\u003e \u003cp\u003e(4) Based on the same training set, the established model is applied to two carbonate reservoirs with similar characteristics. All the evaluation standards (accuracy, precision, recall and f1-score) are no lower than those of the individual methods or the traditional ELMs. 
The proposed method also mitigates the \u0026ldquo;underestimation of few samples\u0026rdquo; problem and improves the identification of thin layers.\u003c/p\u003e \u003cp\u003eIn summary, it is reasonable to conclude that the SDEM exhibits superior predictive performance to all its individual methods and the traditional ELMs. Accordingly, the proposed method is expected to be applicable to lithology identification in other analogous gas or oil fields, and can be further extended to other exploration tasks in carbonate reservoirs, such as fluid typing and petrophysical analysis.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003eThe authors wish to express sincere gratitude to all individuals and institutions that contributed to the completion of this work.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u0026nbsp;\u003c/strong\u003eYan Zhang: Roles/Writing-original draft and Writing-review \u0026amp; editing; Final approval of the version to be submitted. Yang Chen: Investigation; The conception and design of the study; Project administration. Weiwei Xie: Interpretation of data; Analysis of data. Kai Xing: Acquisition of data. Shaobo Cheng: Analysis of data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e This work was supported and funded by the China Geological Survey China Mining News.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e The data used in this research comes from oilfield engineering projects. 
Given the sensitivity of the project data and the strict constraints of research confidentiality regulations and institutional intellectual property protection agreements, the original data and processed datasets cannot be publicly shared, in order to prevent potential risks arising from data leakage.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u003c/strong\u003e The authors declare no conflict of interest.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbbey CP, Okpogo EU, Atueyi IO (2018) Application of rock physics parameters for lithology and fluid prediction of \u0026lsquo;TN\u0026rsquo; field of Niger Delta basin, Nigeria. Egyptian Journal of Petroleum 27: 853-866. https://doi.org/10.1016/j.ejpe.2018.01.001.\u003c/li\u003e\n\u003cli\u003eAgarwal S, Chowdary CR (2020) A-Stacking and A-Bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection. Expert Systems with Applications 146: 113160. https://doi.org/10.1016/j.eswa.2019.113160.\u003c/li\u003e\n\u003cli\u003eBai XL, Zhang SN, Huang QY, et al. (2016) Origin of dolomite in the middle Ordovician peritidal platform carbonates in the northern Ordos basin, western China. Petroleum Science 13: 434-449. https://doi.org/10.1007/s12182-016-0114-5.\u003c/li\u003e\n\u003cli\u003eBisoyi N, Gupta H, Padhy NP, et al. (2019) Prediction of daily sediment discharge using a back propagation neural network training algorithm: A case study of the Narmada River, India. International Journal of Sediment Research 34: 125-135. https://doi.org/10.1016/j.ijsrc.2018.10.010.\u003c/li\u003e\n\u003cli\u003eBressan TS, Souza MKD, Girelli TJ, et al. (2020) Evaluation of machine learning methods for lithology classification using geophysical data. Computers \u0026amp; Geosciences 139: 104475. 
https://doi.org/10.1016/j.cageo.2020.104475.\u003c/li\u003e\n\u003cli\u003eCorina AN, Hovda S (2018) Automatic lithology prediction from well logging using kernel density estimation. Journal of Petroleum Science and Engineering 170: 664-674. https://doi.org/10.1016/j.petrol.2018.06.012.\u003c/li\u003e\n\u003cli\u003eDev VA, Eden MR (2018) Evaluating the Boosting Approach to Machine Learning for Formation Lithology Classification. Computer Aided Chemical Engineering 44: 1465-1470. https://doi.org/10.1016/B978-0-444-64241-7.50239-1.\u003c/li\u003e\n\u003cli\u003eDev VA, Eden MR (2019) Formation lithology classification using scalable gradient boosted decision trees. Computers \u0026amp; Chemical Engineering 128: 392-404. https://doi.org/10.1016/j.compchemeng.2019.06.001.\u003c/li\u003e\n\u003cli\u003eFang JY, Yan Z, Lu XY, et al. (2024) An oil production prediction approach based on variational mode decomposition and ensemble learning model. Computers \u0026amp; Geosciences 193: 105734. https://doi.org/10.1016/j.cageo.2024.105734.\u003c/li\u003e\n\u003cli\u003eGu YF, Bao ZD, Lin YB, et al. (2017) The porosity and permeability prediction methods for carbonate reservoirs with extremely limited logging data: Stepwise regression vs. N-way analysis of variance. Journal of Natural Gas Science and Engineering 42:99-119.https://doi.org/10.1016/j.jngse.2017.03.010.\u003c/li\u003e\n\u003cli\u003eHughes HE, Thomas AT (2011) Trilobite associations, taphonomy, lithofacies and environments of the Silurian reefs of North Greenland. Palaeogeography, Palaeoclimatology, Palaeoecology 302: 142-155. https://doi.org/10.1016/j.palaeo.2010.12.009.\u003c/li\u003e\n\u003cli\u003eHorrocks T, Holden EJ, Wedge D, et al. (2015) Evaluation of automated lithology classification architectures using highly-sampled wireline logs for coal exploration. Computers \u0026amp; Geosciences 83: 209-218. 
https://doi.org/10.1016/j.cageo.2015.07.013.\u003c/li\u003e\n\u003cli\u003eInsua TL, Hamel L, Moran K, et al. (2015) Advanced classification of carbonate sediments based on physical properties. Sedimentology 62: 590-606. https://doi.org/10.1111/sed.12168.\u003c/li\u003e\n\u003cli\u003eJia AL, Meng DW, Wang GT, et al. (2025) Development technologies and models of different types of gas reservoirs in Ordos Basin, NW China. Petroleum Exploration and Development 52(3):779-794. https://doi.org/10.1016/S1876-3804(25)60602-1.\u003c/li\u003e\n\u003cli\u003eKonat\u0026eacute; AA, Ma HL, Pan HP, et al. (2017) Lithology and mineralogy recognition from geochemical logging tool data using multivariate statistical analysis. Applied Radiation and Isotopes 128: 55-67. https://doi.org/10.1016/j.apradiso.2017.06.041.\u003c/li\u003e\n\u003cli\u003eLi AH, Feng MY, Li YR, et al. (2016) Application of Outlier Mining in Insider Identification Based on Boxplot Method. Procedia Computer Science 91: 245-251. https://doi.org/10.1016/j.procs.2016.07.069.\u003c/li\u003e\n\u003cli\u003eLi XY, Li HQ (2013) A new method of identification of complex lithologies and reservoirs: task-driven data mining. Journal of Petroleum Science and Engineering 109: 241-249. https://doi.org/10.1016/j.petrol.2013.08.049.\u003c/li\u003e\n\u003cli\u003eLiu J, Min XL, Qi ZL, et al. (2024) Lithology identification using electrical imaging logging image: A case study in Jiyang Depression, China. Journal of Applied Geophysics 230: 105536. https://doi.org/10.1016/j.jappgeo.2024.105536.\u003c/li\u003e\n\u003cli\u003eLopes DMR, Andrade AJN (2019) Lithology identification on well logs by fuzzy inference. Journal of Petroleum Science and Engineering 180: 357-368. https://doi.org/10.1016/j.petrol.2019.05.044.\u003c/li\u003e\n\u003cli\u003eRen XX, Hou JG, Song SH, et al. (2019) Lithology identification using well logs: A method by integrating artificial neural networks and sedimentary patterns. 
Journal of Petroleum Science and Engineering 182: 106336. https://doi.org/10.1016/j.petrol.2019.106336.\u003c/li\u003e\n\u003cli\u003eRumelhart D, Hinton G, Williams R (1986) Learning representations by back-propagating errors. Nature 323: 533\u0026ndash;536. https://doi.org/10.1038/323533a0.\u003c/li\u003e\n\u003cli\u003eShuvo MdAI, Joy SMH (2024) A data driven approach to assess the petrophysical parametric sensitivity for lithology identification based on ensemble learning. Journal of Applied Geophysics 222: 105330. https://doi.org/10.1016/j.jappgeo.2024.105330.\u003c/li\u003e\n\u003cli\u003eSun J, Li Q, Chen MQ, et al. (2019) Optimization of models for a rapid identification of lithology while drilling - A win-win strategy based on machine learning. Journal of Petroleum Science and Engineering 176: 321-341. https://doi.org/10.1016/j.petrol.2019.01.006.\u003c/li\u003e\n\u003cli\u003eSun W, Li ZQ (2020) Hourly PM2.5 concentration forecasting based on feature extraction and stacking-driven ensemble model for the winter of the Beijing-Tianjin-Hebei area. Atmospheric Pollution Research 11(6): 110-121. https://doi.org/10.1016/j.apr.2020.02.022.\u003c/li\u003e\n\u003cli\u003eTan MJ, Bai Y, Zhang HT, et al. (2020) Fluid typing in tight sandstone from wireline logs using classification committee machine. Fuel 271: 117601. https://doi.org/10.1016/j.fuel.2020.117601.\u003c/li\u003e\n\u003cli\u003eTewari S, Dwivedi UD (2019) Ensemble-based big data analytics of lithofacies for automatic development of petroleum reservoirs. Computers \u0026amp; Industrial Engineering 128: 937-947. https://doi.org/10.1016/j.cie.2018.08.018.\u003c/li\u003e\n\u003cli\u003eVapnik V (1995) The Nature of Statistical Learning Theory. New York: Springer.\u003c/li\u003e\n\u003cli\u003eWolpert DH (1992) Stacked generalization. Neural Networks 5(2): 241\u0026ndash;259. 
https://doi.org/10.1016/S0893-6080(05)80023-1.
Wang GC, Carr TR (2012) Methodology of organic-rich shale lithofacies identification and prediction: A case study from Marcellus Shale in the Appalachian basin. Computers & Geosciences 49: 151-163. https://doi.org/10.1016/j.cageo.2012.07.011.
Wang GC, Carr TR, Ju YW, et al. (2014) Identifying organic-rich Marcellus Shale lithofacies by support vector machine classifier in the Appalachian basin. Computers & Geosciences 64: 52-60. https://doi.org/10.1016/j.cageo.2013.12.002.
Wang XD, Yang SC, Zhao YF, et al. (2018) Lithology identification using an optimized KNN clustering method based on entropy-weighed cosine distance in Mesozoic strata of Gaoqing field, Jiyang depression. Journal of Petroleum Science and Engineering 166: 157-174. https://doi.org/10.1016/j.petrol.2018.03.034.
Wang H, Bi DM, He ZS, et al. (2025) Machine learning-based stacked ensemble model for predicting and regulating oxygen-containing compounds in nitrogen-rich pyrolysis bio-oil. Renewable Energy 241: 122330. https://doi.org/10.1016/j.renene.2024.122330.
Wang YJ, Wang XX, Wang KY, et al. (2025) Lithology recognition and porosity prediction from well logs based on Convolutional Neural Networks and sliding window. Journal of Applied Geophysics 242: 105905. https://doi.org/10.1016/j.jappgeo.2025.105905.
Xiang M, Qin PB, Zhang FW (2020) Research and application of logging lithology identification for igneous reservoirs based on deep learning. Journal of Applied Geophysics 173: 103929. https://doi.org/10.1016/j.jappgeo.2019.103929.
Xie YX, Zhu CY, Zhou W, et al. (2018) Evaluation of machine learning methods for formation lithology identification: A comparison of tuning processes and model performances.
Journal of Petroleum Science and Engineering 160: 182-193. https://doi.org/10.1016/j.petrol.2017.10.028.
Zhang HR, Hu YT, Li XS, et al. (2024) Application of support vector machines and genetic algorithms to fluid identification in Offshore Granitic subduction hill reservoirs. Geoenergy Science and Engineering 240: 213013. https://doi.org/10.1016/j.geoen.2024.213013.
Zhai BX, Chen JG (2018) Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China. Science of The Total Environment 635: 644-658. https://doi.org/10.1016/j.scitotenv.2018.04.040.
Zhao XZ, Pu XG, Han WZ, et al. (2017) A new method for lithology identification of fine grained deposits and reservoir sweet spot analysis: A case study of Kong 2 Member in Cangdong sag, Bohai Bay Basin, China. Petroleum Exploration and Development 44: 524-534. https://doi.org/10.1016/S1876-3804(17)30061-7.
Zhang Y, Zhong HR, Wu ZY, et al. (2020) Improvement of petrophysical workflow for shear wave velocity prediction based on machine learning methods for complex carbonate reservoirs. Journal of Petroleum Science and Engineering 192: 107234. https://doi.org/10.1016/j.petrol.2020.107234.
Zhang Y, Zhang CL, Ma QY, et al. (2022) Automatic prediction of shear wave velocity using convolutional neural networks for different reservoirs in Ordos Basin. Journal of Petroleum Science and Engineering 208: 109252.
https://doi.org/10.1016/j.petrol.2021.109252.

Keywords: Carbonate reservoirs, Lithology identification, Stacked-driven ensemble method, Wireline logs

Submitted to Journal of Petroleum Exploration and Production Technology on 2026-02-21. Version 1 posted April 2nd, 2026 (doi:10.21203/rs.3.rs-8931314/v1). Status: under review. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).
