Real-Time Lithology and Log Prediction from Drilling Parameters Using Machine Learning for High-Pressure Salt-Bearing Formation, Missan Oilfields, Iraq

Research Square, Research Article

Hayder Yousif, Xuri Huang, Osama Al-Salih, Hayder Shaaban

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7771316/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 11

Abstract

Drilling through high-pressure, salt-bearing sequences poses severe operational challenges due to rapid pore-pressure fluctuations, borehole instability, and the complex, discontinuous lithologies typical of evaporites.
This study develops and rigorously validates a single, unified machine-learning (ML) framework that simultaneously predicts lithology, formation members, and synthetic gamma-ray (GR) and sonic travel-time (DT) logs directly from surface drilling parameters, providing a practical alternative when wireline logging is risky, delayed, or impractical. A depth-indexed dataset of 30,500 records from four wells in the Buzurgan oilfield was compiled, including rate of penetration (ROP), weight on bit (WOB), revolutions per minute (RPM), torque, flow rate (FR), and standpipe pressure (SPP). Seven supervised algorithms were benchmarked; Random Forest (RF) and Extreme Gradient Boosting (XGBoost), tuned with Optuna and evaluated with held-out tests and blind-well validation, delivered the best performance. The optimized ensembles exceeded 97% accuracy for lithology classification and 99% for formation-member identification, while the regressors showed strong agreement with wireline measurements (R² ≥ 0.93 for GR and ≥ 0.91 for DT). Feature-importance analyses indicated torque and WOB as the most influential predictors, consistent with their direct coupling to bit–rock interaction and formation strength; FR, SPP, and RPM contributed secondarily. Operationally, the framework supports real-time casing-point selection, proactive adjustments to drilling parameters, and mud-property optimization. These capabilities are especially valuable across critical salt–anhydrite intervals, where they help reduce open-hole exposure, non-productive time (NPT), and well-control risk. Limitations include potential site-specific bias (four-well training within a single field), dependence on data quality and sensor calibration, and the need for prospective cross-field validation and concept-drift monitoring as operating practices change.
Nonetheless, the results demonstrate that a field-deployable, ensemble-based workflow can reliably replace or complement traditional formation-evaluation methods in high-pressure, salt-bearing environments, enabling faster, evidence-based decisions at the rig site within a transparent, interpretable ML framework.

Keywords: Machine learning, Random Forest, Lithology prediction, Log prediction, Missan oilfields

1. Introduction

1.1 Background and Literature Review

High-pressure, salt-bearing formations are among the most demanding drilling environments encountered in the oil and gas industry (Zilberman et al. 2002; Willson and Fredrich 2005; Loizzo et al. 2024; Cruz et al. 2024). Such geological settings frequently exhibit rapid, nonlinear variations in formation pressures, complex and heterogeneous rock mechanical properties, and heightened risks associated with borehole instability. Common operational issues include ballooning, washouts, and collapses of the borehole wall, which can significantly complicate drilling operations (Lao et al. 2012). Accurate real-time identification of lithology and precise determination of formation tops are essential under these challenging conditions. Timely and reliable geological data facilitate the optimization of drilling parameters, help minimize NPT, and reduce risks of critical incidents such as well control problems or severe mud losses. However, conventional formation evaluation methods such as mud logging and cuttings analysis provide limited vertical resolution, suffer from sample mixing, and are plagued by delays in data availability (Desouky et al. 2023). Meanwhile, wireline logging, though providing higher accuracy, is both expensive and hazardous under the extreme conditions typical of salt-bearing formations.

In the Missan area, lithology and stratigraphic boundaries are traditionally inferred during and after drilling using mud logs, cuttings, and wireline logging, particularly in non-productive sections where these conventional methods are sufficient for identifying lithology. However, in deep salt intervals or high-risk zones, such approaches become impractical due to high costs, operational delays, and the elevated risks of open-hole logging in unstable or overpressured formations. This has created a growing demand for alternative approaches capable of delivering accurate, real-time geological interpretations while maintaining operational efficiency (Burak et al. 2024).

In recent years, numerous studies have highlighted the promising capability of artificial intelligence (AI) and machine learning (ML) techniques, such as RF, XGBoost, Convolutional Neural Networks (CNNs), and others, for accurately predicting lithology and formation members based solely on surface drilling parameters (Zhou et al. 2011; Moazzeni and Haffar 2015; Gupta et al. 2020; Popescu et al. 2021; Yao et al. 2022; Desouky et al. 2023; Khalifa et al. 2023; Ibrahim et al. 2023; Gamal et al. 2024). However, the reviewed literature does not yet address lithology and log prediction in salt-bearing, high-pressure formations using ML methods that rely solely on surface data. Several studies have successfully predicted sonic logs (Cao et al. 2017; Gowida and Elkatatny 2020; Gamal et al. 2022; Smith et al. 2022) or gamma-ray logs (Osarogiagbon et al. 2020; Aly et al. 2021; Gnyedykh et al. 2022; Ibrahim and Elkatatny 2022) using various ML approaches and input data types, including some studies relying exclusively on drilling parameters (Gowida and Elkatatny 2020; Osarogiagbon et al. 2020; Aly et al. 2021; Gamal et al. 2022; Ibrahim and Elkatatny 2022), but none of them targets such geological settings. In these settings, log responses deviate significantly from normal conditions due to the anomalous physical properties of evaporites, extreme pressure regimes, and the frequent presence of overpressured shales. These factors make salt-bearing intervals and high-pressure (HP) zones both technically challenging and economically critical, as they directly impact wellbore stability, drilling safety, and reservoir characterization. This gap highlights the need for further research and validation of predictive models tailored to these complex environments, thereby underscoring the scientific value of developing ML-based methods adapted to salt-influenced and high-pressure formations.

Accordingly, this study aims to address this gap by developing and validating ML models for real-time prediction of lithology and formation members, as well as log prediction, in high-pressure, salt-bearing formations. By integrating domain knowledge with intelligent algorithms, the study aims to enhance decision-making accuracy, reduce formation misclassification risks, and support safer and more efficient drilling practices under extreme geological conditions.

1.2 Problem Statement

Despite their operational utility, conventional cuttings descriptions and mud logs have inherent limitations. Cuttings samples typically provide low vertical resolution, commonly one to two meters, and are frequently mixed during transport to the surface due to lag time variation, irregular mud circulation, and borehole geometry effects (Clark et al. 1928; Salim and Lagraba P. 2018; Zong et al. 2024). These factors delay and distort the identification of lithological boundaries and formation tops, often introducing significant uncertainty in real-time formation evaluation.

In deep, high-pressure salt-bearing sequences, these challenges become more pronounced.
Rapid lithological alternations of halite, anhydrite, and shale, ductile deformation, and irregular stratigraphic continuity characterize such intervals. Accurately identifying the termination of the final salt bed is particularly critical, as pore pressures often drop sharply beyond this point, directly influencing casing-setting depths and well control safety. Conventional wireline logging, while more precise, is costly and hazardous in unstable or overpressured salt intervals, where extended open-hole exposure may jeopardize wellbore stability. Consequently, there is a pressing need for real-time, high-resolution predictive tools derived from surface drilling parameters that can overcome these limitations. By addressing distorted log responses and complex pressure regimes, such tools could substantially improve the reliability of formation evaluation and operational decision-making in these challenging environments.

1.3 Field Geological and Operational Context

Figure 1 illustrates an integrated subsurface profile from the Buzurgan oilfield, establishing the geological and operational context for this study. The profile incorporates seismic reflectivity data, formation tops, lithological distribution, wellbore architecture, and the delineation between normal and overpressured regimes. Key formation members are marked as MB5 to MB1, where “MB” refers to individual members of the Fatha Formation, a regionally significant stratigraphic unit within the Mesopotamian Basin of Iraq and Iran (Jassim and Goff 2006). In this field, between approximately 2,100 m and 2,800 m depth, the well trajectory normally traverses formation members MB5 through the uppermost part of MB1, with the MB4–MB2 interval comprising a thick, high-pressure salt-bearing zone. Consistent with standard well construction practices in the region, a 17-1/2″ section was constructed and secured with 13-3/8″ casing down to 2,100 m to isolate the normal-pressure interval.
Subsequently, a 12-1/4″ section was completed and cased with 9-5/8″ casing across the overpressured salt-bearing formation (2,100–2,800 m). The final phase extended the wellbore in an 8-1/4″ open hole, which was then lined with a 6-5/8″ liner to reach the target Mishrif reservoir at approximately 3,700 m. This staged casing design was adopted to preserve wellbore integrity and ensure effective pressure management throughout the drilling operation.

The integrated geological and operational profile highlights the stratigraphic complexity and pressure heterogeneity within the study interval, which significantly complicate real-time lithology interpretation and formation evaluation. The adjacent lithology column reveals the rapid alternation between shale, anhydrite, and massive salt layers, which introduces substantial uncertainty into decisions regarding casing depths, mud-weight programs, and pore-pressure management strategies. For simplicity, these interbedded shale intervals will hereafter be referred to collectively as “shale”. It is important to note, however, that the shale in this context is predominantly dolomitic shale, which explains the relatively low gamma-ray readings observed at certain depths despite the intervals being lithologically classified as shale.

The geological setting of the study area is characterized by the presence of several key members of the Fatha Formation, with the MB1 member serving as both the basal unit of stratigraphic significance and the operational base. Figure 2 presents a time-structure contour map of the Top MB1 surface, illustrating spatial variations in elevation time across the Missan oilfields. All studied wells (BU-N1 to BU-N5) are positioned above the Buzurgan Anticline, with BU-N3 and BU-N4 located near the structural crest and BU-N1 and BU-N5 situated closer to the flanks.

1.4 Research Objectives

This study aims to develop a machine learning-based predictive framework capable of simultaneously predicting lithology, formation members, and synthetic logs (gamma-ray and sonic logs) solely from surface drilling parameters within a single unified system. By leveraging advanced supervised learning algorithms and automated hyperparameter optimization, the framework is designed to deliver highly accurate predictions that can be integrated into real-time drilling advisory systems. Beyond improving drilling safety and decision-making, the system provides on-site geologists and engineers with continuous, data-driven insights that replace subjective interpretations and manual correlations. This enables faster and more reliable formation evaluation, while also guiding real-time adjustments to drilling parameters, mud properties, and casing-setting depths, thereby enhancing efficiency and reducing operational risks.

1.5 Operational Relevance and Field Applications

Drilling operations in high-pressure, salt-bearing formations pose substantial technical and safety challenges (Willson and Fredrich 2005; Moiseenkov et al. 2019; Jin et al. 2023). In these complex environments, rapid and reliable predictions of lithology, formation members, and petrophysical properties, such as gamma-ray and sonic logs, are critical for minimizing NPT and ensuring safe well construction. One of the most pressing operational concerns in these intervals is the narrow time window between drilling through the final salt layer and installing casing. Delays or misjudgments in this stage can lead to severe mud losses, borehole instability, and compromised well integrity. The developed ML framework directly addresses these challenges by delivering continuous, high-resolution predictions derived solely from real-time drilling parameters.
This capability provides well-site geologists with an objective decision-support tool, enabling them to compare synthetic logs (e.g., gamma ray and sonic) against offset wells in real time and refine formation correlations under conditions where traditional methods are unreliable. In parallel, drilling supervisors and mud engineers benefit from predictive insights that guide immediate adjustments to weight on bit, rotary speed, flow rates, and mud properties. These proactive measures optimize penetration rates, improve hole cleaning, and mitigate well-control risks. Furthermore, predictive modeling helps extend bit life by reducing mechanical stress during lithology transitions and supports more accurate casing-setting decisions in overpressured zones. By integrating predictive modeling with operational workflows, the proposed system transforms formation evaluation and drilling practices from reactive, experience-driven processes into proactive, data-driven operations. This integration enhances drilling safety, reduces NPT, and improves overall efficiency in the most technically demanding sections of the well.

2. Data and Methodology

2.1 Data Description

The dataset analyzed in this study comprises 30,500 depth-indexed records collected from four Buzurgan wells (BU-N1 through BU-N4), covering the 2,100–2,800 m interval, a high-pressure, salt-bearing section drilled with a 12¼-inch bit. Each record includes drilling parameters (ROP, RPM, WOB, torque, FR, and SPP) along with measured depth. The BU-N5 well was entirely reserved for blind validation to evaluate the model’s generalization capability. This structured dataset provided a robust foundation for model training, internal assessment, and external real-time testing. Figure 3 summarizes the distribution of lithology and stratigraphy in the dataset. The upper plots show that shale is the dominant lithology, accounting for 51.7% of the samples, followed by anhydrite (27.8%) and salt (20.5%).
The lower plots display the distribution of samples across the members of the Fatha Formation. The MB4 member contains the highest proportion of samples (43.7%), followed by MB5 (31.6%), MB3 (19.7%), MB2 (5.07%), and a minimal fraction from the uppermost 0.5 m of MB1 (0.0719%). These distributions underscore the stratigraphic variability within the study interval and inform class balancing considerations for supervised ML.

2.2 Workflow Overview

Figure 4 illustrates the comprehensive workflow implemented in this study, covering the entire sequence from data acquisition to real-time operational decision-making. The workflow was developed to process well-log datasets and integrate machine-learning techniques for predicting petrophysical properties and lithological classifications in complex, high-pressure drilling environments. Data acquisition involved collecting subsurface data from multiple sources, including Final Well Reports (FWR), Daily Geological Reports (DGR), Master Logs, and Wireline Logging (WL) Reports. An initial exploratory data analysis (EDA) phase was conducted to investigate data distributions, detect inconsistencies, and assess the relationships between potential input features and target variables. This stage was followed by feature selection procedures to identify the most informative parameters influencing lithology, GR, and DT responses. The data were subsequently normalized and preprocessed to address scaling disparities, manage missing values, and ensure consistency across different input features.

Following preprocessing, the dataset was randomly split into training and testing subsets in an 80:20 ratio to evaluate the generalization performance of the models objectively. A suite of ML algorithms was implemented for both regression and classification tasks, including LR, RF, SVM, XGBoost, MLP, TabNet, and CNN.
Regression models were trained to predict continuous log responses (GR and DT), while classification models were used for lithology identification and formation member prediction. Hyperparameter tuning was conducted using the Optuna optimization framework coupled with cross-validation procedures to enhance model performance and mitigate overfitting risks. The model evaluation relied on task-appropriate performance metrics: coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) for regression tasks; and Accuracy, Precision, Recall, F1-score, and Matthews Correlation Coefficient (MCC) for classification models.

The best-performing models were saved and subsequently applied to new incoming well data in real time, with continuous data normalization and preprocessing performed to maintain workflow integrity. Predictions were compared against actual field measurements to assess accuracy and validate model robustness. This iterative feedback loop facilitated ongoing model refinement and operational decision support, improving drilling safety, formation evaluation accuracy, and operational efficiency in complex and overpressured environments.

2.3 Data Preprocessing

Prior to any analytical or modeling tasks, the dataset underwent a rigorous preprocessing workflow designed to enhance data integrity and ensure suitability for ML applications. This workflow addressed several key aspects: handling missing values, detecting and mitigating outliers, reconciling multi-resolution data sources, and standardizing feature scales. A notable challenge involved harmonizing lithological descriptions, typically recorded at coarse 1–2 m intervals in master lithology logs, with drilling parameter and wireline data, both acquired at a finer vertical resolution of 0.1 m. Leveraging domain expertise and field-proven experience from drilling dozens of wells in the Missan oilfields, a manual alignment procedure was implemented to resolve this discrepancy.
This alignment process was applied only to the training and internal evaluation datasets, where ground-truth lithology and log measurements are available. In contrast, blind-test datasets used for real-time validation require no such alignment, as the models operate solely on surface drilling parameters without access to labeled geological data.

2.3.1 Outlier Detection and Initial Data Assessment

Figure 5 presents box plots for the primary drilling and logging parameters, including ROP, WOB, RPM, SPP, torque, FR, GR, and DT. These visualizations offer an initial assessment of the data’s distribution and variability, supporting the identification of potential outliers. Each box plot displays the median, the interquartile range (IQR), and any data points lying outside the whiskers, which are flagged as possible outliers. The results indicate that parameters such as ROP and RPM exhibit substantial dispersion and noticeable skewness, while the GR and DT logs show more symmetric and narrowly clustered distributions. This preliminary analysis helps in understanding the quality and characteristics of the dataset before model development.

In addition to parameter-wise outlier screening, lithology-specific box plots of GR and DT were generated for the three principal rock types: shale, anhydrite, and salt. This stratified visualization highlights petrophysical anomalies within each lithologic class, allowing for the identification of extreme values, particularly in mixed facies such as shaly anhydrite, dirty salt, or transitional intervals. These outliers serve as indicators of intra-formational heterogeneity. Such insights are crucial for refining lithology labels, enhancing data quality, and ultimately improving the predictive accuracy of downstream ML models.
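
A minimal sketch of the whisker-based screening described above, assuming the common Tukey convention of whiskers at 1.5 × IQR beyond the quartiles (the text does not state the multiplier used for Fig. 5). The function name `iqr_outlier_flags` and the ROP values are hypothetical and serve only to illustrate the rule.

```python
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Return True for each value outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts internally; n=4 yields the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical ROP readings (m/h); the last value is deliberately extreme.
rop = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 6.8, 7.0, 25.0]
flags = iqr_outlier_flags(rop)
outliers = [v for v, is_out in zip(rop, flags) if is_out]
print(outliers)
```

Applying the same function per lithology (for example, to GR values of shale samples only) would mirror the stratified box plots; flagged points are candidates for review rather than automatic removal.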

2.3.2 Addressing Class Imbalance and Feature Normalization

To address the severe class imbalance, particularly the underrepresentation of the MB1 class, which constitutes only 0.0719% of the dataset, the Synthetic Minority Over-sampling Technique (SMOTE) was employed using Python’s imbalanced-learn library. SMOTE generates synthetic samples by interpolating between existing minority class instances and their nearest neighbors in feature space (Elreedy and Atiya 2019; Pelayo and Dick 2019), thereby creating more diverse and representative training data. In this study, 500 synthetic MB1 samples were generated and incorporated, significantly improving the class distribution and enabling more equitable and stable classification performance by reducing bias toward dominant formations.

Following outlier treatment and class rebalancing, feature normalization was applied to all numerical input variables to ensure equal contribution during model training. Normalizing features such as ROP, SPP, WOB, torque, RPM, and FR resulted in standardized distributions centered around zero. This transformation enhances model convergence speed, training efficiency, and predictive stability across both regression and classification tasks (Abdi 2022; Raymaekers and Rousseeuw 2024).

2.4 Model Development and Evaluation

To develop a robust predictive framework, seven supervised ML algorithms were employed: RF, SVM, LR, XGBoost, MLP, CNN, and TabNet. These models were chosen to represent a broad spectrum of learning paradigms, spanning traditional statistical methods, ensemble approaches, and advanced deep learning architectures suitable for structured tabular data. The dataset was randomly partitioned into 80% for training and 20% for testing to evaluate each model’s generalization capability objectively.
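
The oversampling and normalization steps of Section 2.3.2 can be illustrated with a short, library-free sketch. It is not the imbalanced-learn implementation used in the study: the hypothetical `smote_like` function only reproduces the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbors, and `standardize` assumes z-score scaling, which is consistent with the zero-centered distributions reported but is not stated explicitly in the text. All names and values are illustrative.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def standardize(column):
    """Z-score a feature column: zero mean, unit population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std for x in column]


# Hypothetical (torque, WOB) pairs standing in for a rare class such as MB1.
minority = [(10.0, 5.0), (11.0, 5.5), (10.5, 6.0), (12.0, 5.2)]
new_points = smote_like(minority, n_new=5)
scaled_torque = standardize([p[0] for p in minority + new_points])
```

Because each synthetic point lies on a segment between two real minority samples, it stays within the range already spanned by that class, which is why SMOTE adds diversity without extrapolating beyond observed drilling conditions.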
Baseline performance was first established using default hyperparameters, followed by targeted hyperparameter optimization with Optuna, an efficient and flexible framework that leverages Bayesian optimization via a define-by-run strategy. This optimization facilitated a dynamic and adaptive exploration of the hyperparameter space, yielding optimal model configurations. Model performance was quantitatively assessed on the independent test set. Given the dual-task nature of the problem (categorical lithology and formation-member classification, and continuous log regression), metrics were selected accordingly. For classification, accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC) are reported, with MCC offering a robust evaluation in the presence of class imbalance. For regression tasks predicting gamma-ray (GR) and sonic travel-time (DT) logs, the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE) are used to assess both predictive accuracy and variance explanation.

2.5 Validation Strategy

While model performance metrics provide valuable benchmarks under controlled conditions, they do not fully capture the operational variability and uncertainties inherent to real-time drilling environments (Gnyedykh et al. 2022; Elmgerbi et al. 2022). Accordingly, an operational validation strategy was implemented to assess model robustness and practical utility under dynamic field conditions. Specifically, an independent validation was performed using data from a well within the Buzurgan oilfield that was not used in training. Although the primary goal of drilling in this region is to reach and produce from hydrocarbon-bearing reservoirs situated beneath the salt-bearing intervals, penetrating and drilling through the thick, overpressured salt section is an unavoidable prerequisite for accessing these deeper targets.
The validation procedure involved the real-time application of the trained predictive models during the drilling of the salt-bearing interval. Figure 6 illustrates depth profiles of key drilling parameters collected from the external validation well BU-N5 over a depth range from 2,080 to 2,800 m. These drilling parameters were the only inputs to the models, enabling continuous real-time predictions of lithology, well tops, and synthetic logs without relying on downhole logging tools. This real-time operational validation strategy enabled an assessment of the model’s robustness, reliability, and adaptability under realistic field conditions within the complex salt-bearing interval.

3. Results and Model Interpretation

3.1 Correlation Analysis

The relationships between drilling parameters (ROP, WOB, RPM, SPP, torque, and FR) and the key geological target variables (GR, DT, lithology, and well tops) were systematically evaluated using Pearson and point-biserial correlation heatmaps (Fig. 7). These heatmaps provided a quantitative foundation for identifying the most informative drilling parameters relevant to subsurface geological variations. Among all surface drilling parameters, torque emerged as the most diagnostic signal, displaying a strong negative Pearson correlation (Fig. 7a, b) with both gamma ray (r = −0.65) and sonic travel time (r = −0.63). Higher torque, therefore, coincides with cleaner, mechanically competent formations, that is, those that register lower GR values (reduced shale content) and shorter DT values (higher acoustic velocity). Point-biserial correlation further clarified lithology-specific drilling behavior (Fig. 7c). Torque shows a strong negative association with shale, consistent with its mechanically soft character, and a positive association with anhydrite, which is hard and drilling-resistant.
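
As a brief illustration of the statistics behind Fig. 7, the point-biserial coefficient is Pearson's r computed between a continuous variable and a 0/1 class indicator. The sketch below assumes one-vs-rest indicators per lithology, which is one plausible way to build such a heatmap; the function names and the torque values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def point_biserial(values, labels, positive):
    """Point-biserial r: Pearson r against a 1/0 indicator of one class."""
    indicator = [1.0 if lab == positive else 0.0 for lab in labels]
    return pearson_r(values, indicator)


# Hypothetical torque readings with their lithology labels.
torque = [6.0, 6.5, 7.0, 11.0, 11.5, 12.0, 9.5, 10.0]
lithology = ["shale", "shale", "shale",
             "anhydrite", "anhydrite", "anhydrite",
             "salt", "salt"]

r_shale = point_biserial(torque, lithology, "shale")
r_anhydrite = point_biserial(torque, lithology, "anhydrite")
print(round(r_shale, 2), round(r_anhydrite, 2))
```

With these illustrative numbers, the sign pattern matches the behavior described in the text: low-torque shale gives a negative coefficient and high-torque anhydrite a positive one.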
Salt intervals, although typically drilled with lower WOB and higher ROP, still generate elevated torque because salt is ductile yet mechanically stiff (Aubertin et al. 1999). This contrast is most pronounced at salt–shale and anhydrite–shale interfaces, where torque changes sharply as the bit crosses between the mechanically stiff units (salt or anhydrite) and the softer shale, regardless of the direction of transition. These trends are visually reinforced in Fig. 8, which presents scatter plots of torque versus GR and DT, color-coded by lithology (shale, salt, and anhydrite). The scatter plots clearly illustrate that shale beds correspond to high GR, high DT, and low torque. Salt exhibits moderate DT, low GR, and elevated torque. Anhydrite is marked by low DT, low GR, and the highest torque levels. This visualization complements the numerical correlation results in Fig. 7c, providing intuitive insight into how lithological changes are reflected in real-time drilling behavior. Furthermore, specific formation members showed meaningful associations with drilling parameters (Fig. 7d). For example, ROP was positively correlated with MB4 and negatively correlated with MB5, suggesting variable drilling efficiency across these formation members.

3.2 Feature-Importance Analysis (SHAP)

SHAP (SHapley Additive exPlanations) is a game-theoretic approach that assigns each feature a contribution value toward a model’s prediction, providing a consistent framework for interpreting complex machine-learning models (Lundberg and Lee 2017). The geological–mechanical analysis of SHAP feature-importance results (Fig. 9) highlights how drilling parameters influence the four RF models developed in this study. For the regression targets, torque emerges as the dominant predictor. In the DT model, torque accounts for more than half of the total SHAP contribution, whereas ROP and FR share a secondary influence, followed by WOB and SPP, while RPM exerts nearly negligible impact.
A similar pattern appears in the GR model, where torque again ranks first, followed by FR and SPP, with ROP and WOB contributing only marginally. These outcomes confirm that the mechanical resistance sensed at the bit is strongly coupled to both acoustic velocity and natural gamma response. For the classification targets, torque remains the leading driver of the lithology classifier, with WOB and ROP providing complementary discriminatory power. While FR, SPP, and RPM contribute progressively less. In contrast, the formation member classifier is governed primarily by flow rate, followed by ROP and RPM, whereas torque and WOB play lesser yet non-negligible roles. This inversion suggests that hydraulic flow variations are more sensitive to stratigraphic boundaries than to absolute rock strength. At the same time, the mechanical signature of torque and bit load most effectively captures changes in lithology. Collectively, these findings demonstrate that (i) torque is the most informative real-time variable for predicting petrophysical properties (GR, DT) and lithology, and (ii) flow-related parameters (FR, ROP) become pivotal when detecting formation tops. Such insights can guide well-site engineers in prioritizing torque monitoring for lithology transitions and focusing on flow dynamics when anticipating stratigraphic tops, particularly in high-pressure, salt-bearing intervals, where rapid decisions are crucial. 3.3 Classification Performance The classification models developed for predicting lithology and formation members were evaluated using confusion matrices (Fig. 10 ) and quantitative metrics summarized in Table 1 . This evaluation included accuracy, precision, recall, F1-score, and MCC, providing comprehensive insights into model strengths and misclassification tendencies. 3.3.1 Lithology Classification The confusion matrices of the RF models, presented in Figs. 
10a and 10b, illustrate the performance of the lithology classification model across both the training and testing datasets. The model demonstrated robust performance, particularly in the accurate classification of shale, with 12,053 correct predictions in the training set and 3,043 in the testing set. Minor misclassifications were observed, suggesting subtle ambiguities or transitional characteristics between lithologies, which may indicate the presence of mixed lithological features or gradational contacts within the formation. The testing confusion matrix confirmed good generalization capability, though it showed a slight increase in misclassifications compared to the training dataset. This modest decline in performance is expected, as models typically face greater challenges when applied to unseen data.

Quantitatively, RF delivered the highest performance, achieving an accuracy of 0.9798 with a corresponding MCC of 0.9660 during the training phase. Its strong performance was consistent in testing, maintaining an accuracy of 0.9762 and an MCC of 0.9596. XGBoost closely matched RF performance, with training accuracy of 0.9779 and testing accuracy of 0.9729, alongside MCC values of 0.9630 and 0.9540, respectively. In contrast, SVM and CNN yielded lower testing accuracies (0.9311 and 0.8724, respectively). This reduction is unlikely to stem from sample size (≈ 30,000 records) but rather from a model–data mismatch. Convolutional networks are tailored to exploit local spatial stationarity and weight sharing on grid-like inputs with strong spatial autocorrelation (e.g., images), which is not the case for tabular drilling parameters (Lecun et al. 2015; Goodfellow et al. 2017). Under such conditions, tree-based ensembles such as RF and XGBoost typically capture heterogeneous, non-monotonic interactions more effectively, which is consistent with their superior performance in this context (Chen and Guestrin 2016).
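As an illustration of how accuracy and MCC follow from a confusion matrix, the sketch below uses the standard multiclass generalization of the Matthews correlation coefficient. The three-class counts are hypothetical and are not the values shown in Fig. 10.

```python
from math import sqrt


def accuracy_and_mcc(cm):
    """Accuracy and multiclass MCC from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                        # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # column sums

    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0
    return correct / total, mcc


# Hypothetical counts; rows and columns ordered as shale, salt, anhydrite.
example_cm = [
    [90, 5, 5],
    [4, 45, 1],
    [6, 0, 44],
]
acc, mcc = accuracy_and_mcc(example_cm)
print(f"accuracy = {acc:.4f}, MCC = {mcc:.4f}")
```

Because MCC accounts for the class totals in both the true and predicted labels, it is typically lower than accuracy when classes are imbalanced, which is the pattern seen throughout Table 1.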
Table 1 Performance metrics (accuracy, precision, recall, F1-score, MCC) for lithology and formation-member predictions of the seven models, highlighting the strong superiority of ensemble models.

| Target | Model | Train Accuracy | Train Precision | Train Recall | Train F1-score | Train MCC | Test Accuracy | Test Precision | Test Recall | Test F1-score | Test MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lithology prediction | RF | 0.9798 | 0.98 | 0.98 | 0.98 | 0.966 | 0.9762 | 0.98 | 0.98 | 0.98 | 0.9596 |
| Lithology prediction | SVM | 0.9322 | 0.93 | 0.93 | 0.93 | 0.8863 | 0.9311 | 0.93 | 0.93 | 0.93 | 0.883 |
| Lithology prediction | LR | 0.8267 | 0.83 | 0.83 | 0.83 | 0.7085 | 0.8338 | 0.83 | 0.83 | 0.83 | 0.7172 |
| Lithology prediction | XGBoost | 0.9779 | 0.98 | 0.98 | 0.98 | 0.963 | 0.9729 | 0.97 | 0.97 | 0.97 | 0.954 |
| Lithology prediction | MLP | 0.8666 | 0.87 | 0.87 | 0.87 | 0.7794 | 0.8673 | 0.87 | 0.87 | 0.87 | 0.7776 |
| Lithology prediction | TabNet | 0.8968 | 0.9 | 0.9 | 0.9 | 0.8187 | 0.8977 | 0.9 | 0.9 | 0.9 | 0.8214 |
| Lithology prediction | CNN | 0.8693 | 0.87 | 0.87 | 0.87 | 0.7853 | 0.8724 | 0.88 | 0.87 | 0.87 | 0.7876 |
| Formation members prediction | RF | 0.9986 | 0.99 | 0.99 | 0.99 | 0.9979 | 0.9979 | 0.99 | 0.99 | 0.99 | 0.9969 |
| Formation members prediction | SVM | 0.9669 | 0.97 | 0.97 | 0.97 | 0.9495 | 0.9641 | 0.96 | 0.96 | 0.96 | 0.9452 |
| Formation members prediction | LR | 0.7202 | 0.71 | 0.72 | 0.71 | 0.5677 | 0.7207 | 0.71 | 0.72 | 0.71 | 0.5686 |
| Formation members prediction | XGBoost | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| Formation members prediction | MLP | 0.7919 | 0.79 | 0.79 | 0.78 | 0.6787 | 0.7868 | 0.79 | 0.79 | 0.78 | 0.6708 |
| Formation members prediction | TabNet | 0.8786 | 0.88 | 0.88 | 0.88 | 0.8161 | 0.8740 | 0.87 | 0.87 | 0.87 | 0.8099 |
| Formation members prediction | CNN | 0.8403 | 0.84 | 0.84 | 0.84 | 0.7576 | 0.8395 | 0.84 | 0.84 | 0.84 | 0.7564 |

3.3.2 Formation Members Classification

The confusion matrices of the RF model for the formation members MB1–MB5 (Figs. 10c and 10d) demonstrated excellent performance in accurately capturing geological boundaries. Training results revealed nearly perfect classification for all formation members (e.g., 9,848 correct classifications for MB4 and 7,621 for MB5). This exceptional performance persisted in the testing phase, indicating high model reliability and effectiveness. The quantitative results from Table 1 support this conclusion.
Both RF and XGBoost exhibited outstanding results. RF achieved near-perfect accuracy (0.9986 for training and 0.9979 for testing) with very high MCC values (0.9979 for training and 0.9969 for testing), while XGBoost reached approximately 0.99 on every metric. Precision, recall, and F1-scores were consistently 0.99 for both models. In contrast, LR and CNN achieved lower testing accuracies (0.7207 and 0.8395, respectively), suggesting that these models may be less effective in capturing the intricate, nonlinear geological relationships in this dataset.

Overall, the results demonstrate the superiority of ensemble methods (RF and XGBoost) in classifying lithology and identifying geological formations. Such reliable performance validates their utility for accurate, real-time geological interpretation and informed decision-making in subsurface exploration and production.

3.4 Regression Performance

3.4.1 Gamma-ray Log Prediction

Table 2 summarizes the performance metrics for GR prediction across various models. Among the tested models, RF delivered the highest accuracy, with training and testing R² values of 0.9327 and 0.9155, respectively. RF also demonstrated low error values (MAE = 1.824, RMSE = 2.9946 training; MAE = 2.0941, RMSE = 3.3388 testing), indicating robust predictive performance and reliability. XGBoost closely matched RF performance, achieving R² values of 0.9318 in training and 0.9154 in testing, with comparably low errors (MAE = 1.8621, RMSE = 3.0151 training; MAE = 2.1196, RMSE = 3.3417 testing). Other evaluated models, including the CNN and SVM, exhibited moderate predictive capability, as reflected by lower R² values in both the training and testing phases (testing R² = 0.8301 for CNN and 0.7767 for SVM). Conversely, LR underperformed substantially, yielding an R² of only 0.4647 in testing, along with significantly higher errors.

3.4.2 Sonic Log Prediction

Similarly, Table 2 provides a comprehensive evaluation of the models predicting DT.
Again, ensemble methods significantly outperformed the other models. RF yielded the best results, with an R² of 0.9422 during training and 0.9298 during testing. Associated errors remained low (MAE = 2.9827, RMSE = 5.1758 training; MAE = 3.3883, RMSE = 5.7255 testing), demonstrating the model’s predictive robustness. XGBoost exhibited comparable accuracy, with an R² of 0.9415 in training and 0.9296 in testing, alongside similarly low error metrics (MAE = 3.065 in training; MAE = 3.4613 in testing). Models such as CNN, TabNet, and MLP showed limited performance, with significantly lower R² values (approximately 0.63–0.71) and higher error values. Again, linear regression (LR) delivered particularly weak performance (R² ≈ 0.50).

Overall, this comprehensive evaluation confirms that ensemble-based regression models, notably Random Forest and XGBoost, are effective in predicting petrophysical parameters from drilling data. The demonstrated high accuracy and generalization potential substantiate their practical value for precise, real-time geological interpretation and informed operational decision-making.

Table 2 Performance metrics on training and testing data for gamma-ray (GR) and sonic travel-time (DT) prediction using various ML models, including the tuned final versions of the best-performing models.
| Target | Model | Train R² | Train MAE | Train RMSE | Test R² | Test MAE | Test RMSE |
|---|---|---|---|---|---|---|---|
| GR prediction | RF | 0.9327 | 1.824 | 2.9946 | 0.9155 | 2.0941 | 3.3388 |
| GR prediction | SVM | 0.7924 | 3.3106 | 5.2605 | 0.7767 | 3.4884 | 5.4286 |
| GR prediction | LR | 0.4821 | 6.6446 | 8.3092 | 0.4647 | 6.6882 | 8.4057 |
| GR prediction | XGBoost | 0.9318 | 1.8621 | 3.0151 | 0.9154 | 2.1196 | 3.3417 |
| GR prediction | MLP | 0.7727 | 3.9411 | 5.5044 | 0.7592 | 4.0571 | 5.6373 |
| GR prediction | TabNet | 0.6581 | 5.0075 | 6.6556 | 0.6556 | 5.0288 | 6.6815 |
| GR prediction | CNN | 0.8472 | 3.1816 | 4.5126 | 0.8301 | 3.3599 | 4.736 |
| DT prediction | RF | 0.9422 | 2.9827 | 5.1758 | 0.9298 | 3.3883 | 5.7255 |
| DT prediction | SVM | 0.8072 | 5.9627 | 9.4543 | 0.8018 | 6.2573 | 9.6178 |
| DT prediction | LR | 0.5064 | 12.0005 | 15.1286 | 0.5001 | 12.019 | 15.2751 |
| DT prediction | XGBoost | 0.9415 | 3.065 | 5.2097 | 0.9296 | 3.4613 | 5.7344 |
| DT prediction | MLP | 0.6388 | 9.9331 | 12.941 | 0.6321 | 10.00 | 13.1042 |
| DT prediction | TabNet | 0.7067 | 8.551 | 11.7284 | 0.7113 | 8.4855 | 11.6702 |
| DT prediction | CNN | 0.698 | 8.665 | 11.8328 | 0.6961 | 8.7433 | 11.9101 |

3.4.3 Actual vs. Predicted Analysis of Regression

Scatter plots (Fig. 11) were generated to visually assess and further validate the predictive reliability of the RF regression models for GR and DT. The plots depict a strong linear relationship, with predicted values closely aligning along the ideal identity line, confirming high predictive accuracy. For DT, the training dataset exhibited an R² of 0.9422 (Fig. 11a), while the testing dataset showed consistent accuracy with an R² of 0.9298 (Fig. 11b). GR predictions similarly demonstrated strong performance, achieving an R² of 0.9327 for training (Fig. 11c) and maintaining high accuracy in testing with an R² of 0.9155 (Fig. 11d). These visual results reinforce the quantitative findings presented in Table 2, underscoring the robust generalization capability of the ensemble regression models.

3.5 Field Well Validation Results

To rigorously assess the generalization capability and practical effectiveness of the developed models, external validation was performed in real time using data from a well not included in the training data, BU-N5, located within the Buzurgan oilfield. The results from this validation are presented in two main subsections: Classification and Regression.
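The blind-well idea can be sketched as follows: records are split by well identifier rather than at random, and R², MAE, and RMSE are then computed only on the held-out well. The record layout, the well names other than BU-N5, the DT values, and the mean-value predictor are all hypothetical placeholders used for illustration; they are not the study's data or models.

```python
from math import sqrt


def split_by_well(records, blind_well):
    """Keep every record of `blind_well` out of the training set."""
    train = [r for r in records if r["well"] != blind_well]
    blind = [r for r in records if r["well"] == blind_well]
    return train, blind


def regression_metrics(y_true, y_pred):
    """R², MAE, and RMSE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = sqrt(ss_res / n)
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Hypothetical depth records; "dt" stands for a sonic travel-time value.
records = [
    {"well": "W-1", "dt": 70.0}, {"well": "W-1", "dt": 90.0},
    {"well": "W-2", "dt": 60.0}, {"well": "W-2", "dt": 100.0},
    {"well": "BU-N5", "dt": 67.0}, {"well": "BU-N5", "dt": 95.0},
]

train, blind = split_by_well(records, "BU-N5")

# Placeholder predictor: the mean DT of the training wells (not a real model).
mean_dt = sum(r["dt"] for r in train) / len(train)
y_true = [r["dt"] for r in blind]
y_pred = [mean_dt] * len(blind)

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

A constant predictor such as this one yields an R² at or below zero on the blind well, which gives a useful floor against which scores like those in Table 2 can be read.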
3.5.1 Classification Results

Quantitative performance metrics for lithology and formation-member classification by the RF and XGBoost models for well BU-N5 are summarized in Table 3. The lithology classification demonstrated excellent accuracy, with RF and XGBoost achieving accuracy values of 0.9731 and 0.9718, respectively. Both models exhibited consistent precision, recall, and F1-scores of 0.97, along with high MCC values (0.9570 for RF and 0.9549 for XGBoost). For the formation-member classification, RF exhibited near-perfect performance, recording an accuracy of 0.9963 and MCC of 0.9944, while XGBoost closely matched this robust performance (accuracy: 0.9663, MCC: 0.9944).

Visual comparisons of actual and predicted lithology and formation members are provided in Fig. 12. The figure illustrates the strong model performance in lithological identification, accurately distinguishing interbedded layers of shale, anhydrite, and salt. Major stratigraphic boundaries (MB5 to MB1) were also captured effectively, confirming model reliability, with only minor discrepancies that are negligible in operational contexts. These classification results collectively validate the practical robustness and reliability of both RF and XGBoost models for geological interpretation and formation identification under real-time drilling scenarios.

Table 3 Performance metrics of RF and XGBoost models on external validation well BU-N5 for lithology and formation members (application phase).

| Target | Model | Accuracy | Precision | Recall | F1-score | MCC |
|---|---|---|---|---|---|---|
| Lithology prediction | RF | 0.9731 | 0.97 | 0.97 | 0.97 | 0.957 |
| Lithology prediction | XGBoost | 0.9718 | 0.97 | 0.97 | 0.97 | 0.9549 |
| Formation Members | RF | 0.9963 | 0.99 | 0.99 | 0.99 | 0.9944 |
| Formation Members | XGBoost | 0.9663 | 0.99 | 0.99 | 0.99 | 0.9944 |

3.5.2 Regression Results

Regression results for predicting the petrophysical properties GR and DT from the validation well (BU-N5) are summarized quantitatively in Table 4.
The RF model demonstrated robust performance in predicting gamma ray values, achieving an R² of 0.9342, MAE of 1.8775, and RMSE of 3.1539. XGBoost displayed comparable accuracy, with an R² of 0.9337, MAE of 1.9251, and RMSE of 3.1640. Similarly, for DT, RF yielded strong predictive performance (R² of 0.9332, MAE of 3.0465, RMSE of 5.3102), while XGBoost closely matched these results (R² of 0.9321, MAE of 3.1234, RMSE of 5.3543).

Figure 13 presents a detailed visual comparison between the actual and predicted GR and DT logs across the depth interval of 2080–2800 m. The predicted logs exhibit a strong agreement with the measured data, effectively capturing the overall trends and local variations in both GR and DT values. Minor discrepancies between predicted and actual curves remain within acceptable operational tolerances, underscoring the robustness of the models in accommodating real-time variations commonly encountered under field conditions. Taken together, these regression outcomes, supported by quantitative evaluation metrics (Table 4) and visual evidence (Fig. 13), highlight the high predictive reliability and practical applicability of the RF and XGBoost models in real-time drilling operations.

Table 4 Regression performance metrics (R², MAE, RMSE) for GR and DT predictions using external validation data.

| Target | Model | R² | MAE | RMSE |
|---|---|---|---|---|
| GR prediction | RF | 0.9342 | 1.8775 | 3.1539 |
| GR prediction | XGBoost | 0.9337 | 1.9251 | 3.164 |
| DT prediction | RF | 0.9332 | 3.0465 | 5.3102 |
| DT prediction | XGBoost | 0.9321 | 3.1234 | 5.3543 |

4. Discussion

The primary aim of this study is to develop robust ML models capable of accurately predicting lithology, formation members, GR, and DT solely from surface drilling parameters. The results presented in Section 3 demonstrate the strong predictive accuracy and generalization capability of the developed models, particularly the ensemble-based methods RF and XGBoost.
These models are designed to assist geologists in identifying formation members and determining casing landing zones in real time, especially in high-pressure zones where operational risks are elevated and additional precautions are essential. Additionally, they provide valuable guidance for drilling engineers and mud engineers, supporting their decisions on drilling parameters, mud-weight adjustments, and operational strategies to enhance drilling safety, efficiency, and accuracy.

4.1 Interpretation of Relationships between Drilling Parameters and Geological Properties

Correlation analysis (Section 3.1) revealed meaningful relationships between key drilling parameters and subsurface geological and petrophysical properties. Torque exhibited a negative correlation with GR values, indicating that intervals with lower GR values, typically cleaner, non-shaly lithologies such as anhydrite and salt, tend to generate higher torque. Conversely, shale intervals, characterized by high GR readings, consistently showed lower torque values, likely due to their softer and less mechanically resistant nature. A similar pattern is evident in the torque–DT relationship. Shale, which exhibits the highest DT values (reflecting low acoustic velocity), is associated with the lowest measured torque. Salt, with intermediate DT values, yields moderate torque, whereas anhydrite, marked by low DT (high velocity and stiffness), produces the highest torque response. This trend highlights a direct correlation between formation stiffness and the torque generated at the bit. As for WOB, positive correlations were observed with anhydrite, reflecting its high mechanical resistance. However, this trend does not extend to salt, which, despite generating high torque, requires relatively low WOB due to its ductile and easily penetrable nature.
This observation aligns with field evidence indicating that salt formations can often be drilled rapidly (1–3 minutes per meter) with moderate or even low bit pressure despite elevated torque responses. Furthermore, specific formation members showed meaningful associations with drilling parameters (Fig. 7d). For example, ROP was positively correlated with MB4 and negatively correlated with MB5, suggesting variable drilling efficiency across these members. This difference in drilling rate can be attributed first to the presence of thick salt layers within MB4, which are typically drilled rapidly, and second to the abnormally high pore pressure in MB4, which mechanically weakens the strata and further enhances drilling rates. This correlation-based understanding supports accurate formation identification, optimal bit selection, and real-time lithology prediction during drilling operations.

In summary, these findings underscore the diagnostic value of torque and WOB as real-time indicators of lithological transitions. Torque, in particular, appears sensitive to both formation stiffness and acoustic response, making it a powerful parameter for anticipating lithological changes during drilling operations.

4.2 Comparative Analysis of Model Performance

Classification performance results (Section 3.3) showed that RF and XGBoost outperformed LR and SVM, which are comparatively less flexible for capturing highly nonlinear and heterogeneous geological patterns. Ensemble methods consistently delivered near-perfect classification metrics, validating their effectiveness in handling complex, nonlinear geological patterns and heterogeneity in formations. Previous studies corroborate these findings, underscoring the robustness of ensemble methods for geological classification tasks in drilling contexts.
Regression results (Section 3.4) confirmed the superiority of RF and XGBoost models, which demonstrated high predictive accuracy (R² values of 0.91–0.94) for both gamma ray and sonic travel time predictions. Low error values (RMSE and MAE) further substantiate their precision, making these models highly reliable for real-time log prediction. The notable underperformance of traditional regression models, such as LR, emphasizes the necessity of employing more sophisticated, nonlinear ML techniques when dealing with complex subsurface conditions.

Although R² was approximately 0.93, visual inspection reveals an almost perfect match between the actual and predicted curves (Fig. 13) for both GR and DT logs. In many instances, the differences between predicted and actual values are minimal and do not affect geological interpretation. For example, at a depth of 2470 meters, the model predicted a DT value of approximately 68 µs/ft compared to an actual recorded value of 67 µs/ft. This slight deviation does not alter the lithological classification, as both values fall within the typical range for salt layers. However, from a purely mathematical standpoint, such minor differences contribute to the remaining 7% of unexplained variance in the R² calculation. Consequently, these results confirm the strong practical performance and operational reliability of the model, even when numerical predictions are not entirely precise. This analysis highlights a fundamental limitation of using strict numerical accuracy measures to evaluate model performance in geoscientific applications, where interpretive accuracy often holds greater practical significance.

4.3 Validation through Independent External Well Data

The external validation results (Section 3.5) provided critical insights into the applicability of the developed models in realistic operational scenarios. The quantitative metrics summarized in Tables 3 and 4, combined with visual evaluations presented in Figs.
12 and 13, demonstrate excellent performance for both classification and regression predictions. Lithology classification displayed high consistency, effectively capturing thinly interbedded layers, while formation member predictions accurately identified major geological boundaries. Minor discrepancies were observed between the predicted and actual formation boundaries and petrophysical logs, which remained within acceptable operational ranges, aligning with the typical geological uncertainties inherent in subsurface operations. This external validation confirmed that both RF and XGBoost models generalize effectively to new wells within the Buzurgan oilfield, highlighting their reliability for practical applications. Real-time operational testing further underscored the robustness, adaptability, and significant practical value of these models in supporting decision-making processes during drilling operations.

4.4 Limitations and Practical Considerations

Despite the strong performance achieved, several limitations must be acknowledged. First, the models were trained exclusively on surface drilling parameters collected from the Buzurgan field, which may limit their predictive accuracy in significantly different geological settings with varying lithologies, pore-pressure regimes, or bit hydraulics. Second, the training data consisted of only three distinct rock types (salt, anhydrite, and shale) with pronounced mechanical contrasts and sonic log responses. Generalizing these models to include additional lithologies (e.g., carbonates or sandstones) may necessitate retraining or transfer-learning strategies. Third, occasional mismatches in predicted formation members suggest that incorporating complementary information, such as seismic attributes or logging while drilling (LWD) data, could further enhance the reliability of predictions.

4.5 Future Research Recommendations

Future research can extend this study in two promising directions.
First, multi-source data fusion should be explored by integrating seismic attributes, offset-well logs, or real-time LWD measurements (where available) to strengthen model robustness and improve uncertainty quantification. Second, cross-field validation across different geological settings beyond the Buzurgan area is recommended to assess the transferability of the workflow. Given that salt-bearing, overpressured sections extend across multiple oilfields, such validation would provide valuable insights into the generalization potential of the proposed framework.

5. Conclusion

This study addresses a significant research gap by simultaneously predicting lithology, formation members, GR, and DT exclusively using surface drilling parameters within high-pressure, salt-bearing sequences. The unified modeling approach eliminates the necessity for developing separate models for each geological and petrophysical target. Consequently, the developed ensemble models (RF and XGBoost) serve as comprehensive decision-support tools for drilling personnel, enhancing real-time situational awareness and operational decision-making capabilities, particularly in high-risk drilling environments.

The optimized ensemble algorithms demonstrated exceptional real-time predictive accuracy, exceeding 97% in lithology classification, near-perfect identification of formation boundaries, and accurate reconstruction of wireline-quality gamma-ray and sonic curves. These robust results were consistently validated through both internal training wells and an independent blind-test well, confirming their reliability under actual operational conditions. This reliability is especially critical in high-pressure, salt-bearing intervals where traditional open-hole logging presents considerable safety and operational risks. Torque and WOB emerged as the most influential input parameters, underscoring their strong association with formation mechanical properties and geophysical characteristics.
The results highlight the transformative potential of data-driven approaches in streamlining formation evaluation, minimizing NPT, and enhancing well control. Additionally, the integration of physics-informed feature selection, such as leveraging torque fluctuations to predict gamma-ray and sonic-log variations and using flow-rate anomalies to identify formation members, illustrates how conventional drilling parameters can serve as powerful geological indicators.

Crucially, this study emphasizes that numerical performance metrics alone (e.g., R² or RMSE) may not fully represent the practical geological accuracy of model predictions. Minor numerical discrepancies (e.g., predicting a sonic travel time of 68 µs/ft instead of the actual value of 67 µs/ft) lower the statistical scores but typically remain within acceptable lithological ranges and do not compromise geological interpretations. Therefore, comprehensive performance evaluations should integrate visual validation and geological reasoning alongside statistical accuracy metrics.

In summary, the proposed machine-learning approach offers a highly practical, efficient, and reliable solution for real-time geological interpretation and operational decision-making during drilling operations. By significantly reducing dependence on conventional logging methods, the models facilitate rapid and safe formation evaluations, precise casing decisions, and improved operational efficiency in challenging, pressure-sensitive geological environments.

Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contribution

H.Y. wrote the main manuscript text and was responsible for data collection, data preprocessing, conceptualization, data validation, and methodology development. X.H. was responsible for conceptualization, methodology development, supervision, and manuscript editing. O.A. contributed to data preprocessing, conceptualization, data validation, and methodology development. H.S.
contributed to conceptualization, data validation, and manuscript review. All authors reviewed and approved the manuscript.

Acknowledgement

The authors express their sincere gratitude to the Iraqi Ministry of Oil and Missan Oil Company for providing access to seismic and well-log data, which made this research possible. We also extend our sincere appreciation to Southwest Petroleum University for its support. Additionally, we thank our colleagues for their insightful discussions and contributions to the interpretation of the results.

References

Abdi H (2022) Normalizing Data. Experiments of the Mind. Princeton University Press, pp 84–108
Aly M, Ibrahim AF, Elkatatny S, Abdulraheem A (2021) Artificial intelligence models for real-time synthetic gamma-ray log generation using surface drilling data in Middle East Oil Field. J Appl Geophy 194:104462. https://doi.org/10.1016/j.jappgeo.2021.104462
Aubertin M, Julien MR, Servant S, Gill DE (1999) A rate-dependent model for the ductile behavior of salt rocks. Can Geotech J 36:660–674. https://doi.org/10.1139/t99-033
Burak T, Sharma A, Hoel E et al (2024) Real-Time Lithology Prediction at the Bit Using Machine Learning. Geosciences (Switzerland) 14:. https://doi.org/10.3390/geosciences14100250
Cao J, Shi Y, Wang D, Zhang X (2017) Acoustic Log Prediction on the Basis of Kernel Extreme Learning Machine for Wells in GJH Survey, Erdos Basin. J Electr Comput Eng 2017:1–7. https://doi.org/10.1155/2017/3824086
Chen T, Guestrin C (2016) XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 785–794
Clark SK, Daniel JI, Richards JT (1928) Logging Rotary Wells from Drill Cuttings. Am Assoc Pet Geol Bull 12:59–76. https://doi.org/10.1306/3D9327DA-16B1-11D7-8645000102C1865D
Cruz A, Ivanov R, Juaristi J et al (2024) Engineering and Operational Solutions to Drill a Challenging Hpht Exploratory Well Through a Salt Dome in Mca. In: ADIPEC. SPE
Desouky M, Alqubalee A, Gowida A (2023) Decision Tree Ensembles for Automatic Identification of Lithology. In: SPE Symposium Leveraging Artificial Intelligence to Shape the Future of the Energy Industry. SPE
Elmgerbi A, Chuykov E, Thonhauser G, Nascimento A (2022) Machine Learning Techniques Application for Real-Time Drilling Hydraulic Optimization. In: International Petroleum Technology Conference. IPTC
Elreedy D, Atiya AF (2019) A Novel Distribution Analysis for SMOTE Oversampling Method in Handling Class Imbalance. In: Rodrigues MF, Cardoso PJS, Monteiro J et al (eds) Computational Science -- ICCS 2019. Springer International Publishing, Cham, pp 236–248
Gamal H, Alsaihati A, Elkatatny S (2022) Predicting the Rock Sonic Logs While Drilling by Random Forest and Decision Tree-Based Algorithms. J Energy Resour Technol 144. https://doi.org/10.1115/1.4051670
Gamal H, Elkatatny S, Abdulaziz AM (2024) Intelligent Solution for Auto-Detecting Lithology Scheme While Drilling by Machine Learning. In: International Petroleum Technology Conference. IPTC
Gnyedykh V, De Paola G, Ibanez E et al (2022) Manifold Learning for Realtime Log While Drilling Prediction. ECMOR 2022. European Association of Geoscientists & Engineers, pp 1–12
Goodfellow I, Bengio Y, Courville A (2017) Deep learning. The MIT Press
Gowida A, Elkatatny S (2020) Prediction of Sonic Wave Transit Times From Drilling Parameters While Horizontal Drilling in Carbonate Rocks Using Neural Networks. Petrophysics – SPWLA J Formation Evaluation Reserv Description 61:482–494. https://doi.org/10.30632/PJV61N5-2020a6
Gupta I, Tran N, Devegowda D et al (2020) Looking Ahead of the Bit Using Surface Drilling and Petrophysical Data: Machine-Learning-Based Real-Time Geosteering in Volve Field. SPE J 25:990–1006. https://doi.org/10.2118/199882-PA
Ibrahim AF, Ahmed A, Elkatatny S (2023) Applications of Different Classification Machine Learning Techniques to Predict Formation Tops and Lithology While Drilling. ACS Omega 8:42152–42163. https://doi.org/10.1021/acsomega.3c03725
Ibrahim AF, Elkatatny S (2022) Real-Time GR logs Estimation While Drilling Using Surface Drilling Data; AI Application. Arab J Sci Eng 47:11187–11196. https://doi.org/10.1007/s13369-021-05854-7
Jassim SZ, Goff JC (2006) Geology of Iraq. Dolin, Prague and Moravian Museum, Brno
Jin F, Wanting J, Longlian C et al (2023) Research on Drilling Technologies of Ultra-Thick Salt Domes in Middle Asia and Pre-Salt Strata in Middle East: Lessons Learnt from a Pilot Well in Kenkyak Oilfield and an HPHT Well in Halfaya Oilfield. In: Offshore Technology Conference Brasil. OTC
Khalifa H, Tomomewo OS, Ndulue UF, Berrehal BE (2023) Machine Learning-Based Real-Time Prediction of Formation Lithology and Tops Using Drilling Parameters with a Web App Integration. Eng 4:2443–2467. https://doi.org/10.3390/eng4030139
Lao K, Bruno MS, Serajian V (2012) Analysis of Salt Creep and Well Casing Damage in High Pressure and High Temperature Environments. In: Offshore Technology Conference. OTC
Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
Loizzo M, Houghton RD, Zahmuwl AH et al (2024) A Deeper Understanding of the Role of Salts and Creeping Formations in Well Integrity. In: SPE Europe Energy Conference and Exhibition. SPE
Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA, pp 4768–4777
Moazzeni A, Haffar MA (2015) Artificial Intelligence for Lithology Identification through Real-Time Drilling Data. J Earth Sci Clim Change 06. https://doi.org/10.4172/2157-7617.1000265
Moiseenkov A, Al Hadhrami A, Khayrutdinov F et al (2019) Openhole Completions as Recovery Case for Drilling Across Salt and High Pressure Floaters. In: Abu Dhabi International Petroleum Exhibition & Conference. SPE
Osarogiagbon AU, Oloruntobi O, Khan F et al (2020) Gamma ray log generation from drilling parameters using deep learning. J Pet Sci Eng 195:107906. https://doi.org/10.1016/j.petrol.2020.107906
Pelayo L, Dick S (2019) Synthetic minority oversampling for function approximation problems. Int J Intell Syst 34:2741–2768. https://doi.org/10.1002/int.22120
Popescu M, Head R, Ferriday T et al (2021) Using Supervised Machine Learning Algorithms for Automated Lithology Prediction from Wireline Log Data. In: SPE Eastern Europe Subsurface Conference. SPE
Raymaekers J, Rousseeuw PJ (2024) Transforming variables to central normality. Mach Learn 113:4953–4975. https://doi.org/10.1007/s10994-021-05960-5
Salim A, Lagraba PJO (2018) Utilizing Drill Cuttings to Enhance Characterization and Description of Tight Carbonate Reservoirs. In: SPE Annual Technical Conference and Exhibition. SPE
Smith R, Bakulin A, Golikov P, AlBinHassan N (2022) Predicting sonic and density logs from drilling parameters using temporal convolutional networks. Lead Edge 41:617–627. https://doi.org/10.1190/tle41090617.1
Willson SM, Fredrich JT (2005) Geomechanics Considerations for Through- and Near-Salt Well Design. In: SPE Annual Technical Conference and Exhibition. SPE
Yao X, Song X, Han L et al (2022) A Novel Method for Real-Time Identification of Formation Lithology Based on Machine Learning. In: 56th U.S. Rock Mechanics/Geomechanics Symposium. ARMA
Zhou H, Hatherly P, Ramos F, Nettleton E (2011) An adaptive data driven model for characterizing rock properties from Drilling data. In: 2011 IEEE International Conference on Robotics and Automation. IEEE, pp 1909–1915
Zilberman VI, Serebryakov VA, Gorfunkel MV et al (2002) Chap. 9 Prediction of abnormally high pressures in petroliferous salt-bearing sections. pp 209–221
Zong X, Li X, Gao Y et al (2024) Research and application of cuttings intelligent collection equipment technology. J Phys Conf Ser 2901:012014. https://doi.org/10.1088/1742-6596/2901/1/012014

Additional Declarations

No competing interests reported.
Status: Under Review
Version 1 posted
Editorial decision: Revision requested 31 Jan, 2026
Reviews received at journal 24 Jan, 2026
Reviewers agreed at journal 02 Jan, 2026
Reviews received at journal 23 Oct, 2025
Reviewers agreed at journal 21 Oct, 2025
Reviewers agreed at journal 20 Oct, 2025
Reviewers agreed at journal 15 Oct, 2025
Reviewers invited by journal 15 Oct, 2025
Editor assigned by journal 09 Oct, 2025
Submission checks completed at journal 09 Oct, 2025
First submitted to journal 03 Oct, 2025
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7771316","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":534915995,"identity":"fa0b75db-0f05-4eaf-8891-e25df96feefa","order_by":0,"name":"Hayder Yousif","email":"","orcid":"","institution":"Southwest Petroleum University","correspondingAuthor":false,"prefix":"","firstName":"Hayder","middleName":"","lastName":"Yousif","suffix":""},{"id":534915996,"identity":"9cc39c24-d742-48ee-9bee-834e29ccebf3","order_by":1,"name":"Xuri Huang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAwUlEQVRIiWNgGAWjYBACNvbGBoMPFTbM/OwNRGrh4zl8oHDGmTR2yZ4DRGqRk0hL+MzZcpjf4EYCsQ6TyDHczNiQJi058/HGGww1NtGEtfC8MTYu3GFjzC+dVmzBcCwtt4GgFvYcM+OZZ9KSJWfnmEkwNhwmQgtDjvlv3rbD9RtuniFWC0dagjFQC7PBDR5itQAD2RAYyMySPUC/JBDjF/l2eFQe3njjQ40NYS3IwEAigRTlEC2k6hgFo2AUjIKRAQBtt0B9xRi9jQAAAABJRU5ErkJggg==","orcid":"","institution":"Southwest Petroleum University","correspondingAuthor":true,"prefix":"","firstName":"Xuri","middleName":"","lastName":"Huang","suffix":""},{"id":534915997,"identity":"a658c977-e936-4d2e-8ffb-5f2d66af77b8","order_by":2,"name":"Osama Al-Salih","email":"","orcid":"","institution":"Southwest Petroleum University","correspondingAuthor":false,"prefix":"","firstName":"Osama","middleName":"","lastName":"Al-Salih","suffix":""},{"id":534915998,"identity":"289d4ffb-0cda-4f1e-b824-f4581b9a5682","order_by":3,"name":"Hayder Shaaban","email":"","orcid":"","institution":"Southwest Petroleum 
University","correspondingAuthor":false,"prefix":"","firstName":"Hayder","middleName":"","lastName":"Shaaban","suffix":""}],"badges":[],"createdAt":"2025-10-03 07:53:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7771316/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7771316/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":94728971,"identity":"c4da48bf-0898-4694-814e-13257eead22e","added_by":"auto","created_at":"2025-10-30 07:04:27","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33234342,"visible":true,"origin":"","legend":"","description":"","filename":"RealTimeLithologyandLogPrediction.docx","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/da9a42a598767dcb7978686a.docx"},{"id":94689566,"identity":"a635f3b4-9946-4777-8720-ea824fc9ef32","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":7266,"visible":true,"origin":"","legend":"","description":"","filename":"e4d3818c34e7417aaac016963e5b4bb6.json","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/431c4a127c24fb8e9f047a25.json"},{"id":94689573,"identity":"671b927c-e0bd-44bf-800a-eda1463a9bb9","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":146870,"visible":true,"origin":"","legend":"","description":"","filename":"e4d3818c34e7417aaac016963e5b4bb61enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/2a1a2d1fcfcc99206c1f0bf3.xml"},{"id":94689567,"identity":"0baebad1-d7ce-4731-a980-03e532dd9358","added_by":"auto","created_at":"2025-10-29 
16:16:16","extension":"gif","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":181,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.gif","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/b68996c784e8a870b6a39c67.gif"},{"id":94728467,"identity":"0c6ccd05-d456-4524-a4e7-239a6628a005","added_by":"auto","created_at":"2025-10-30 07:03:51","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":3589321,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/7c22e69f4697743b4b00f750.png"},{"id":94728762,"identity":"c56256e8-5616-4ebe-941c-877ca59e715c","added_by":"auto","created_at":"2025-10-30 07:04:15","extension":"png","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1263104,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/3930ce8c7747c1cb64e83e1d.png"},{"id":94728980,"identity":"dce176e8-7b27-4f3b-a77d-c84e65954af7","added_by":"auto","created_at":"2025-10-30 07:04:27","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2146953,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/c9ad4b427b813d1821deb6fb.png"},{"id":94689582,"identity":"ead2ce4e-2340-4b9a-a0a8-9889e657ce63","added_by":"auto","created_at":"2025-10-29 
16:16:16","extension":"png","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1175312,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/3bbe3faddb9d71edb1db1d64.png"},{"id":94728875,"identity":"db03b4d3-1133-4d10-94e7-01161c8caf7c","added_by":"auto","created_at":"2025-10-30 07:04:19","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1156926,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/fe0f6c0f982027a1e9f9d1a4.png"},{"id":94728912,"identity":"249e2771-45e9-483b-b8ca-9ee018e9976f","added_by":"auto","created_at":"2025-10-30 07:04:23","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":6599606,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/096468bb237493050d04cfa8.png"},{"id":94689579,"identity":"339be039-101b-4ce8-ada9-73da7ff3e4f6","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":681860,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/2f4ba8a73f538cdd3df0a989.png"},{"id":94689611,"identity":"909a98e7-3489-4f94-805e-c3d462a79995","added_by":"auto","created_at":"2025-10-29 
16:16:17","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":6872324,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/ef290cd1eaef618df731299e.png"},{"id":94728992,"identity":"4867add6-1124-41cc-9aa0-cb8497d70066","added_by":"auto","created_at":"2025-10-30 07:04:27","extension":"gif","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":181,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.gif","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/1f6b232e309be8f3c76442c3.gif"},{"id":94729148,"identity":"afc67ad4-4184-440d-9e2a-7d7cbb750feb","added_by":"auto","created_at":"2025-10-30 07:04:39","extension":"gif","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":181,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.gif","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/049687f7ba7236276fdd9391.gif"},{"id":94728499,"identity":"f8b03b00-9502-418a-b897-2ea1f9168b55","added_by":"auto","created_at":"2025-10-30 07:03:56","extension":"gif","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":181,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.gif","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/e00b375a289b02758be3e383.gif"},{"id":94728871,"identity":"bcc0b639-f698-48ce-84ab-e2bd628e6ab1","added_by":"auto","created_at":"2025-10-30 
07:04:19","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":6584534,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/32a24060de071e0e3991607e.png"},{"id":94728609,"identity":"99239f81-37f0-464b-8ce2-1205bf6f5e90","added_by":"auto","created_at":"2025-10-30 07:04:04","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":138611,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/666cb3b23f3ad6666157c53c.png"},{"id":94728700,"identity":"bfdc392e-c64e-4a24-8455-ad45600c29a8","added_by":"auto","created_at":"2025-10-30 07:04:10","extension":"jpeg","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":274378,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/80af5729172697a1ef2351d8.jpeg"},{"id":94728342,"identity":"4c5ae1cd-2794-4f67-b01f-8351e45f52fd","added_by":"auto","created_at":"2025-10-30 07:03:35","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":936599,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/e66355f660977919077d6188.png"},{"id":94689592,"identity":"2c9ad745-aea3-4a7b-a3fd-3f6d98b1f645","added_by":"auto","created_at":"2025-10-29 
16:16:17","extension":"png","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":924528,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/bc874fba2f44abf79e6676bf.png"},{"id":94689599,"identity":"df13c467-d5bc-4fc1-8198-553ff1e4af80","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":415,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/734eea3fe45959983c54974a.png"},{"id":94728530,"identity":"507ce078-8ef4-488a-a45e-4959b61e6d25","added_by":"auto","created_at":"2025-10-30 07:03:58","extension":"png","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2047902,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/7a09b8eecaa246190f3107e9.png"},{"id":94728414,"identity":"58a346a9-592b-4cb0-accd-d4723d632e1b","added_by":"auto","created_at":"2025-10-30 07:03:46","extension":"png","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":306462,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/bbfbf8cf2c60abb9f95cef90.png"},{"id":94728985,"identity":"f2ec0780-4460-4281-88a4-49d636a7e161","added_by":"auto","created_at":"2025-10-30 
07:04:27","extension":"png","order_by":23,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":344186,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/31f62330c295ad23ff78992f.png"},{"id":94689606,"identity":"7eed1867-b6d5-4f7f-8985-efff745b8d35","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":24,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":384807,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/c299bd0acdcf66b074e432b7.png"},{"id":94728548,"identity":"fbc44832-c66d-4a8a-a1a5-cab6e35406b8","added_by":"auto","created_at":"2025-10-30 07:04:01","extension":"png","order_by":25,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":432109,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/d88fbf7e474e61041d2be239.png"},{"id":94689605,"identity":"a38b0c20-32ac-4093-9820-4977070eb098","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":26,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1315157,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/e6339e3cb0738159d993165f.png"},{"id":94729245,"identity":"efd81290-9248-4917-8825-52ca6cc9054d","added_by":"auto","created_at":"2025-10-30 
07:04:43","extension":"png","order_by":27,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":226772,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/a8e391807a134968c1702254.png"},{"id":94689607,"identity":"fc15771e-47ad-4bb9-8ec9-7440dd9f983f","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":28,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2248685,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/606d3b506393cae32727d923.png"},{"id":94728622,"identity":"556ddeef-3477-48f1-a847-ead3529964bd","added_by":"auto","created_at":"2025-10-30 07:04:05","extension":"png","order_by":29,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":415,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/8ac2f7933c8eb62e364c0d57.png"},{"id":94689593,"identity":"16ade78e-0e5b-4ed4-a745-76b1758c6c80","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":30,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":415,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/8114ef1331aaad358b4a1ec9.png"},{"id":94689601,"identity":"f9295f25-a951-4351-b328-b29ec6ccb539","added_by":"auto","created_at":"2025-10-29 
16:16:17","extension":"png","order_by":31,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":415,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/7a7cf4765138340ede60dbeb.png"},{"id":94689604,"identity":"43dc5356-f44c-4be4-ae0a-27293dd38521","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":32,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":642517,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/42b278c8510aad6a9191f82f.png"},{"id":94689603,"identity":"874a620a-20fe-4f71-94e4-da50badaf502","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":33,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":58880,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/03afb1782aed8e0d40e9fa5d.png"},{"id":94728299,"identity":"4f935536-2488-4bf2-9ef0-2049b625372c","added_by":"auto","created_at":"2025-10-30 07:03:29","extension":"png","order_by":34,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":52670,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/ccb84218c75b3d2a3387eae2.png"},{"id":94728594,"identity":"503c601e-1001-4a63-9908-532a3efd9186","added_by":"auto","created_at":"2025-10-30 
07:04:02","extension":"png","order_by":35,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":355425,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/381042d2ba12fca29163de39.png"},{"id":94689616,"identity":"d18f4d2b-0bcc-4e57-965a-c2d8e3acb816","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":36,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":364373,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/8978f1246ad3a3ecd80c92db.png"},{"id":94728676,"identity":"b8613504-0a6f-4395-ae1f-c65a87be971b","added_by":"auto","created_at":"2025-10-30 07:04:10","extension":"xml","order_by":37,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":146896,"visible":true,"origin":"","legend":"","description":"","filename":"e4d3818c34e7417aaac016963e5b4bb61structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/c0821a3d3d81f8145a94bb43.xml"},{"id":94689609,"identity":"aa6d013e-0f01-41df-b890-69f9b35fa6e6","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"html","order_by":38,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":154444,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/c61e9d3c82e0eb23e51c8368.html"},{"id":94689577,"identity":"10c2c457-ac6f-4f4d-b8df-90c589de8be5","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":6584534,"visible":true,"origin":"","legend":"\u003cp\u003eIntegrated subsurface profile from the Missan oilfield, illustrating the relationship between seismic reflectivity, lithology, wellbore 
design, and pressure regimes.\u003c/p\u003e","description":"","filename":"image2.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/da9e972276b72ec9b3d412a2.png"},{"id":94689568,"identity":"bfca6de7-9974-4e18-916b-827a5579e764","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":138611,"visible":true,"origin":"","legend":"\u003cp\u003eTime-structure contour map of the Top MB1 surface within the Fatha Formation, Buzurgan oilfield. The studied wells (BU-N1 to BU-N5) are situated along the Buzurgan Anticline, with BU-N3 and BU-N4 located near the structural crest. The inset map displays the position of the Missan oilfields within the southeastern region of Iraq.\u003c/p\u003e","description":"","filename":"image3.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/505013b553e40a4ff91d2f1d.png"},{"id":94689569,"identity":"cf311fbf-6c0b-4e1f-984f-b3dcec185932","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1004625,"visible":true,"origin":"","legend":"\u003cp\u003eCount and proportion plots for lithology and formation members. 
The bar plots display the frequency of each category, and the pie charts illustrate their proportional distribution within the dataset.\u003c/p\u003e","description":"","filename":"image4.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/3e11d10c857a0e1d88e29cff.png"},{"id":94728813,"identity":"67038af1-833a-45f7-bd67-b1676b2b0883","added_by":"auto","created_at":"2025-10-30 07:04:18","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":936599,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eWorkflow diagram summarizing the end-to-end ML framework for predicting geophysical properties (GR, DT), lithology, and well tops using drilling parameter data. The system incorporates data preprocessing, model training, and evaluation, as well as real-time deployment for operational decision-making.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"image5.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/443594e8d0b824b41f753542.png"},{"id":94689570,"identity":"9351016a-2d2c-4be3-ac14-8268bff09096","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":924528,"visible":true,"origin":"","legend":"\u003cp\u003eBoxplots illustrating the distribution, variability, and outliers of key drilling and logging parameters. The lower plots further illustrate the standardized distributions of GR and DT by lithology class (anhydrite, shale, and salt). 
These visualizations provide an overview of data quality and range, supporting initial consistency and outlier checks prior to model development.\u003c/p\u003e","description":"","filename":"image6.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/20b08103226ef7d83ae1409e.png"},{"id":94689578,"identity":"8fb28faa-a820-4736-8909-704ac4a6086d","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":3589321,"visible":true,"origin":"","legend":"\u003cp\u003eDepth profiles of key drilling parameters (ROP, WOB, RPM, SPP, Torque, and FR) recorded from the external validation well BU-N5 during real-time operational model validation.\u003c/p\u003e","description":"","filename":"image7.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/10ea84236e5a4ad95c4abd04.png"},{"id":94729115,"identity":"574ceb83-186a-42de-8399-d181371d0e9d","added_by":"auto","created_at":"2025-10-30 07:04:37","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":1263104,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCorrelation heatmaps between drilling parameters and target variables. (a, b) Pearson correlations with continuous logs (sonic dt and gamma-ray gr). (c, d) Point-biserial correlations with categorical variables (lithology classes and formation tops). 
Torque and rate of penetration show the strongest associations, making them key predictors for subsequent ML modeling.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"image8.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/2475c9ed803d7a0014d3877d.png"},{"id":94689591,"identity":"3e51a593-b66c-42cd-8518-e8cd298fdf0c","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":2146953,"visible":true,"origin":"","legend":"\u003cp\u003eScatter plots of torque vs. DT and torque vs. GR, colored by lithology, illustrating distinct mechanical responses across shale, salt, and anhydrite.\u003c/p\u003e","description":"","filename":"image9.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/8211d83af018609593d2b3c8.png"},{"id":94689574,"identity":"010064b4-1fa5-44c4-a57d-9ffddad1fe8d","added_by":"auto","created_at":"2025-10-29 16:16:16","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":1175312,"visible":true,"origin":"","legend":"\u003cp\u003eCompares SHAP-derived feature importances across the four Random Forest models. 
Torque, in addition to lithology, dominates the regression targets (DT, GR), whereas FL and ROP assume leading roles in the formation member classifier.\u003c/p\u003e","description":"","filename":"image10.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/0bc8398f580c38619059ac17.png"},{"id":94728615,"identity":"8fbbb8a8-6f3b-4155-8f4d-2bc5bbec682c","added_by":"auto","created_at":"2025-10-30 07:04:04","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":1156926,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrices of the RF models comparing actual and predicted classes for lithology (anhydrite, salt, shale) and formation members (MB1–MB5) in both training and testing datasets. (a, b) Lithology classification shows excellent agreement, with only minor misclassifications between anhydrite and shale. (c, d) Formation-top prediction demonstrates near-perfect performance, with the vast majority of samples correctly identified across MB1–MB5. The strong diagonal dominance in all panels confirms the robustness of the RF model in distinguishing both lithological and stratigraphic categories.\u003c/p\u003e","description":"","filename":"image11.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/a806878042233c01df961ab1.png"},{"id":94689588,"identity":"6a3b9f0d-0ec3-4c37-9ee3-fdce660a62a4","added_by":"auto","created_at":"2025-10-29 16:16:17","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":6599606,"visible":true,"origin":"","legend":"\u003cp\u003eScatter plots showing actual vs. predicted values for DT and GR datasets in the training and testing phases. 
R² values (0.9155–0.9327) reflect strong model performance and reliable predictive accuracy.\u003c/p\u003e","description":"","filename":"image12.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/a8ac992569aa52f05a575c07.png"},{"id":94728473,"identity":"43283137-a9c1-4470-ba2d-41cd52257b91","added_by":"auto","created_at":"2025-10-30 07:03:53","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":681860,"visible":true,"origin":"","legend":"\u003cp\u003eVisual comparison between actual and predicted lithology and formation members in the external validation well (BU-N5), covering the depth interval from 2080 m to 2800 m. The predictions were generated using the RF model.\u003c/p\u003e","description":"","filename":"image13.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/eadda155d7b590a3b76d5e27.png"},{"id":94729014,"identity":"c01f698b-2090-4a91-919b-f611c263f830","added_by":"auto","created_at":"2025-10-30 07:04:28","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":6872324,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of actual and predicted GR and DT logs over the depth range 2080–2800 m from the external validation well (BU-N5), illustrating strong predictive accuracy and reliable model generalization.\u003c/p\u003e","description":"","filename":"image14.png","url":"https://assets-eu.researchsquare.com/files/rs-7771316/v1/c996e99462ea426a60e17d96.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Real-Time Lithology and Log Prediction from Drilling Parameters Using Machine Learning for High-Pressure Salt-Bearing Formation, Missan Oilfields, Iraq","fulltext":[{"header":"1. 
Introduction","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e\u003ch2\u003e1.1 Background and Literature Review\u003c/h2\u003e\u003cp\u003eHigh-pressure, salt-bearing formations are among the most demanding drilling environments encountered in the oil and gas industry (Zilberman et al. \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2002\u003c/span\u003e; Willson and Fredrich \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2005\u003c/span\u003e; Loizzo et al. \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Cruz et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Such geological settings frequently exhibit rapid, nonlinear variations in formation pressures, complex and heterogeneous rock mechanical properties, and heightened risks associated with borehole instability. Common operational issues include ballooning, washouts, and collapses of the borehole wall, which can significantly complicate drilling operations (Lao et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). Accurate real-time identification of lithology and precise determination of formation tops are essential under these challenging conditions. Timely and reliable geological data facilitate the optimization of drilling parameters, help minimize NPT, and reduce risks of critical incidents such as well control problems or severe mud losses. However, conventional formation evaluation methods such as mud logging and cuttings analysis provide limited vertical resolution, suffer from sample mixing, and are plagued by delays in data availability (Desouky et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
Meanwhile, wireline logging, though providing higher accuracy, is both expensive and hazardous under the extreme conditions typical of salt-bearing formations.\u003c/p\u003e\u003cp\u003eIn the Missan area, lithology and stratigraphic boundaries are traditionally inferred during and after drilling using mud logs, cuttings, and wireline logging, particularly in non-productive sections where these conventional methods are sufficient for identifying lithology. However, in deep salt intervals or high-risk zones, such approaches become impractical due to high costs, operational delays, and the elevated risks of open-hole logging in unstable or overpressured formations. This has created a growing demand for alternative approaches capable of delivering accurate, real-time geological interpretations while maintaining operational efficiency (Burak et al. \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eIn recent years, numerous studies have highlighted the promising capability of artificial intelligence (AI) and machine learning (ML) techniques, such as RF, XGBoost, Convolutional Neural Networks (CNNs), and others, for accurately predicting lithology and formation members based solely on surface drilling parameters (Zhou et al. \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Moazzeni and Haffar \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Gupta et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Popescu et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Yao et al. \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Desouky et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Khalifa et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Ibrahim et al. 
\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Gamal et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). However, the reviewed literature does not yet address lithology and log prediction in salt-bearing, high-pressure formations using ML methods that rely solely on surface data.\u003c/p\u003e\u003cp\u003eAlthough several studies have successfully predicted sonic logs (Cao et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Gowida and Elkatatny \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Gamal et al. \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Smith et al. \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) or gamma-ray logs (Osarogiagbon et al. \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Aly et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Gnyedykh et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ibrahim and Elkatatny \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) using various ML approaches and various input data types, including some studies relying exclusively on drilling parameters (Gowida and Elkatatny \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Osarogiagbon et al. \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Aly et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Gamal et al. 
\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ibrahim and Elkatatny \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), these studies did not specifically address salt-bearing, high-pressure settings. In such geological settings, log responses deviate significantly from normal conditions due to the anomalous physical properties of evaporites, extreme pressure regimes, and the frequent presence of overpressured shales. These factors make salt-bearing intervals and high-pressure zones both technically challenging and economically critical, as they directly impact wellbore stability, drilling safety, and reservoir characterization. This gap highlights the need for further research and validation of predictive models tailored to these complex environments, thereby underscoring the scientific value of developing ML-based methods adapted to salt-influenced and high-pressure formations.\u003c/p\u003e\u003cp\u003eAccordingly, this study aims to address this gap by developing and validating ML models for real-time lithology and formation-member prediction, as well as log prediction, in high-pressure, salt-bearing formations. By integrating domain knowledge with intelligent algorithms, the study aims to enhance decision-making accuracy, reduce formation misclassification risks, and support safer and more efficient drilling practices under extreme geological conditions.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e1.2 Problem Statement\u003c/h2\u003e\u003cp\u003eDespite their operational utility, conventional cuttings descriptions and mud logs have inherent limitations. Cuttings samples typically provide low vertical resolution, commonly one to two meters, and are frequently mixed during transport to the surface due to lag time variation, irregular mud circulation, and borehole geometry effects (Clark et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1928\u003c/span\u003e; Salim and Lagraba P. 2018; Zong et al. 
\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). These factors delay and distort the identification of lithological boundaries and formation tops, often introducing significant uncertainty in real-time formation evaluation.\u003c/p\u003e\u003cp\u003eIn deep, high-pressure salt-bearing sequences, these challenges become more pronounced. Rapid lithological alternations of halite, anhydrite, and shale, ductile deformation, and irregular stratigraphic continuity characterize such intervals. Accurately identifying the termination of the final salt bed is particularly critical, as pore pressures often drop sharply beyond this point, directly influencing casing-setting depths and well control safety.\u003c/p\u003e\u003cp\u003eConventional wireline logging, while more precise, is costly and hazardous in unstable or overpressured salt intervals, where extended open-hole exposure may jeopardize wellbore stability. Consequently, there is a pressing need for real-time, high-resolution predictive tools derived from surface drilling parameters that can overcome these limitations. By addressing distorted log responses and complex pressure regimes, such tools could substantially improve the reliability of formation evaluation and operational decision-making in these challenging environments.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e1.3 Field Geological and Operational Context\u003c/h2\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates an integrated subsurface profile from the Buzurgan oilfield, establishing the geological and operational context for this study. The profile incorporates seismic reflectivity data, formation tops, lithological distribution, wellbore architecture, and the delineation between normal and overpressured regimes. 
Key formation members are marked as MB5 to MB1, where \u0026ldquo;MB\u0026rdquo; refers to individual members of the Fatha Formation, a regionally significant stratigraphic unit within the Mesopotamian Basin of Iraq and Iran (Jassim and Goff \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2006\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eIn this field operation, between approximately 2,100 m and 2,800 m depth, the well trajectory normally traverses formation members MB5 through the uppermost part of MB1, with the MB4\u0026ndash;MB2 interval comprising a thick, high-pressure salt-bearing zone. Consistent with standard well construction practices in the region, a 17-1/2\u0026Prime; section was constructed and secured with 13-3/8\u0026Prime; casing down to 2,100 m to isolate the normal-pressure interval. Subsequently, a 12-1/4\u0026Prime; section was completed and cased with 9-5/8\u0026Prime; casing across the overpressured salt-bearing formation (2,100\u0026ndash;2,800 m). The final phase extended the wellbore in an 8-1/4\u0026Prime; open hole, which was then lined with a 6-5/8\u0026Prime; liner to reach the target Mishrif reservoir at approximately 3,700 m. This staged casing design was adopted to preserve wellbore integrity and ensure effective pressure management throughout the drilling operation.\u003c/p\u003e\u003cp\u003eThe integrated geological and operational profile highlights the stratigraphic complexity and pressure heterogeneity within the study interval, which significantly complicate real-time lithology interpretation and formation evaluation. 
The adjacent lithology column reveals the rapid alternation between shale, anhydrite, and massive salt layers, which introduces substantial uncertainty into decisions regarding casing depths, mud-weight programs, and pore-pressure management strategies. For simplicity, these interbedded shale intervals will hereafter be referred to collectively as \u0026ldquo;shale\u0026rdquo;. It is important to note, however, that the shale in this context is predominantly dolomitic shale, which explains the relatively low Gamma Ray readings observed at certain depths despite being lithologically classified as shale.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe geological setting of the study area is characterized by the presence of several key members of the Fatha Formation, with the MB1 member serving as both the basal unit of stratigraphic significance and the operational base. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents a time-structure contour map of the Top MB1 surface, illustrating spatial variations in elevation time across the Missan oilfields. All studied wells (BU-N1 to BU-N5) are positioned above the Buzurgan Anticline, with BU-N3 and BU-N4 located near the structural crest and BU-N1 and BU-N5 situated closer to the flanks.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e1.4 Research Objectives\u003c/h2\u003e\u003cp\u003eThis study aims to develop a machine learning-based predictive framework capable of simultaneously predicting lithology, formation members, and synthetic logs (gamma-ray and sonic logs) solely from surface drilling parameters within a single unified system. 
By leveraging advanced supervised learning algorithms and automated hyperparameter optimization, the framework is designed to deliver highly accurate predictions that can be integrated into real-time drilling advisory systems.\u003c/p\u003e\u003cp\u003eBeyond improving drilling safety and decision-making, the system provides on-site geologists and engineers with continuous, data-driven insights that replace subjective interpretations and manual correlations. This enables faster and more reliable formation evaluation, while also guiding real-time adjustments to drilling parameters, mud properties, and casing-setting depths, thereby enhancing efficiency and reducing operational risks.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e1.5 Operational Relevance and Field Applications\u003c/h2\u003e\u003cp\u003eDrilling operations in high-pressure, salt-bearing formations pose substantial technical and safety challenges (Willson and Fredrich \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2005\u003c/span\u003e; Moiseenkov et al. \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Jin et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). In these complex environments, rapid and reliable predictions of lithology, formation members, and petrophysical properties\u0026mdash;such as gamma-ray and sonic logs\u0026mdash;are critical for minimizing NPT and ensuring safe well construction. One of the most pressing operational concerns in these intervals is the narrow time window between drilling through the final salt layer and installing casing. Delays or misjudgments in this stage can lead to severe mud losses, borehole instability, and compromised well integrity.\u003c/p\u003e\u003cp\u003eThe developed ML framework directly addresses these challenges by delivering continuous, high-resolution predictions derived solely from real-time drilling parameters. 
This capability provides well-site geologists with an objective decision-support tool, enabling them to compare synthetic logs (e.g., gamma ray and sonic) against offset wells in real time and refine formation correlations under conditions where traditional methods are unreliable.\u003c/p\u003e\u003cp\u003eIn parallel, drilling supervisors and mud engineers benefit from predictive insights that guide immediate adjustments to weight on bit, rotary speed, flow rates, and mud properties. These proactive measures optimize penetration rates, improve hole cleaning, and mitigate well-control risks. Furthermore, predictive modeling helps extend bit life by reducing mechanical stress during lithology transitions and supports more accurate casing-setting decisions in overpressured zones.\u003c/p\u003e\u003cp\u003eBy integrating predictive modeling with operational workflows, the proposed system transforms formation evaluation and drilling practices from reactive, experience-driven processes into proactive, data-driven operations. This integration enhances drilling safety, reduces NPT, and improves overall efficiency in the most technically demanding sections of the well.\u003c/p\u003e\u003c/div\u003e"},{"header":"2. Data and Methodology","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Data Description\u003c/h2\u003e\u003cp\u003eThe dataset analyzed in this study comprises 30,500 depth-indexed records collected from four Buzurgan wells (BU-N1 through BU-N4), covering the 2,100\u0026ndash;2,800 m interval, a high-pressure, salt-bearing section drilled with a 12\u0026frac14;-inch bit. Each record includes drilling parameters (ROP, RPM, WOB, torque, FR, and SPP) along with measured depth. The BU-N5 well was entirely reserved for blind validation to evaluate the model\u0026rsquo;s generalization capability. 
This structured dataset provided a robust foundation for model training, internal assessment, and external real-time testing.\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e summarizes the distribution of lithology and stratigraphy in the dataset. The upper plots show that shale is the dominant lithology, accounting for 51.7% of the samples, followed by anhydrite (27.8%) and salt (20.5%). The lower plots display the distribution of samples across the members of the Fatha Formation. The MB4 member contains the highest proportion of samples (43.7%), followed by MB5 (31.6%), MB3 (19.7%), MB2 (5.07%), and a minimal fraction from the uppermost 0.5 m of MB1 (0.0719%). These distributions underscore the stratigraphic variability within the study interval and inform class balancing considerations for supervised ML.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Workflow Overview\u003c/h2\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e illustrates the comprehensive workflow implemented in this study, covering the entire sequence from data acquisition to real-time operational decision-making. The workflow was developed to process well-log datasets and integrate machine-learning techniques for predicting petrophysical properties and lithological classifications in complex, high-pressure drilling environments. Data acquisition involved collecting subsurface data from multiple sources, including Final Well Reports (FWR), Daily Geological Reports (DGR), Master Logs, and Wireline Logging (WL) Reports. An initial exploratory data analysis (EDA) phase was conducted to investigate data distributions, detect inconsistencies, and assess the relationships between potential input features and target variables. 
This stage was followed by feature selection procedures to identify the most informative parameters influencing lithology, GR, and DT responses. The data were subsequently normalized and preprocessed to address scaling disparities, manage missing values, and ensure consistency across different input features.\u003c/p\u003e\u003cp\u003eFollowing preprocessing, the dataset was randomly split into training and testing subsets in an 80:20 ratio to evaluate the generalization performance of the models objectively. A suite of ML algorithms was implemented for both regression and classification tasks, including LR, RF, SVM, XGBoost, MLP, TabNet, and CNN. Regression models were trained to predict continuous log responses (GR and DT) while classification models were used for lithology identification and formation member prediction. Hyperparameter tuning was conducted using the Optuna optimization framework coupled with cross-validation procedures to enhance model performance and mitigate overfitting risks. The model evaluation relied on task-appropriate performance metrics: Coefficient of determination (R\u0026sup2;), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) for regression tasks; and Accuracy, Precision, Recall, F1-score, and Matthews Correlation Coefficient (MCC) for classification models. The best-performing models were saved and subsequently applied to new incoming well data in real-time, with continuous data normalization and preprocessing performed to maintain workflow integrity. Predictions were compared against actual field measurements to assess accuracy and validate model robustness. 
This iterative feedback loop facilitated ongoing model refinement and operational decision support, improving drilling safety, formation evaluation accuracy, and operational efficiency in complex and overpressured environments.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Data Preprocessing\u003c/h2\u003e\u003cp\u003ePrior to any analytical or modeling tasks, the dataset undergoes a rigorous preprocessing workflow designed to enhance data integrity and ensure suitability for ML applications. This workflow addresses several key aspects: handling missing values, detecting and mitigating outliers, reconciling multi-resolution data sources, and standardizing feature scales.\u003c/p\u003e\u003cp\u003eA notable challenge involves harmonizing lithological descriptions, typically recorded at coarse 1\u0026ndash;2 m intervals in master lithology logs, with drilling parameter and wireline data, both acquired at a finer vertical resolution of 0.1 m. Leveraging domain expertise and field-proven experience from drilling dozens of wells in the Missan oilfields, a manual alignment procedure is implemented to resolve this discrepancy. This alignment process is applied only to the training and internal evaluation datasets, where ground-truth lithology and log measurements are available. In contrast, blind-test datasets used for real-time validation require no such alignment, as the models operate solely on surface drilling parameters without access to labeled geological data.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section3\"\u003e\u003ch2\u003e2.3.1 Outlier Detection and Initial Data Assessment\u003c/h2\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents box plots for the primary drilling and logging parameters, including ROP, WOB, RPM, SPP, torque, FR, GR, and DT. 
These visualizations offer an initial assessment of the data\u0026rsquo;s distribution and variability, supporting the identification of potential outliers. Each box plot displays the median, interquartile range (IQR), and any data points lying outside the whiskers, which are flagged as possible outliers. The results indicate that parameters such as ROP and RPM exhibit substantial dispersion and noticeable skewness, while more symmetric and narrowly clustered distributions characterize GR and DT logs. This preliminary analysis helps in understanding the quality and characteristics of the dataset before model development.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eIn addition to parameter-wise outlier screening, lithology-specific boxplots of GR and DT were generated for the three principal rock types: shale, anhydrite, and salt. This stratified visualization highlights petrophysical anomalies within each lithologic boundary, allowing for the identification of extreme values, particularly in mixed facies such as shaly anhydrite, dirty salt, or transitional intervals. These outliers serve as indicators of intra-formational heterogeneity. Such insights are crucial for refining lithology labels, enhancing data quality, and ultimately improving the predictive accuracy of downstream ML models.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\u003ch2\u003e2.3.2 Addressing Class Imbalance and Feature Normalization\u003c/h2\u003e\u003cp\u003eTo address the severe class imbalance, particularly the underrepresentation of the MB1 class, which constitutes only 0.0719% of the dataset, the Synthetic Minority Over-sampling Technique (SMOTE) was employed using Python\u0026rsquo;s imbalanced-learn library. 
SMOTE generates synthetic samples by interpolating between existing minority class instances and their nearest neighbors in feature space (Elreedy and Atiya 2019; Pelayo and Dick \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), thereby creating more diverse and representative training data. In this study, 500 synthetic MB1 samples were generated and incorporated, significantly improving the class distribution and enabling more equitable and stable classification performance by reducing bias toward dominant formations.\u003c/p\u003e\u003cp\u003eFollowing outlier treatment and class rebalancing, feature normalization was applied to all numerical input variables to ensure equal contribution during model training. Normalizing features such as ROP, SPP, WOB, torque, RPM, and FR resulted in standardized distributions centered around zero. This transformation enhances model convergence speed, training efficiency, and predictive stability across both regression and classification tasks (Abdi \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Raymaekers and Rousseeuw \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Model Development and Evaluation\u003c/h2\u003e\u003cp\u003eTo develop a robust predictive framework, seven supervised ML algorithms are employed, including RF, SVM, LR, XGBoost, MLP, CNN, and TabNet. These models are chosen to represent a broad spectrum of learning paradigms, spanning traditional statistical methods, ensemble approaches, and advanced deep learning architectures suitable for structured tabular data. The dataset was randomly partitioned into 80% for training and 20% for testing to evaluate each model\u0026rsquo;s generalization capability objectively. 
Baseline performance was first established using default hyperparameters, followed by targeted hyperparameter optimization with Optuna, an efficient and flexible framework that leverages Bayesian optimization via a define-by-run strategy. This optimization facilitated a dynamic and adaptive exploration of the hyperparameter space, yielding the best-performing model configurations found.\u003c/p\u003e\u003cp\u003eModel performance was quantitatively assessed on the independent test set. Given the dual-task nature of the problem (categorical classification of lithology and formation members, and continuous regression of logs), metrics were selected accordingly. For classification, accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC) are reported, with MCC offering a robust evaluation in the presence of class imbalance. For regression tasks predicting gamma-ray (GR) and sonic travel-time (DT) logs, the coefficient of determination (R\u0026sup2;), mean absolute error (MAE), and root mean square error (RMSE) are used to assess both predictive accuracy and variance explanation.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Validation Strategy\u003c/h2\u003e\u003cp\u003eWhile model performance metrics provide valuable benchmarks under controlled conditions, they do not fully capture the operational variability and uncertainties inherent to real-time drilling environments (Gnyedykh et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Elmgerbi et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Accordingly, an operational validation strategy was implemented to assess model robustness and practical utility under dynamic field conditions. Specifically, an independent validation was performed using data from a well within the Buzurgan oilfield that was not used in training. 
Although the primary goal of drilling in this region is to reach and produce from hydrocarbon-bearing reservoirs situated beneath the salt-bearing intervals, penetrating and drilling through the thick, overpressured salt section is an unavoidable prerequisite for accessing these deeper targets.\u003c/p\u003e\u003cp\u003eThe validation procedure involved the real-time application of the trained predictive models during the drilling of the salt-bearing interval. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e illustrates depth profiles of key drilling parameters collected from the external validation well BU-N5 over a depth range from 2,080 to 2,800 m. These drilling parameters provided the sole input data for the models, enabling continuous real-time predictions of lithology, well tops, and synthetic logs without relying on downhole logging tools. This real-time operational validation strategy enabled an assessment of the model\u0026rsquo;s robustness, reliability, and adaptability under realistic field conditions within the complex salt-bearing interval.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"3. Results and Model Interpretation","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Correlation Analysis\u003c/h2\u003e\u003cp\u003eThe relationships between drilling parameters (ROP, WOB, RPM, SPP, torque, and FR) and the key geological target variables (GR, DT, lithology, and well tops) were systematically evaluated using Pearson and point-biserial correlation heatmaps (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). 
These heatmaps provided a quantitative foundation for identifying the most informative drilling parameters relevant to subsurface geological variations.\u003c/p\u003e\u003cp\u003eAmong all surface drilling parameters, torque emerged as the most diagnostic signal, displaying a strong negative Pearson correlation (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ea, b) with both gamma ray (r\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;0.65) and sonic travel time (r\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;0.63). Higher torque, therefore, coincides with cleaner, mechanically competent formations, that is, those that register lower GR values (reduced shale content) and shorter DT values (higher acoustic velocity).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003ePoint-biserial correlation further clarified lithology-specific drilling behavior (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ec). Torque shows a strong negative association with shale, consistent with its mechanically soft character, and a positive association with anhydrite, which is hard and drilling-resistant. Salt intervals, although typically drilled with lower WOB and higher ROP, still generate elevated torque because salt is ductile yet mechanically stiff (Aubertin et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e1999\u003c/span\u003e). This contrast is most pronounced at salt\u0026ndash;shale and anhydrite\u0026ndash;shale interfaces, where torque exhibits a sharp change as the bit crosses between the mechanically stiff units (salt or anhydrite) and the softer shale, regardless of the direction of transition.\u003c/p\u003e\u003cp\u003eThese trends are visually reinforced in Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e, which presents scatter plots of torque versus GR and DT, color-coded by lithology (shale, salt, and anhydrite). 
The scatter plots clearly illustrate that shale beds correspond to high GR, high DT, and low torque. Salt exhibits moderate DT, low GR, and elevated torque. Low DT, low GR, and the highest torque levels mark anhydrite. This visualization complements the numerical correlation results in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ec, providing intuitive insight into how lithological changes are reflected in real-time drilling behavior. Furthermore, specific formation members showed meaningful associations with drilling parameters (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ed). For example, ROP was positively correlated with MB4 and negatively correlated with MB5, suggesting variable drilling efficiency across these formation members.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e3.2 Feature-Importance Analysis (SHAP)\u003c/h2\u003e\u003cp\u003eSHAP (SHapley Additive exPlanations) is a game-theoretic approach that assigns each feature a contribution value toward a model\u0026rsquo;s prediction, providing a consistent framework for interpreting complex machine-learning models (Lundberg and Lee \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). The geological\u0026ndash;mechanical analysis of SHAP feature-importance results (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e) highlights how drilling parameters influence the four RF models developed in this study. For the regression targets, torque emerges as the dominant predictor. In the DT model, torque accounts for more than half of the total SHAP contribution, whereas ROP and FR share a secondary influence, followed by WOB and SPP, while RPM exerts nearly negligible impact. 
A similar pattern appears in the GR model, where torque again ranks first, followed by FR and SPP, with ROP and WOB contributing only marginally. These outcomes confirm that the mechanical resistance sensed at the bit is strongly coupled to both acoustic velocity and natural gamma response. For the classification targets, torque remains the leading driver of the lithology classifier, with WOB and ROP providing complementary discriminatory power, while FR, SPP, and RPM contribute progressively less. In contrast, the formation member classifier is governed primarily by flow rate, followed by ROP and RPM, whereas torque and WOB play lesser yet non-negligible roles. This inversion suggests that hydraulic flow variations are more sensitive to stratigraphic boundaries than to absolute rock strength. At the same time, the mechanical signature of torque and bit load most effectively captures changes in lithology.\u003c/p\u003e\u003cp\u003eCollectively, these findings demonstrate that (i) torque is the most informative real-time variable for predicting petrophysical properties (GR, DT) and lithology, and (ii) flow rate, together with ROP, becomes pivotal when detecting formation tops. Such insights can guide well-site engineers in prioritizing torque monitoring for lithology transitions and focusing on flow dynamics when anticipating stratigraphic tops, particularly in high-pressure, salt-bearing intervals, where rapid decisions are crucial.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Classification Performance\u003c/h2\u003e\u003cp\u003eThe classification models developed for predicting lithology and formation members were evaluated using confusion matrices (Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003e) and quantitative metrics summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. 
This evaluation included accuracy, precision, recall, F1-score, and MCC, providing comprehensive insights into model strengths and misclassification tendencies.\u003c/p\u003e\u003cdiv id=\"Sec19\" class=\"Section3\"\u003e\u003ch2\u003e3.3.1 Lithology Classification\u003c/h2\u003e\u003cp\u003eThe confusion matrices of the RF models, presented in Figs.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003ea and \u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003eb, illustrate the performance of the lithology classification model across both the training and testing datasets. The model demonstrated robust performance, particularly in the accurate classification of shale, with 12,053 correct predictions in the training set and 3,043 in the testing set. Minor misclassifications were observed, suggesting subtle ambiguities or transitional characteristics between lithologies, which may indicate the presence of mixed lithological features or gradational contacts within the formation. The testing confusion matrix confirmed good generalization capability, though it showed a slight increase in misclassifications compared to the training dataset. This modest decline in performance is expected, as models typically face greater challenges when applied to unseen data.\u003c/p\u003e\u003cp\u003eQuantitatively, RF delivered the highest performance, achieving an accuracy of 0.9798 with a corresponding MCC of 0.9660 during the training phase. Its strong performance was consistent in testing, maintaining an accuracy of 0.9762 and an MCC of 0.9596. XGBoost closely matched RF performance, with training accuracy of 0.9779 and testing accuracy of 0.9729, alongside MCC values of 0.9630 and 0.9540, respectively. In contrast, SVM and CNN yielded lower testing accuracies (0.9311 and 0.8724, respectively). 
This reduction is unlikely to stem from sample size (\u0026asymp;\u0026thinsp;30,000 records); for the CNN in particular, it more likely reflects a model\u0026ndash;data mismatch. Convolutional networks are designed to exploit local spatial structure through weight sharing on grid-like inputs with strong spatial autocorrelation (e.g., images), a property that tabular drilling parameters lack (LeCun et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Goodfellow et al. 2017). Under such conditions, tree-based ensembles such as RF and XGBoost typically capture heterogeneous, non-monotonic interactions more effectively, which is consistent with their superior performance in this context (Chen and Guestrin \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2016\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance metrics (Accuracy, Precision, Recall, F1-score, MCC) for lithology and formation-member predictions of the seven models, highlighting the superior performance of the ensemble models.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"12\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv 
align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eTarget\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e\u003cp\u003eTraining\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"5\" nameend=\"c12\" namest=\"c8\"\u003e\u003cp\u003eTesting\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003eMCC\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003eMCC\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\" morerows=\"6\" rowspan=\"7\"\u003e\u003cp\u003eLithology prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9798\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.966\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.9762\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.9596\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSVM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9322\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.8863\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.9311\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c11\"\u003e\u003cp\u003e0.93\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.883\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8267\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.7085\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8338\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.7172\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9779\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.98\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.963\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.9729\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c10\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.954\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMLP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8666\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.7794\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8673\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.7776\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTabNet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8968\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.8187\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8977\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c9\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.9\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.8214\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCNN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8693\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.7853\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8724\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.7876\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"6\" rowspan=\"7\"\u003e\u003cp\u003eFormation members prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9986\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.9979\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.9979\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.9969\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSVM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9669\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.9495\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.9641\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.9452\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.7202\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.71\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.72\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c6\"\u003e\u003cp\u003e0.71\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.5677\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.7207\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.71\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.72\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.71\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.5686\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMLP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.7919\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e0.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.6787\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.7868\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.79\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.6708\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTabNet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8786\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.88\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.8161\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8740\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.87\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.8099\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCNN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8403\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.7576\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.8395\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c11\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c12\"\u003e\u003cp\u003e0.7564\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec20\" class=\"Section3\"\u003e\u003ch2\u003e3.3.2 Formation Members Classification\u003c/h2\u003e\u003cp\u003eThe confusion matrices for the classification of formation members using the RF model across formation members (MB1\u0026ndash;MB5) (Figs.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003ec and \u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003ed) demonstrated excellent performance in accurately capturing geological boundaries. Training results revealed nearly perfect classification for all formation members (e.g., 9,848 correct classifications for MB4 and 7,621 for MB5). This exceptional performance persisted in the testing phase, indicating high model reliability and effectiveness.\u003c/p\u003e\u003cp\u003eThe quantitative results from Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e support this conclusion. 
Both RF and XGBoost exhibited outstanding results. RF achieved near-perfect accuracy (0.9986 for training and 0.9979 for testing) with correspondingly high MCC values (0.9979 and 0.9969), while XGBoost reached 0.99 on every reported metric in both phases. Precision, recall, and F1-score were 0.99 for both models. In contrast, LR and CNN achieved lower testing accuracies (0.7207 and 0.8395, respectively), suggesting that these models may be less effective in capturing the intricate, nonlinear geological relationships in this dataset. Overall, the results indicate that the ensemble methods (RF and XGBoost) clearly outperform the other models in classifying lithology and identifying formation members. This performance supports their utility for real-time geological interpretation and informed decision-making in subsurface exploration and production.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e3.4 Regression Performance\u003c/h2\u003e\u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\u003ch2\u003e3.4.1 Gamma-ray Log Prediction\u003c/h2\u003e\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the performance metrics for GR prediction across various models. Among the tested models, RF delivered the best fit, with training and testing R\u0026sup2; values of 0.9327 and 0.9155, respectively. RF also showed low error values (training: MAE\u0026thinsp;=\u0026thinsp;1.824, RMSE\u0026thinsp;=\u0026thinsp;2.9946; testing: MAE\u0026thinsp;=\u0026thinsp;2.0941, RMSE\u0026thinsp;=\u0026thinsp;3.3388), indicating robust predictive performance. 
XGBoost closely matched RF performance, achieving R\u0026sup2; values of 0.9318 in training and 0.9154 in testing, with comparably low errors (training: MAE\u0026thinsp;=\u0026thinsp;1.8621, RMSE\u0026thinsp;=\u0026thinsp;3.0151; testing: MAE\u0026thinsp;=\u0026thinsp;2.1196, RMSE\u0026thinsp;=\u0026thinsp;3.3417).\u003c/p\u003e\u003cp\u003eOther evaluated models, including the CNN and SVM, exhibited moderate predictive capability, with lower R\u0026sup2; values in both the training and testing phases (testing R\u0026sup2; of 0.8301 and 0.7767, respectively). LR underperformed substantially, yielding a testing R\u0026sup2; of only 0.4647 along with markedly higher errors (MAE\u0026thinsp;=\u0026thinsp;6.6882, RMSE\u0026thinsp;=\u0026thinsp;8.4057).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec23\" class=\"Section3\"\u003e\u003ch2\u003e3.4.2 Sonic Log Prediction\u003c/h2\u003e\u003cp\u003eSimilarly, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e provides a comprehensive evaluation of the models predicting DT. Again, ensemble methods substantially outperformed the other models. RF yielded the best results, with an R\u0026sup2; of 0.9422 during training and 0.9298 during testing. Associated errors remained low (training: MAE\u0026thinsp;=\u0026thinsp;2.9827, RMSE\u0026thinsp;=\u0026thinsp;5.1758; testing: MAE\u0026thinsp;=\u0026thinsp;3.3883, RMSE\u0026thinsp;=\u0026thinsp;5.7255), demonstrating the model\u0026rsquo;s predictive robustness. XGBoost exhibited comparable accuracy, with an R\u0026sup2; of 0.9415 in training and 0.9296 in testing, alongside similarly low error metrics (training: MAE\u0026thinsp;=\u0026thinsp;3.065, RMSE\u0026thinsp;=\u0026thinsp;5.2097; testing: MAE\u0026thinsp;=\u0026thinsp;3.4613, RMSE\u0026thinsp;=\u0026thinsp;5.7344). Models such as CNN, TabNet, and MLP showed limited performance, with markedly lower R\u0026sup2; values (approximately 0.63\u0026ndash;0.71) and higher error values. 
Again, linear regression (LR) delivered particularly weak performance (testing R\u0026sup2;\u0026thinsp;=\u0026thinsp;0.5001).\u003c/p\u003e\u003cp\u003eOverall, this evaluation indicates that ensemble-based regression models, notably RF and XGBoost, are effective in predicting petrophysical parameters from drilling data. Their accuracy on held-out data supports their practical value for real-time geological interpretation and informed operational decision-making.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance metrics on the training and testing data for gamma-ray (GR) and sonic travel-time (DT) prediction using various ML models, including the tuned final versions of the best models.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"8\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eTarget\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" 
rowspan=\"2\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e\u003cp\u003eTraining\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"3\" nameend=\"c8\" namest=\"c6\"\u003e\u003cp\u003eTesting\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMAE\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRMSE\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003eMAE\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003eRMSE\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"6\" rowspan=\"7\"\u003e\u003cp\u003eGR prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9327\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1.824\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2.9946\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.9155\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e2.0941\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e3.3388\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSVM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.7924\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e3.3106\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e5.2605\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.7767\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3.4884\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e5.4286\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.4821\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e6.6446\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e8.3092\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.4647\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e6.6882\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e8.4057\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9318\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1.8621\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e3.0151\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.9154\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e2.1196\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e3.3417\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMLP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.7727\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3.9411\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e5.5044\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.7592\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e4.0571\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e5.6373\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTabNet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.6581\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e5.0075\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e6.6556\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.6556\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e5.0288\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e6.6815\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCNN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8472\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3.1816\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e4.5126\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.8301\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3.3599\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e4.736\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"6\" rowspan=\"7\"\u003e\u003cp\u003eDT prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9422\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2.9827\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e5.1758\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.9298\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3.3883\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e5.7255\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSVM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.8072\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e5.9627\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e9.4543\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.8018\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e6.2573\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e9.6178\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.5064\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e12.0005\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e15.1286\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.5001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e12.019\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e15.2751\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9415\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3.065\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e5.2097\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.9296\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3.4613\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e5.7344\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMLP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.6388\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e9.9331\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e12.941\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.6321\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e10.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e13.1042\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTabNet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.7067\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e8.551\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e11.7284\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.7113\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e8.4855\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c8\"\u003e\u003cp\u003e11.6702\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCNN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.698\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e8.665\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e11.8328\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.6961\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e8.7433\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e11.9101\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec24\" class=\"Section3\"\u003e\u003ch2\u003e3.4.3 Actual vs. Predicted Analysis of Regression\u003c/h2\u003e\u003cp\u003eScatter plots (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003e) were generated to visually assess and further validate the predictive reliability of the RF regression models for GR and DT. The plots clearly depict a strong linear relationship, with predicted values closely aligning along the ideal identity line, confirming high predictive accuracy.\u003c/p\u003e\u003cp\u003eFor DT, the training dataset exhibited an R\u0026sup2; of 0.9422 (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ea), while the testing dataset showed consistent accuracy with an R\u0026sup2; of 0.9298 (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003eb). 
GR predictions similarly demonstrated strong performance, achieving an R\u0026sup2; of 0.9327 for training (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ec) and maintaining high accuracy in testing with an R\u0026sup2; of 0.9155 (Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003ed). These visual results reinforce the quantitative findings presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, underscoring the robust generalization capability of the ensemble regression models.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\u003ch2\u003e3.5 Field Well Validation Results\u003c/h2\u003e\u003cp\u003eTo rigorously assess the generalization capability and practical effectiveness of the developed models, external validation was performed in real-time using data from a well not included in the training data, BU-N5, located within the Buzurgan oilfield. The results from this validation are presented in two main subsections: Classification and Regression.\u003c/p\u003e\u003cdiv id=\"Sec26\" class=\"Section3\"\u003e\u003ch2\u003e3.5.1 Classification Results\u003c/h2\u003e\u003cp\u003eQuantitative performance metrics for lithology and formation members classification by RF and XGBoost models for the well BU-N5 are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The lithology classification demonstrated excellent accuracy, with RF and XGBoost achieving accuracy values of 0.9731 and 0.9718, respectively. Both models exhibited consistent precision, recall, and F1-scores of 0.97, along with high MCC values (0.9570 for RF and 0.9549 for XGBoost). 
For the formation-member classification, RF exhibited near-perfect performance, recording an accuracy of 0.9963 and MCC of 0.9944, while XGBoost closely matched this robust performance (accuracy: 0.9663, MCC: 0.9944).\u003c/p\u003e\u003cp\u003eVisual comparisons of actual and predicted lithology and formation members are provided in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e12\u003c/span\u003e. The figure illustrates the strong model performance in lithological identification, accurately distinguishing interbedded layers of shale, anhydrite, and salt. Major stratigraphic boundaries (MB5 to MB1) were also captured effectively, with only minor discrepancies that are negligible in an operational context. These classification results collectively validate the practical robustness and reliability of both RF and XGBoost models for geological interpretation and formation identification under real-time drilling scenarios.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance metrics of RF and XGBoost models on external validation well BU-N5 for lithology and formation members.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" 
colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eTarget\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e\u003cp\u003eApplication\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eAccuracy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eF1-score\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003eMCC\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eLithology prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9731\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.957\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9718\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.9549\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eFormation Members\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9963\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.9944\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.9663\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.9944\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec27\" class=\"Section3\"\u003e\u003ch2\u003e3.5.2 Regression Results\u003c/h2\u003e\u003cp\u003eRegression results for predicting petrophysical properties, GR and DT, from the validation well (BU-N5) are summarized 
quantitatively in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. The RF model demonstrated robust performance in predicting gamma ray values, achieving an R\u0026sup2; of 0.9342, MAE of 1.8775, and RMSE of 3.1539. XGBoost displayed comparable accuracy, with an R\u0026sup2; of 0.9337, MAE of 1.9251, and RMSE of 3.1640. Similarly, for DT, RF yielded strong predictive performance (R\u0026sup2; of 0.9332, MAE of 3.0465, RMSE of 5.3102), while XGBoost closely matched these results (R\u0026sup2; of 0.9321, MAE of 3.1234, RMSE of 5.3543).\u003c/p\u003e\u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003e presents a detailed visual comparison between the actual and predicted GR and DT logs across the depth interval of 2080\u0026ndash;2800 m. The predicted logs exhibit a strong agreement with the measured data, effectively capturing the overall trends and local variations in both GR and DT values. Minor discrepancies between predicted and actual curves remain within acceptable operational tolerances, underscoring the robustness of the models in accommodating real-time variations commonly encountered under field conditions. 
Taken together, these regression outcomes, supported by quantitative evaluation metrics (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e) and visual evidence (Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003e), highlight the high predictive reliability and practical applicability of the RF and XGBoost models in real-time drilling operations.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eRegression performance metrics (R\u0026sup2;, MAE, RMSE) for GR and DT predictions using external validation data.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTarget\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eR\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMAE\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRMSE\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" 
morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eGR prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.9342\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.8775\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e3.1539\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.9337\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.9251\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e3.164\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003eDT prediction\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.9332\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e3.0465\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e5.3102\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.9321\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e3.1234\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e5.3543\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThe primary aim of this study is to develop robust ML models capable of accurately predicting lithology, formation members, GR, and DT solely from surface drilling parameters. The results presented in Section \u003cspan refid=\"Sec15\" class=\"InternalRef\"\u003e3\u003c/span\u003e demonstrate the strong predictive accuracy and generalization capability of the developed models, particularly the ensemble-based methods RF and XGBoost. These models are designed to assist geologists in identifying formation members and determining casing landing zones in real-time, especially in high-pressure zones where operational risks are elevated and additional precautions are essential. Additionally, they provide valuable guidance for drilling engineers and mud engineers, supporting their decisions on drilling parameters, mud-weight adjustments, and operational strategies to enhance drilling safety, efficiency, and accuracy.\u003c/p\u003e\u003cdiv id=\"Sec29\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Interpretation of Relationships between Drilling Parameters and Geological Properties\u003c/h2\u003e\u003cp\u003eCorrelation analysis (Section \u003cspan refid=\"Sec16\" class=\"InternalRef\"\u003e3.1\u003c/span\u003e) revealed meaningful relationships between key drilling parameters and subsurface geological and petrophysical properties. Torque exhibited a negative correlation with GR values, indicating that intervals with lower GR values, typically cleaner, non-shaly lithologies such as anhydrite and salt, tend to generate higher torque. 
Conversely, shale intervals, characterized by high GR readings, consistently showed lower torque values, likely due to their softer and less mechanically resistant nature.\u003c/p\u003e\u003cp\u003eA similar pattern is evident in the torque\u0026ndash;DT relationship. Shale, which exhibits the highest DT values (reflecting low acoustic velocity), is associated with the lowest measured torque. Salt, with intermediate DT values, yields moderate torque, whereas anhydrite, marked by low DT (high velocity and stiffness), produces the highest torque response. This trend indicates that the torque generated at the bit increases with formation stiffness.\u003c/p\u003e\u003cp\u003eAs for WOB, positive correlations were observed with anhydrite, reflecting its high mechanical resistance. However, this trend does not extend to salt, which, despite generating high torque, requires relatively low WOB due to its ductile and easily penetrable nature. This observation aligns with field evidence indicating that salt formations can often be drilled rapidly (1\u0026ndash;3 minutes per meter) with moderate or even low bit pressure despite elevated torque responses.\u003c/p\u003e\u003cp\u003eFurthermore, specific formation members showed meaningful associations with drilling parameters (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003ed). For example, ROP was positively correlated with MB4 and negatively correlated with MB5, suggesting variable drilling efficiency across these members. This difference in drilling rate can be attributed primarily to the thick salt layers within MB4, which are typically drilled rapidly, and secondarily to the abnormally high pore pressure in MB4, which mechanically weakens the strata and further enhances drilling rates. 
This correlation-based understanding supports accurate formation identification, optimal bit selection, and real-time lithology prediction during drilling operations.\u003c/p\u003e\u003cp\u003eIn summary, these findings underscore the diagnostic value of torque and WOB as real-time indicators of lithological transitions. Torque, in particular, appears sensitive to both formation stiffness and acoustic response, making it a powerful parameter for anticipating lithological changes during drilling operations.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec30\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Comparative Analysis of Model Performance\u003c/h2\u003e\u003cp\u003eClassification performance results (Section \u003cspan refid=\"Sec18\" class=\"InternalRef\"\u003e3.3\u003c/span\u003e) clearly showed that RF and XGBoost outperformed LR and SVM, which are comparatively less flexible for capturing highly nonlinear and heterogeneous geological patterns. Ensemble methods consistently delivered near-perfect classification metrics, validating their effectiveness in handling complex, nonlinear geological patterns and heterogeneity in formations. Previous studies corroborate these findings, underscoring the robustness of ensemble methods for geological classification tasks in drilling contexts.\u003c/p\u003e\u003cp\u003eRegression results (Section \u003cspan refid=\"Sec21\" class=\"InternalRef\"\u003e3.4\u003c/span\u003e) confirmed the superiority of RF and XGBoost models, which demonstrated high predictive accuracy (R\u0026sup2; values of 0.91\u0026ndash;0.94) for both gamma ray and sonic travel time predictions. Low error values (RMSE and MAE) further substantiate their precision, making these models highly reliable for real-time log prediction. 
The notable underperformance of traditional regression models, such as LR, emphasizes the necessity of employing more sophisticated, nonlinear ML techniques when dealing with complex subsurface conditions.\u003c/p\u003e\u003cp\u003eAlthough R\u0026sup2; was approximately 0.93, visual inspection reveals an almost perfect match between the actual and predicted curves (Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003e) for both GR and DT logs. In many instances, the differences between predicted and actual values are minimal and do not affect geological interpretation. For example, at a depth of 2470 meters, the model predicted a DT value of approximately 68 \u0026micro;s/ft compared to an actual recorded value of 67 \u0026micro;s/ft. This slight deviation does not alter the lithological classification, as both values fall within the typical range for salt layers. However, from a purely mathematical standpoint, such minor differences contribute to the remaining 7% of unexplained variance in the R\u0026sup2; calculation. Consequently, these results confirm the strong practical performance and operational reliability of the model, even when numerical predictions are not entirely precise. This analysis highlights a fundamental limitation of using strict numerical accuracy measures to evaluate model performance in geoscientific applications, where interpretive accuracy often holds greater practical significance.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec31\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Validation through Independent External Well Data\u003c/h2\u003e\u003cp\u003eThe external validation results (Section \u003cspan refid=\"Sec25\" class=\"InternalRef\"\u003e3.5\u003c/span\u003e) provided critical insights into the applicability of the developed models in realistic operational scenarios. 
The quantitative metrics summarized in Tables\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and \u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, combined with visual evaluations presented in Figs.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e12\u003c/span\u003e and \u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e13\u003c/span\u003e, demonstrate excellent performance for both classification and regression predictions. Lithology classification displayed high consistency, effectively capturing thinly interbedded layers, while formation-member predictions accurately identified major geological boundaries. The minor discrepancies observed between the predicted and actual formation boundaries and petrophysical logs remained within acceptable operational ranges and are consistent with the geological uncertainties inherent in subsurface operations.\u003c/p\u003e\u003cp\u003eThis external validation indicates that both RF and XGBoost models generalize effectively to a previously unseen well within the Buzurgan oilfield, supporting their reliability for practical applications. Real-time operational testing further underscored the robustness, adaptability, and practical value of these models in supporting decision-making during drilling operations.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec32\" class=\"Section2\"\u003e\u003ch2\u003e4.4 Limitations and Practical Considerations\u003c/h2\u003e\u003cp\u003eDespite the strong performance achieved, several limitations must be acknowledged. First, the models were trained exclusively on surface drilling parameters collected from the Buzurgan field, which may limit their predictive accuracy in significantly different geological settings with varying lithologies, pore-pressure regimes, or bit hydraulics. 
Second, the training data consisted of only three distinct rock types (salt, anhydrite, and shale) with pronounced mechanical contrasts and sonic log responses. Generalizing these models to include additional lithologies (e.g., carbonates or sandstones) may necessitate retraining or transfer-learning strategies. Third, occasional mismatches in predicted formation members suggest that incorporating complementary information, such as seismic attributes or logging-while-drilling (LWD) data, could further enhance the reliability of predictions.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec33\" class=\"Section2\"\u003e\u003ch2\u003e4.5 Future Research Recommendations\u003c/h2\u003e\u003cp\u003eFuture research can extend this study in two promising directions. First, multi-source data fusion should be explored by integrating seismic attributes, offset-well logs, or real-time LWD measurements (where available) to strengthen model robustness and improve uncertainty quantification. Second, cross-field validation across different geological settings beyond the Buzurgan area is recommended to assess the transferability of the workflow. Given that salt-bearing, overpressured sections extend across multiple oilfields, such validation would provide valuable insights into the generalization potential of the proposed framework.\u003c/p\u003e\u003c/div\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThis study addresses a significant research gap by simultaneously predicting lithology, formation members, GR, and DT exclusively using surface drilling parameters within high-pressure, salt-bearing sequences. The unified modeling approach eliminates the necessity for developing separate models for each geological and petrophysical target. 
Consequently, the developed ensemble models (RF and XGBoost) serve as comprehensive decision-support tools for drilling personnel, enhancing real-time situational awareness and operational decision-making capabilities, particularly in high-risk drilling environments.\u003c/p\u003e\u003cp\u003eThe optimized ensemble algorithms demonstrated high real-time predictive accuracy, exceeding 97% in lithology classification, with near-exact identification of formation boundaries and close reconstruction of the wireline gamma-ray and sonic curves (R\u0026sup2; of approximately 0.93 on the blind well). These results were consistent across held-out test data from the training wells and an independent blind-test well, supporting their reliability under actual operational conditions. This reliability is especially critical in high-pressure, salt-bearing intervals where traditional open-hole logging presents considerable safety and operational risks.\u003c/p\u003e\u003cp\u003eTorque and WOB emerged as the most influential input parameters, underscoring their strong association with formation mechanical properties and geophysical characteristics. The results highlight the potential of data-driven approaches in streamlining formation evaluation, minimizing NPT, and enhancing well control. Additionally, the integration of physics-informed feature selection, such as leveraging torque fluctuations to predict gamma-ray and sonic-log variations and using flow-rate anomalies to identify formation members, illustrates how conventional drilling parameters can serve as powerful geological indicators.\u003c/p\u003e\u003cp\u003eCrucially, this study emphasizes that numerical performance metrics alone (e.g., R\u0026sup2; or RMSE) may not fully represent the practical geological accuracy of model predictions. 
Minor numerical discrepancies (e.g., predicting a sonic travel time of 68 \u0026micro;s/ft instead of the actual value of 67 \u0026micro;s/ft) lower the computed R\u0026sup2; and raise the RMSE but typically remain within acceptable lithological ranges and do not compromise geological interpretations. Therefore, comprehensive performance evaluations should integrate visual validation and geological reasoning alongside statistical accuracy metrics.\u003c/p\u003e\u003cp\u003eIn summary, the proposed machine-learning approach offers a highly practical, efficient, and reliable solution for real-time geological interpretation and operational decision-making during drilling operations. By significantly reducing dependence on conventional logging methods, the models facilitate rapid and safe formation evaluations, precise casing decisions, and improved operational efficiency in challenging, pressure-sensitive geological environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003ch2\u003eCompeting Interests\u003c/h2\u003e\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eH.Y. wrote the main manuscript text and was responsible for data collection, data preprocessing, conceptualization, data validation, and methodology development. X.H. was responsible for conceptualization, methodology development, supervision, and manuscript editing. O.A. contributed to data preprocessing, conceptualization, data validation, and methodology development. H.S. contributed to conceptualization, data validation, and manuscript review. All authors reviewed and approved the manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors express their sincere gratitude to the Iraqi Ministry of Oil and Missan Oil Company for providing access to seismic and well-log data, which made this research possible. 
We also extend our sincere appreciation to Southwest Petroleum University for its support. Additionally, we thank our colleagues for their insightful discussions and contributions to the interpretation of the results.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAbdi H (2022) Normalizing Data. Experiments of the Mind. Princeton University Press, pp 84\u0026ndash;108\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAly M, Ibrahim AF, Elkatatny S, Abdulraheem A (2021) Artificial intelligence models for real-time synthetic gamma-ray log generation using surface drilling data in Middle East Oil Field. J Appl Geophy 194:104462. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jappgeo.2021.104462\u003c/span\u003e\u003cspan address=\"10.1016/j.jappgeo.2021.104462\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAubertin M, Julien MR, Servant S, Gill DE (1999) A rate-dependent model for the ductile behavior of salt rocks. Can Geotech J 36:660\u0026ndash;674. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1139/t99-033\u003c/span\u003e\u003cspan address=\"10.1139/t99-033\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBurak T, Sharma A, Hoel E et al (2024) Real-Time Lithology Prediction at the Bit Using Machine Learning. Geosciences (Switzerland) 14:. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/geosciences14100250\u003c/span\u003e\u003cspan address=\"10.3390/geosciences14100250\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCao J, Shi Y, Wang D, Zhang X (2017) Acoustic Log Prediction on the Basis of Kernel Extreme Learning Machine for Wells in GJH Survey, Erdos Basin. J Electr Comput Eng 2017:1\u0026ndash;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1155/2017/3824086\u003c/span\u003e\u003cspan address=\"10.1155/2017/3824086\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen T, Guestrin C (2016) XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 785\u0026ndash;794\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eClark SK, Daniel JI, Richards JT (1928) Logging Rotary Wells from Drill Cuttings. Am Assoc Pet Geol Bull 12:59\u0026ndash;76. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1306/3D9327DA-16B1-11D7-8645000102C1865D\u003c/span\u003e\u003cspan address=\"10.1306/3D9327DA-16B1-11D7-8645000102C1865D\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCruz A, Ivanov R, Juaristi J et al (2024) Engineering and Operational Solutions to Drill a Challenging Hpht Exploratory Well Through a Salt Dome in Mca. In: ADIPEC. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDesouky M, Alqubalee A, Gowida A (2023) Decision Tree Ensembles for Automatic Identification of Lithology. In: SPE Symposium Leveraging Artificial Intelligence to Shape the Future of the Energy Industry. 
SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eElmgerbi A, Chuykov E, Thonhauser G, Nascimento A (2022) Machine Learning Techniques Application for Real-Time Drilling Hydraulic Optimization. In: International Petroleum Technology Conference. IPTC\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eElreedy D, Atiya AF (2019) A Novel Distribution Analysis for SMOTE Oversampling Method in Handling Class Imbalance. In: Rodrigues MF, Cardoso PJS, Monteiro J et al (eds) Computational Science \u0026ndash; ICCS 2019. Springer International Publishing, Cham, pp 236\u0026ndash;248\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGamal H, Alsaihati A, Elkatatny S (2022) Predicting the Rock Sonic Logs While Drilling by Random Forest and Decision Tree-Based Algorithms. J Energy Resour Technol 144. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1115/1.4051670\u003c/span\u003e\u003cspan address=\"10.1115/1.4051670\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGamal H, Elkatatny S, Abdulaziz AM (2024) Intelligent Solution for Auto-Detecting Lithology Scheme While Drilling by Machine Learning. In: International Petroleum Technology Conference. IPTC\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGnyedykh V, De Paola G, Ibanez E et al (2022) Manifold Learning for Realtime Log While Drilling Prediction. ECMOR 2022. European Association of Geoscientists \u0026amp; Engineers, pp 1\u0026ndash;12\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoodfellow I, Bengio Y, Courville A (2017) Deep learning. The MIT Press\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGowida A, Elkatatny S (2020) Prediction of Sonic Wave Transit Times From Drilling Parameters While Horizontal Drilling in Carbonate Rocks Using Neural Networks. 
Petrophysics \u0026ndash; SPWLA J Formation Evaluation Reserv Description 61:482\u0026ndash;494. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.30632/PJV61N5-2020a6\u003c/span\u003e\u003cspan address=\"10.30632/PJV61N5-2020a6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGupta I, Tran N, Devegowda D et al (2020) Looking Ahead of the Bit Using Surface Drilling and Petrophysical Data: Machine-Learning-Based Real-Time Geosteering in Volve Field. SPE J 25:990\u0026ndash;1006. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2118/199882-PA\u003c/span\u003e\u003cspan address=\"10.2118/199882-PA\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eIbrahim AF, Ahmed A, Elkatatny S (2023) Applications of Different Classification Machine Learning Techniques to Predict Formation Tops and Lithology While Drilling. ACS Omega 8:42152\u0026ndash;42163. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1021/acsomega.3c03725\u003c/span\u003e\u003cspan address=\"10.1021/acsomega.3c03725\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eIbrahim AF, Elkatatny S (2022) Real-Time GR logs Estimation While Drilling Using Surface Drilling Data; AI Application. Arab J Sci Eng 47:11187\u0026ndash;11196. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s13369-021-05854-7\u003c/span\u003e\u003cspan address=\"10.1007/s13369-021-05854-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJassim SZ, Goff JC (2006) Geology of Iraq. 
Dolin, Prague and Moravian Museum, Brno\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJin F, Wanting J, Longlian C et al (2023) Research on Drilling Technologies of Ultra-Thick Salt Domes in Middle Asia and Pre-Salt Strata in Middle East: Lessons Learnt from a Pilot Well in Kenkyak Oilfield and an HPHT Well in Halfaya Oilfield. In: Offshore Technology Conference Brasil. OTC\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKhalifa H, Tomomewo OS, Ndulue UF, Berrehal BE (2023) Machine Learning-Based Real-Time Prediction of Formation Lithology and Tops Using Drilling Parameters with a Web App Integration. Eng 4:2443\u0026ndash;2467. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/eng4030139\u003c/span\u003e\u003cspan address=\"10.3390/eng4030139\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLao K, Bruno MS, Serajian V (2012) Analysis of Salt Creep and Well Casing Damage in High Pressure and High Temperature Environments. In: Offshore Technology Conference. OTC\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436\u0026ndash;444\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLoizzo M, Houghton RD, Zahmuwl AH et al (2024) A Deeper Understanding of the Role of Salts and Creeping Formations in Well Integrity. In: SPE Europe Energy Conference and Exhibition. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: 31st Conference on Neural Information Processing Systems. Long Beach, CA, USA, pp 4768\u0026ndash;4777\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoazzeni A, Haffar MA (2015) Artificial Intelligence for Lithology Identification through Real-Time Drilling Data. J Earth Sci Clim Change 06. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.4172/2157-7617.1000265\u003c/span\u003e\u003cspan address=\"10.4172/2157-7617.1000265\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMoiseenkov A, Al Hadhrami A, Khayrutdinov F et al (2019) Openhole Completions as Recovery Case for Drilling Across Salt and High Pressure Floaters. In: Abu Dhabi International Petroleum Exhibition \u0026amp; Conference. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOsarogiagbon AU, Oloruntobi O, Khan F et al (2020) Gamma ray log generation from drilling parameters using deep learning. J Pet Sci Eng 195:107906. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.petrol.2020.107906\u003c/span\u003e\u003cspan address=\"10.1016/j.petrol.2020.107906\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePelayo L, Dick S (2019) Synthetic minority oversampling for function approximation problems. Int J Intell Syst 34:2741\u0026ndash;2768. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/int.22120\u003c/span\u003e\u003cspan address=\"10.1002/int.22120\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePopescu M, Head R, Ferriday T et al (2021) Using Supervised Machine Learning Algorithms for Automated Lithology Prediction from Wireline Log Data. In: SPE Eastern Europe Subsurface Conference. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRaymaekers J, Rousseeuw PJ (2024) Transforming variables to central normality. Mach Learn 113:4953\u0026ndash;4975. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10994-021-05960-5\u003c/span\u003e\u003cspan address=\"10.1007/s10994-021-05960-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSalim A, Lagraba PJO (2018) Utilizing Drill Cuttings to Enhance Characterization and Description of Tight Carbonate Reservoirs. In: SPE Annual Technical Conference and Exhibition. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSmith R, Bakulin A, Golikov P, AlBinHassan N (2022) Predicting sonic and density logs from drilling parameters using temporal convolutional networks. Lead Edge 41:617\u0026ndash;627. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1190/tle41090617.1\u003c/span\u003e\u003cspan address=\"10.1190/tle41090617.1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWillson SM, Fredrich JT (2005) Geomechanics Considerations for Through- and Near-Salt Well Design. In: SPE Annual Technical Conference and Exhibition. SPE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYao X, Song X, Han L et al (2022) A Novel Method for Real-Time Identification of Formation Lithology Based on Machine Learning. In: 56th U.S. Rock Mechanics/Geomechanics Symposium. ARMA\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhou H, Hatherly P, Ramos F, Nettleton E (2011) An adaptive data driven model for characterizing rock properties from Drilling data. In: 2011 IEEE International Conference on Robotics and Automation. IEEE, pp 1909\u0026ndash;1915\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZilberman VI, Serebryakov VA, Gorfunkel MV et al (2002) Chap. 9 Prediction of abnormally high pressures in petroliferous salt-bearing sections. 
pp 209\u0026ndash;221\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZong X, Li X, Gao Y et al (2024) Research and application of cuttings intelligent collection equipment technology. J Phys Conf Ser 2901:012014. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1088/1742-6596/2901/1/012014\u003c/span\u003e\u003cspan address=\"10.1088/1742-6596/2901/1/012014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":false,"email":"","identity":"journal-of-petroleum-exploration-and-production-technology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Journal of Petroleum Exploration and Production Technology","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"VoR Journals","inReviewEnabled":false,"inReviewRevisionsEnabled":false},"keywords":"Machine learning, Random Forest, Lithology prediction, Log prediction, Missan oilfields","lastPublishedDoi":"10.21203/rs.3.rs-7771316/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7771316/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eDrilling through high-pressure, salt-bearing sequences poses severe operational challenges due to rapid pore-pressure fluctuations, borehole instability, and the complex, discontinuous 
lithologies typical of evaporites. This study develops and rigorously validates a single, unified machine-learning (ML) framework that simultaneously predicts lithology, formation members, and synthetic gamma-ray (GR) and sonic travel-time (DT) logs directly from surface drilling parameters, providing a practical alternative when wireline logging is risky, delayed, or impractical. A depth-indexed dataset of 30,500 records from four wells in the Buzurgan oilfield was compiled, including rate of penetration (ROP), weight on bit (WOB), revolutions per minute (RPM), torque, flow rate (FR), and standpipe pressure (SPP). Seven supervised algorithms were benchmarked: Random Forest (RF) and Extreme Gradient Boosting (XGBoost), tuned with Optuna and evaluated with held-out tests and blind-well validation, delivered the best performance. The optimized ensembles exceeded 97% accuracy for lithology classification and 99% for formation-member identification, while regressors showed strong agreement with wireline measurements (R\u0026sup2; \u0026ge; 0.93 for GR and \u0026ge;\u0026thinsp;0.91 for DT). Feature-importance analyses indicated torque and WOB as the most influential predictors, consistent with their direct coupling to bit\u0026ndash;rock interaction and formation strength; FR, SPP, and RPM contributed secondarily. Operationally, the framework supports real-time casing-point selection, proactive adjustments to drilling parameters, and mud-property optimization\u0026mdash;capabilities that are especially valuable across critical salt\u0026ndash;anhydrite intervals to reduce open-hole exposure, non-productive time (NPT), and well-control risk. Limitations include potential site-specific bias (four-well training within a single field), dependence on data quality and sensor calibration, and the need for prospective cross-field validation and concept drift monitoring as operating practices change. 
Nonetheless, the results demonstrate that a field-deployable, ensemble-based workflow can reliably replace or complement traditional formation-evaluation methods in high-pressure, salt-bearing environments, enabling faster, evidence-based decisions at the rig site within a transparent, interpretable ML framework.\u003c/p\u003e","manuscriptTitle":"Real-Time Lithology and Log Prediction from Drilling Parameters Using Machine Learning for High-Pressure Salt-Bearing Formation, Missan Oilfields, Iraq","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-29 16:16:11","doi":"10.21203/rs.3.rs-7771316/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-01-31T05:49:24+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-24T17:33:13+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"251048327735376079019961096720690082923","date":"2026-01-02T15:57:05+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-23T08:29:16+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"243785422078043692390600653191822116795","date":"2025-10-21T21:30:53+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"162637177723057129215050103154001830482","date":"2025-10-20T16:31:34+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"58891134040379429997319262309482233244","date":"2025-10-15T16:30:07+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-10-15T16:22:50+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-10-10T03:46:52+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-10T03:46:11+00:00","index":"","fulltext":""},{"type":"submitted","content":"Journal of Petroleum Exploration and Production 
Technology","date":"2025-10-03T07:38:28+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":false,"email":"","identity":"journal-of-petroleum-exploration-and-production-technology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"","title":"Journal of Petroleum Exploration and Production Technology","twitterHandle":"","acdcEnabled":false,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"VoR Journals","inReviewEnabled":false,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"f2ef879f-00c5-46ca-b7ab-8b0dc6ca4731","owner":[],"postedDate":"October 29th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-13T16:12:36+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-29 16:16:11","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7771316","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7771316","identity":"rs-7771316","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}