Performance of interpolation methods in digital soil mapping: the influence of data characteristics

doi:10.21203/rs.3.rs-8020454/v1

Performance of interpolation methods in digital soil mapping: the influence of data characteristics

2025 · doi:10.21203/rs.3.rs-8020454/v1

preprint OA: closed

Full text JSON View at publisher

Full text 206,088 characters · extracted from preprint-html · click to expand

Performance of interpolation methods in digital soil mapping: the influence of data characteristics | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article Performance of interpolation methods in digital soil mapping: the influence of data characteristics Laura Delgado Bejarano, Agda Loureiro Gonçalves Oliveira, João Vitor Fiolo Pozzuto, and 2 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-8020454/v1 This work is licensed under a CC BY 4.0 License Status: Published Journal Publication published 27 Dec, 2025 Read the published version in Precision Agriculture → Version 1 posted You are reading this latest preprint version Abstract Purpose The selection of interpolation methods in digital soil mapping lacks a systematic approach, reducing map accuracy. This study aimed to evaluate whether data characteristics, such as sample size and spatial structure, influence the selection and performance of interpolation methods. Methods Six interpolation methods were evaluated across datasets representing three typical sampling density scenarios in Brazilian agriculture. Spatial structure was characterized using Moran’s index and the spatial dependence index derived from geostatistical semivariograms. Interpolation was performed, and the accuracy was assessed using test datasets and Lin’s concordance correlation coefficient. Consequently, two decision frameworks (multivariate and univariate) were developed to guide method selection. The univariate framework was then validated to assess its robustness. Results For small datasets (n < 50), deterministic methods, particularly Thin Plate Spline (TPS), consistently provided the most stable predictions. In contrast, the performance of the geostatistical and machine learning methods improved with increasing sample size and stronger spatial structure. In the largest datasets (n ≥ 100), most methods became competitive, shifting the primary selection criteria towards factors such as operational simplicity. These findings were synthesized into decision frameworks to guide optimal interpolator selection. Conclusion Interpolation performance is critically dependent on underlying data attributes (sample size and spatial structure). No universal interpolator exists for all datasets. Deterministic methods, specifically the TPS, demonstrated superior flexibility across diverse scenarios. A data-driven decision framework was developed in this study translating these key data attributes into clear, actionable recommendations, thereby providing users with an accessible tool to demonstrably improve the reliability of soil maps. Soil variability sampling density spatial structure decision framework map accuracy spatial prediction Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Figure 9 Figure 10 Figure 11 Highlights • Interpolator performance is critically dependent on sample size and spatial structure. • Thin Plate Spline (TPS) provides the most stable predictions, especially in small datasets (n < 50) • No single interpolation method is universally effective for all datasets. • A decision framework translates sample size and spatial structure into method choice Impact This work provides a practical, data-driven framework that guides users in selecting the optimal interpolation method by analyzing sample size and spatial structure. By replacing the common practice of arbitrary method selection, these findings directly improve the reliability and accuracy of digital soil maps, which are fundamental tools for effective site-specific management Introduction Soil is a fundamental component of agricultural systems, directly influencing crop productivity, environmental sustainability, and key management decisions, such as fertilization and irrigation. This critical resource is highly heterogeneous, however, due to natural formation processes and intensive anthropogenic practices (e.g., tillage and soil conservation) (Oliver & Webster, 2014 ; Panday et al., 2019 ; Pusch et al., 2022 ). This spatial and temporal heterogeneity renders the agricultural soil a complex and anisotropic domain. Consequently, recognizing and accurately quantifying this variability is crucial for guiding effective site-specific management decisions within Precision Agriculture (Cherubin et al., 2022 ; De Caires et al., 2025 ; Gebbers & Adamchuk, 2010 ). The quantification of soil variability relies on the generation of maps that depict the spatial distribution of soil properties (Piikki et al., 2021 ). These maps are generated by soil sampling to acquire a discrete set of data at known locations. However, the sampling process is costly and labor intensive (Amaral & Justina, 2019; Radočaj et al., 2022 ). Consequently, sampling strategies often prioritize economic constraints over the actual requirements to capture soil variability, which frequently compromises the representativeness of the collected data (Aldungarova et al., 2025 ; Amaral & Justina, 2019; Cherubin et al., 2022 ; Karp et al., 2024 ). The need to transform discrete point data into continuous maps necessitates the use of interpolation. In practice, however, interpolation methods are often selected arbitrarily, which compromises their effectiveness in producing reliable maps that accurately represent soil variability. This gap persists because no single method is universally effective across all dataset conditions (Barrena-González et al., 2022 ; Karp et al., 2024 ). Consequently, data characteristics can serve as a systematic guide for selecting the most appropriate interpolator, leveraging the specific requirements and limitations of each method (De Caires et al., 2025 ). Various interpolation methods predict property values at unknown locations in precision agriculture and are classified based on the underlying analytical approach (Hengl, 2009 ). Geostatistical methods are the most widely adopted in soil science, with Ordinary Kriging (OK) being the most notable example. This method models spatial dependence through the semivariogram, providing not only estimates but also an associated measure of uncertainty. Nevertheless, its performance is constrained by the quality of the fitted semivariogram, which directly depends on the sample size and the underlying spatial structure (Oliver & Webster, 2014 ). Deterministic methods constitute another class of interpolators, often representing simpler approaches. These approaches are based on mathematical functions that use the distance between observations, e.g., Inverse Distance Weighting (IDW), Radial Basis Functions (RBF), and Thiessen polygons (Hengl, 2009 ; Wong, 2017 ). More sophisticated interpolation methods, primarily Machine Learning (ML) algorithms, have recently emerged as powerful alternatives for digital soil mapping. These approaches leverage auxiliary covariates and effectively capture complex, non-linear spatial patterns. Prominent examples include Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Networks (CNN) (Khaledian & Miller, 2020 ; Pereira et al., 2022 ; Radočaj et al., 2022 ). Interpolator performance is influenced by a range of factors, including sampling configuration, the method’s mathematical structure, and the spatial structure of the data, among others (Li & Heap, 2011 ; Pusch et al., 2023 ; Radočaj et al., 2021 ). Previous studies primarily focused on the effect of sampling density on mapping accuracy, highlighting that higher densities usually allow for better map quality (Long et al., 2018 ; Sun et al., 2012 ; Yang et al., 2020 ). However, research comparing different interpolators illustrates that performance varies according to the method and data characteristics (AbdelRahman et al., 2020; Duan et al., 2020). Addressing this need, some studies have successfully combined both perspectives, testing different interpolators under varying sampling densities However, these works often rely on large datasets or regional scales (Chen et al., 2022 ; Qu et al., 2024 ), a constraint that fails to reflect the limited number of samples typically available in operational agricultural fields. Crucially, in practice, even common strategies (e.g., collecting one sample per hectare) do not provide an adequate number of observations in smaller fields for the successful implementation of some interpolation models (Kerry & Oliver, 2007 ). Such methodological shortcomings highlight that adequate variability mapping requires adapting the interpolation method to the specific data conditions. This systematic adaptation ensures the generated maps effectively support site-specific management decisions. In contrast to the prevalent large-scale focus, recent contributions have examined interpolation specifically under low-sampling agricultural conditions. For example, Karp et al. ( 2024 ) introduced variants of conventional methods, such as OK and IDW, and developed parameter optimization strategies adapted to low-sampling scenarios. Sobjak et al. ( 2023 ) exemplifies recent efforts by automating OK and IDW as a workflow based on dataset spatial dependence. Although valuable, their focus remains narrowly fixed on optimizing a limited set of conventional interpolators, overlooking other robust alternatives. More importantly, these authors do not translate their findings into a general decision framework. As a result, practitioners still lack a simple, systematic guide for navigating the initial and most critical choice of interpolation method. This leads to unreliable maps and misguided management actions, such as under or over fertilizer application, compromising the efficiency of precision agriculture (Karp et al., 2024 ; Pusch et al., 2022 ). Therefore, this study addresses the need for more precise guidance in digital soil mapping by examining the influence of dataset characteristics on interpolation performance. Consequently, the objective of this study is to evaluate whether data characteristics, such as sample size and spatial structure, affect the selection and performance of interpolation methods used in digital soil mapping and, based on this, propose and validate a framework for choosing the interpolation method. Material and methods Study Area The study utilized six commercial farms located in different municipalities of São Paulo, Brazil (Fig. 1 ), designating three sites for framework development and the remaining three for its external validation. The first group comprised the following sites (Table 1 ): the Aliada farm, a 70 ha sugarcane field (20°51'26.1" S; 47°57'4.21" W); the São José farm, a 107 ha grain production area (22°41'55.16" S; 47°10'34.15" W); and the Campina farm, a 200 ha area used for both cattle and soybean production (21°38'13" S; 51°54'55" W). The validation group comprised three sugarcane farms: Boa Vista (21°50'23" S, 47°45'27" W); Ipiranga (21°49'07" S, 47°44'10" W); and Santa Fé (21°38'10.32" S; 48°39'1.90" W). The Köppen classification system (Lima et al., 2023) designated the following climates for the study sites: Cwa (humid subtropical) for Aliada, Ipiranga, Boa Vista, and Santa Fe farms; Cfa (sub-humid tropical) for São José farm; and Aw (tropical with a dry winter) for Campina farm. Oxisol is the predominant soil type across all areas, classified according to the United States Department of Agriculture (USDA, 2022). Oxisols are characteristic of tropical and subtropical regions and typically exhibit low natural fertility, thus necessitating fertilizer application for agricultural production (Eswaran & Reich, 2005). Soil sampling and analysis Soil sampling was primarily conducted using an instrumented quadricycle equipped with an auger; however, a manual hand auger was employed for the Campina and Santa Fé farms. At each central point, composite samples were collected by obtaining multiple subsamples within a 3–5 m radius. The specific sampling characteristics for each area are summarized in Table 1 . All samples were processed in the laboratory for the analysis of agronomically relevant variables, including available potassium (K, mmol c dm − 3 ), available phosphorus (P, mg dm − 3 ), pH, base saturation (V%), cation exchange capacity (CEC, mmolc dm − 3 ), and clay content (g kg − 1 ), following the procedures described by Raij et al. ( 2001 ). These variables were chosen due to their tendency to form distinct spatial patterns influenced by management practices (Amaral & Justina, 2019). Table 1 Soil sampling characteristics for the framework development and external validation sites Site Crop depth (m) Sub-samples Density (samples ha − 1 ) Radius 1 (m) Year 2 Ref 3 Aliada Sugarcane 0-0.25 6 6 5 2023 (Melo & Amaral, 2024 ) São José Soybean 0-0.20 6 6 5 2023 Campina Integrated livestock-soybean system 0-0.20 4 2 4 2018 (Magalhães & Amaral, 2022 ) Ipiranga Sugarcane 0-0.25 8 1 3 2017 (Pusch et al., 2022 ) Santa Fé Sugarcane 0-0.25 3 2 3 2017 (Fontenelli et al., 2021 ) Boa Vista Sugarcane 0-0.25 6 1 5 2025 1 Sampling radius used to obtain each composite sample 2 Sampling year for each site. 3 References indicate the original data sources or previous studies The original datasets for the study areas were systematically divided into three sampling densities (1, 0.5, and 0.2 samples ha − 1 ), which represent distinct scenarios of sample size and spatial dependence. These densities reflect common strategies adopted in Brazilian agriculture, although they may not fully capture the spatial variability of soil properties (Cherubin et al., 2022 ). For each area, the non-selected samples were designated as an independent test subset (Fig. 2). Figure 2 Distribution of soil sampling points across the study sites under different sampling densities (1, 0.5, and 0.2 samples ha⁻¹) and validation samples Spatial Structure Assessment The study assessed the spatial structure of the data using a combination of spatial dependence (SD), quantified through semivariogram analysis, and the Moran index (MI). The SD index (Table 2 ) quantifies the ratio between the partial sill (C 1 ), which represents the correlated semivariance indicating spatial continuity, and the sill (C o + C 1 ), which estimates the total variance of the random process. These parameters were obtained by fitting semivariograms using geostatistical models (spherical, exponential, and Gaussian). For datasets with less than 100 samples, the semivariogram fitting utilized the residual maximum likelihood (REML) method, as recommended by Kerry and Oliver ( 2007 ). All geostatistical fitting was performed using the geoR package in R (Diggle and Ribeiro, 2007). Conversely, the study applied the method of moments (MoM) for larger datasets (Oliver & Webster, 2015 ) utilizing the gstat package in R (Gräler et al., 2016 ). Model selection was performed using leave-one-out cross-validation (LOOCV). The study prioritized models that yielded the highest Lin’s concordance coefficient (LCCC) and the lowest root mean square error (RMSE). The MI was calculated using the k-nearest neighbors (KNN) approach, considering eight neighbors per sample point (Anselin, 1995 ). The study classified spatial patterns as clustered when the p-value was less than 0.05; otherwise, the pattern was considered random. Table 2 Classification of spatial dependence (Konopatzki et al., 2012 ) Spatial dependence (SD) Class < 20 Very Low (VL) [20–40) Low (L) [40–60) Moderate (M) [60–80) High (H) ≥ 80 Very High (VH) Notably, the MI and SD indices capture spatial patterns at distinct scales: the MI reflects local spatial autocorrelation (based on eight nearest neighbors), while the SD represents the global spatial autocorrelation derived from the semivariogram. Given their operation at these distinct scales, the joint use of both indicators provides a more comprehensive assessment of spatial structure within each dataset. Interpolation methods The study compared six interpolation methods across the scenarios, categorized as follows: two deterministic approaches: Inverse Distance Weighting (IDW) and Thin plate spline (TPS); one geostatistical method: Ordinary Kriging (OK); two machine learning algorithms: Support Vector Machine (SVM) and Spatial Random Forest (RFE); and one hybrid method combining machine learning and geostatistics: Regression Kriging (RK). Inverse Distance Weighting ( IDW ) is an exact interpolator that estimates unknown values by assigning weights to sample points in inverse proportion to distance. The influence of nearby points is modulated by a power parameter (p). In this study, the p parameter was optimized by testing values in a sequence from 0.5 to 6 (increments of 0.5) (Betzek et al., 2019 ; Sobjak et al., 2023 ), combined with neighbor counts ranging from 4 to 16. The optimal combination was selected using the lowest value of the Interpolation Selection Index (ISI) (Bier and de Souza, 2017 ). The ISI is a composite metric, derived from leave-one-out cross-validation, which integrates prediction error and residual variability to assess the accuracy and stability of the interpolation results. Thin Plate Spline (TPS) is a special case of radial basis functions (RBF) and an exact interpolator, ensuring the fitted surface passes through all observed data points. Its formulation minimizes a penalized sum of squares to balance fidelity to observations with surface smoothness, thereby generating continuous and smooth surfaces (Broomhead and Lowe, 1988; White et al., 2019 ). Conceptually, this is equivalent to minimizing the bending energy of a thin metal sheet. The study optimized the smooth parameter ε through k-fold cross-validation (k = 5), using the ISI criterion over a search space from 0.0001 to 0.5. Since ε = 0.0001 consistently provided the best performance across all cases, this value was adopted as the final parameter. The TPS interpolation was implemented globally, utilizing all sampled points to fit the surface rather than a localized neighborhood approach. Ordinary Kriging (OK) is considered the best unbiased linear predictor, operating under the assumption of a constant but unknown mean within the local neighborhood. The method relies on modeling spatial autocorrelation using the semivariogram, making it particularly effective when data exhibits strong spatial dependence (Oliver and Webster, 2015 ). The theoretical semivariogram model selected during the SD analysis was used for the OK interpolation. Support Vector Machine (SVM) is an ML algorithm that utilizes covariates to enhance spatial predictions. It projects data into a hyperspace using a radial basis kernel to effectively capture nonlinear patterns (Khaledian and Miller, 2020 ). The study tuned the hyperparameters C (error cost) and sigma (kernel parameter) via grid search and 10-fold cross-validation using the caret package in R (Kuhn, 2008 ). The (C, sigma) pair that minimized the RMSE was subsequently selected. Spatial Random Forest (RFE) , based on the Random Forest (RF) algorithm, combines decision trees with bootstrap aggregation (bagging) to reduce variance and increase stability (Breiman, 2001 ; Khaledian and Miller, 2020 ). Spatial dependence was incorporated by including coordinates (x, y) as covariates (Hengl et al., 2018). The study tuned the hyperparameters (ntree: 500–5000, mtry: 1–7, and nodesize: 1–50) through a grid search of 200 iterations, employing a 5-fold cross-validation. This optimization was performed using the mlr package in R (Bischl et al., 2016 ), with the optimal parameter set selected as the combination that minimized the RMSE. Regression Kriging (RK) is a hybrid method that combines regression and geostatistics (Hengl et al., 2007 ). In this study, the RFE algorithm modeled the regression component. Residuals were subsequently computed, modeled with a semivariogram, and interpolated using OK. The final RK prediction was obtained by adding the kriged residuals to the RFE predictions. Covariates Five covariates were used for the multivariate methods (SVM, RFE, RK): apparent electrical conductivity (ECa), elevation (DEM), and three spectral features derived from a synthetic soil image: the saturation (S) component of the RGB to HSI transform, the normalized difference clay index (NDCI), and the soil brightness index (SBI) (Supplementary Material, Fig. S1 ). Apparent electrical conductivity (ECa) was measured at an effective depth of 0.37m using an EM38 sensor (Geonics, Ltd, Mississauga, Ontario, Canada). The sensor, towed by a quadricycle at approximately 20 km h − 1 with 1 Hz frequency, generated a dense grid of measurements. For processing, the ECa data underwent filtering to remove outliers, maneuvering points, and boundary artifacts. The cleaned data was subsequently interpolated into a 10 m raster using Universal Kriging (UK). The UK method was chosen because the ECa data exhibited a spatial trend in the semivariogram analysis, allowing the UK to effectively capture this trend using coordinates (x, y). Furthermore, ECa integrates multiple soil physical and chemical properties and has proven effective in characterizing soil variability (Corwin and Scudiero, 2020 ; Pusch et al., 2021 ). Elevation (DEM) data was obtained from the ALOS PALSAR digital elevation model (30 m resolution) using the Google Earth Engine (GEE). The processing involved reprojecting each raster to the appropriate coordinate reference system (CRS, WGS 84 UTM 23S for all areas, except WGS 84 UTM 22S for the Campina farm). The rasters were subsequently clipped to field boundaries and resampled to 10 m using bilinear interpolation. As a key topographic feature, DEM inherently influences water redistribution and soil particle transport, effects that directly impact soil spatial variability (Pusch et al., 2022 ; Rodrigues et al., 2021 ). The Synthetic Soil Image (SYSI) was generated following an adapted GEOS workflow (Demattê et al., 2018 ) to overcome the limited bare-soil coverage typical of pasture and no-till management. The method composited Sentinel-2 time series data within the GEE, filtering for bare soil pixels (clouds ≤ 10%, NDVI < 0.25, and NBR 2 < 0.10). This 10 m composite served as the basis for three spatial covariates: Saturation (S) component (from HSI transform): Captures soil color information with reduced spectral redundancy (Zanotta et al., 2019 ). Normalized Difference Clay Index (NDCI) (Eq. 1): Sensitive to soil texture, and correlated with clay content, which enhances soil mapping (Danoedoro and Zukhrufiyati, 2015 ). Soil Brightness Index (SBI) (Eq. 2): Represents soil surface reflectance and brightness, influenced by soil texture, mineral composition, and moisture content, providing predictive value for soil properties (AbdelRahman et al., 2022 ). $$\:NDCI=\frac{SWIR2-SWIR1}{SWIR2+SWIR1}\:\left(1\right)$$ $$\:BI=\frac{\sqrt[2]{{R}^{2}+{G}^{2}\:\:}}{2}\:\left(2\right)$$ To evaluate the predictive associations supporting the multivariate interpolation methods, the study performed a Pearson correlation analysis between the soil properties and the covariates after standardizing all covariate layers (z-score). The resulting associations were classified according to the criteria presented in Table 3 . Table 3 Classification of Pearson correlation strength (Evans, 1996) Pearson Correlation |r| Class < 0.20 Very Weak [0.20, 0.40) Weak [0.40, 0.60) Moderate [0.60, 0.80) Strong ≥ 0.80 Very Strong Validation and Framework The study evaluated the interpolation performance using the independent test datasets from the study areas (Fig. 2), with model accuracy assessed by the Lin’s concordance coefficient (LCCC), a metric that integrates both precision and accuracy into a single measure (Khaledian and Miller, 2020 ).The validation results were stratified into scenarios defined by sample size (< 50, 50–100, ≥ 100), reflecting geostatistical limitations, and by the spatial structure of the data (IM-SD classification). For each scenario and interpolator, the median LCCC calculated across all six soil attributes analyzed (P, K, pH, CEC, V%, and clay content) served as the representative performance metric. The median was selected as a robust measure of central tendency to reduce sensitivity to outliers (Huber and Ronchetti, 2009 ; Webster and Oliver, 2007 ), mitigating the influence of extreme LCCC values that are expected when analyzing soil properties with inherently different levels of predictability. To synthesize these validation results into a practical guide, the study constructed a decision-making framework. The framework used the scenario characteristics (sample size and MI-SD classification) as inputs. The decision rule was performance-based: the interpolator yielding the highest median LCCC was selected as the primary recommendation. Furthermore, a 5% performance threshold was applied, identifying any method within this range of the top result as a suitable alternative, given that minor differences in performance metrics often lack practical significance. In parallel, a univariate framework was also developed using the same procedure. This approach, which limits the choice to simpler methods (TPS, IDW and OK), provides a more practical tool for users lacking the covariates or technical resources required for multivariate approaches. External validation External validation utilized three independent areas (Fig. 1 ), where multiple scenarios were simulated by systematically subsetting the datasets (Supplementary material - Fig. S2). This procedure involved dividing the areas into smaller subregions and applying the same sampling densities used in the core study. These simulated scenarios were crucial for evaluating the consistency of univariate interpolation methods (TPS, IDW, and OK) across varying spatial structures and sample sizes. The validation was performed using the LOOCV approach, with performance quantified by LCCC. Results and discussion The performance of the interpolation methods was fundamentally conditioned by data characteristics, particularly the interaction between sample size and spatial structure. Sampling density, therefore, emerges as a critical parameter in survey planning, governing both the total number of samples and the ability to capture spatial dependence. This is especially relevant in agricultural areas where small field sizes, even at high sampling densities, may fail to provide a sufficient number of samples for the proper application of specific methods (Kerry and Oliver, 2007 ; Radočaj et al., 2021 ). To clearly illustrate these relationships, the following sections present the results stratified by sample-size scenario, highlighting how the performance and suitability of each method changed across varying data conditions. Performance in scenarios with less than 50 samples For datasets with less than 50 samples (Fig. 3 ), the median LCCC revealed severe limitations when the data exhibited random autocorrelation behavior (SD: Very Low to Very High). Conversely, for clustered patterns, all interpolation methods yielded similar performance (SD: High and Very High). In all scenarios, deterministic approaches, particularly TPS, emerged as the most reliable predictors regardless of the underlying spatial structure (Fig. 3 ). The strong performance of TPS reflects its inherent ability to generate continuous and realistic surfaces, effectively minimizing interpolation artifacts while preserving the observed data values (Webster and Oliver, 2007 ). In contrast, geostatistical models produced unstable predictions in this range. This instability is expected, as these methods typically require a minimum of 100 samples for semivariogram construction, or at least 50 samples when employing advanced techniques like REML (Kerry and Oliver, 2007 ; Oliver and Webster, 2014 ). This finding aligns with the literature consensus, which recommends deterministic methods for small datasets lacking robust spatial dependence where constructing a reliable semivariogram is not feasible (Kravchenko, 2003 ; Radočaj et al., 2021 ; Rodrigues et al., 2018 ). Nevertheless, the apparent similarity in performance in clustered scenarios must be interpreted with caution, particularly for geostatistical methods (OK), as the semivariogram fitting process is highly sensitive to the analyst's technical expertise (Amaral and Justina, 2019; Oliver and Webster, 2015 ).The limited number of lags generated under small sample sizes allows for the adjustment of multiple plausible models, often yielding diverse results. Consequently, the quality of geostatistical outputs in these scenarios fundamentally reflects the subjective modeling decisions rather than an inherently robust spatial prediction. The visual evidence from the interpolated maps for datasets with fewer than 50 samples is consistent with the LCCC accuracy results, clearly revealing differences in spatial structure consistency among the methods (Fig. 4 ). Among the deterministic methods, TPS remained stable across all spatial structure scenarios, producing continuous surfaces that more faithfully capture the local variability of the soil properties. This strong performance stems from its nature as an exact interpolator, which preserves the values of the sample points while generating plausible predictions between observed data (Karwariya et al., 2021; White et al., 2019 ; Wong, 2017 ). In contrast, IDW generated surfaces with inconsistent continuity, displaying artifacts such as bull's-eye effects and blocky patterns, or overly smooth predictions, depending on the power optimization applied. Therefore, TPS is the preferred method, as it provides a more reliable representation of spatial variability than IDW (Rodrigues et al., 2018 ). This superiority is particularly important in practical applications, such as fertilizer recommendations, where smooth, realistic spatial transitions are essential to avoid abrupt dosage variations in the field. Conversely, Machine Learning (ML) models produced overly smooth surfaces (Fig. 4 ) that failed to represent the variability in soil properties accurately. This deficiency is attributed to two factors: the insufficient number of samples for model training (Goovaerts and Kerry, 2010 ; Pusch et al., 2022 ; Sekulić et al., 2020 ) and the weak correlation with the covariates (Kerry et al., 2024; Pusch et al., 2021 ). Specifically, the relationship between soil properties and covariates largely lacked significant correlations in this scenario across all study areas. When correlations were present, they were generally moderate to strong but isolated to specific variables or sites (ECa or DEM in Paulinia at 0.2 samples ha − 1 ; supplementary material – Fig. S3, S4, S6 and S9). The restricted capacity of the multivariate methods to accurately capture soil variability is explained by the combination of a small sample size and the inconsistency in predictive relationships with the covariates. Similarly, the OK results were also inconsistent, producing maps that ranged from excessively smooth surfaces to those with bull's-eye artifacts. This reduced accuracy reflects the inherent difficulty of modelling a reliable semivariogram from a limited number of samples, supporting the literature that advises against using geostatistical methods in this scenario (Oliver and Webster, 2014 ). Finally, RK achieved only partial improvements by incorporating covariates, capturing some spatial variation in specific scenarios, but its overall performance remained limited (Fig. 3 ). Although the methods exhibited similar accuracy in clustered patterns, the spatial appearance of the resulting maps differed substantially (Fig. 4 ), ranging from overly smoothed surfaces to maps exhibiting sharper transitions, as previously discussed. This divergence underscores the critical importance of considering both interpolation accuracy and spatial pattern, since accuracy metrics alone do not guarantee a map's practical quality. For practical applications like site-specific management, an interpolated map must achieve a crucial balance between statistical accuracy and agronomic feasibility, ensuring that spatial transitions are sufficiently coherent to effectively guide field decisions (Amaral and Justina, 2019; Karp et al., 2024 ). Performance in scenarios with 50 to 100 samples In intermediate scenarios (50–100 samples), the influence of spatial structure remains evident (Fig. 5 ). For datasets exhibiting random patterns, all interpolation methods showed limited predictive capacity, with LCCC values consistently below 0.5. This behavior suggests that when both local and global spatial structures are weak, the intermediate sampling density (0.2 and 0.5 samples ha − 1) may still be insufficient to capture the microvariability of the soil properties (Oliver and Webster, 2014 ). However, TPS still outperformed the other interpolation methods, delivering the most stable results and confirming its robustness. This finding aligns with the results of Wen et al. ( 2022 ), who report that RBF methods are highly accurate when data exhibit smooth variation, a condition analogous to the low to moderate SD observed in this study. Nonetheless, when a high degree of SD is present, SVM emerges as a viable alternative. This performance is directly associated with the increased significance of correlations between the soil properties and the covariates (Supplementary material - Fig. S5, S7). The SVM exploits these associations through its kernel functions, projecting non-linear data into a higher-dimensional space to capture complex patterns by constructing an optimal hyperplane for prediction (Khaledian and Miller, 2020 ). Contrarily, with clustered data, deterministic methods presented similar results, emerging as the preferred choice and competing closely with OK (Fig. 5 ). However, when the spatial dependency is very high, OK outperforms all other interpolators. This superiority is expected, as kriging methods are fundamentally based on the assumption that data variation is both random and spatially dependent (Oliver and Webster, 2015 ). In this specific context, the presence of both high SD and a clustered pattern is not a limitation but rather a core prerequisite for geostatistical methods to operate effectively. However, optimizing semivariogram modeling for this data range (50–100 samples) often necessitates advanced techniques, such as REML, which can be highly computationally demanding with larger datasets and requires a more advanced technical knowledge base (Kerry and Oliver, 2007 ). This requirement explains why kriging methods, despite their statistical superiority in specific conditions, are not widely implemented within standard GIS routines for practical use. Under these same conditions (Clustered - Very High SD and 50–100 samples), ML algorithms, such as RFE and SVM, and hybrid approaches like RK, showed strong predictive potential (Fig. 5 ). These models benefit from the coexistence of the strong spatial structure and meaningful covariate relationships, a synergy that substantially enhances their ability to capture complex spatial patterns. As noted above, the observed increase in meaningful correlation between soil properties and covariates in this sampling range supports the enhanced performance of multivariate approaches. This finding is consistent with Ho et al. ( 2025 ), who demonstrated that hybrid models combining RFE and geostatistics, such as RK, can outperform both standard geostatistical and standalone ML algorithms. Therefore, RK is recommended only for datasets with more than 50 samples and with residuals that exhibit spatial dependence. These specific conditions are necessary to reliably model the residual semivariogram, thereby effectively enhancing the predictions of the base model (Hengl, 2009 ; Hengl et al., 2007 ). At this sample level (50–100 samples), the spatial appearance of the interpolated maps clearly reflects the theoretical differences among the methods (Fig. 6 ). Deterministic approaches revealed the nature of exact interpolators: IDW intensified the formation of discontinuous patterns (bull's-eyes) as more points were incorporated, while TPS consistently maintained continuous surfaces, although with more pronounced local variability. Meanwhile, OK showed an improvement in its spatial representations, yet it exhibited a characteristic smoothing effect when SD was below 60% (Oliver and Webster, 2014 ). Among the ML-Hybrid methods (SVM, RFE, and RK), all of them captured general variability, but they created overly smooth patterns concerning local variations. Nevertheless, the SVM specifically exhibited patterns clearly influenced by the shapes of the covariates (Radočaj et al., 2020, 2022 ). These observed spatial differences, rather than statistical performance, emphasize the critical distinction that higher predictive accuracy does not necessarily correspond to more realistic or agronomically manageable spatial patterns. Performance in scenarios with more than 100 samples Finally, in the high sampling scenario (datasets with more than 100 samples, corresponding to ≥ 1.0 samples ha − 1 ), all datasets exhibited clustered patterns (Fig. 7 ). When SD was < 60%, the TPS consistently emerged as the most accurate method. As the SD increased to high and very high degrees, IDW and RK presented the most accurate results, respectively. However, overall performance among the methods was comparable. This similarity aligns with Barrena-González et al. ( 2022 ), who reported that under specific data conditions, the predictive accuracy of various interpolators converges, allowing for potential interchangeability. This highlights that simple methods, such as deterministic interpolators, remain highly competitive when properly optimized. Their operational simplicity positions them as efficient and practical alternatives, allowing the selection of the interpolation method to be guided by the availability of computational resources and operational ease (Li and Heap, 2011 ). Despite the increased sample size (100 samples), the spatial appearance of interpolated maps exhibiting SD < 80% remained similar to the previous scenario (50–100 samples). Specifically, deterministic methods continued to show a greater representation of local variability, OK still produced characteristically smoother surfaces, and the multivariate methods (SVM, RFE, and RK) generated smooth patterns influenced by specific covariate distributions. In contrast, when the data was clustered and SD was very high (≥ 80%), all interpolation methods proved competent at capturing the main spatial variability (Fig. 8 ), as demonstrated by the case of clay at the São José farm, where all methods clearly captured the spatial trend (highest values in the north and lowest in the west). Subtle visual differences, reflecting each algorithm's theoretical basis, were still apparent: deterministic algorithms provided greater local detail, whereas OK yielded a noticeably smoother map, thereby revealing the regional trend by smoothing local variations. Meanwhile, the Machine Learning and hybrid approaches (SVM, RFE, and RK) generated highly detailed maps, displaying complex patterns clearly influenced by the covariates. This finding highlights that, under these favorable sampling conditions, the selection of the interpolator transitions into a strategic decision guided not only by predictive accuracy but also by the map's objective and practical considerations, such as operational simplicity, information availability, and technical knowledge (Amaral & Justina, 2019; Hengl, 2009 ; Li & Heap, 2011 ). Critically, all maps interpolated at this high-density level yielded similar results, which sharply contrast with the outcomes observed in scenarios with fewer samples. Practical implications: A Decision-Making Framework These differences in spatial continuity and local detail carry significant practical implications for site-specific management. For example, in the context of fertilizer application, interpolation outputs that fail to accurately represent true soil variability, either by over-smoothing or generating artificial discontinuities, can directly mislead fertilizer prescriptions (Karp et al., 2024 ; Pusch et al., 2023 ). Overly smoothed maps, characteristic of methods like OK in low SD scenarios, can mask critical field variability, leading to uniform applications where site-specific management is necessary, thus directly contradicting PA principles. Conversely, maps displaying abrupt and unrealistic discontinuities, often observed with optimized IDW, generate spatial variations that are difficult to implement operationally, resulting in field errors (over- or under-application). Both extremes compromise the agronomic interpretation and overall quality of the final prescription maps, ultimately reducing the efficiency and sustainability of fertilizer use. The results of this study decisively confirm that no universal interpolation method is applicable to all datasets. The optimal selection of the interpolator is dependent on the key interplay between the sample size and the intrinsic spatial structure of the data. To address the current lack of objective criteria guiding interpolation method selection in practice, these critical findings were translated and integrated into a comprehensive decision framework (Fig. 9 ). This framework systematically guides the selection of the most suitable interpolator by directly linking observed data characteristics (sample size and spatial structure) to the optimal method. In doing so, it effectively fills a critical gap between empirical performance assessment and prescriptive practical applications in digital soil mapping. This proposed framework operates as a hierarchical decision tree, guiding the user through a sequence of logical nodes to determine the optimal interpolator. The starting point is the determination of the total sample size ( n ), or the number of observations. While the sampling density defines the spatial coverage and resolution, the total number of observations is the primary factor that governs the fundamental feasibility of the interpolation methods. This distinction between sample size and density is particularly relevant in agricultural fields. Even with high sampling densities, a small mapped area may result in a low total number of observations ( n ), which critically limits the applicability of specific methods, regardless of the apparent density. Therefore, this initial stratification (into < 50, 50–100, ≥ 100 samples) is fundamental, as it establishes the primary methodological condition for method selection. The framework thus ensures that subsequent recommendations are grounded in the methodological feasibility of each interpolation approach. Once the sample size ( n ) is defined, the next node for consideration is the spatial structure of the data, with the assessment process adapted to the constraints of each data category. For small datasets (n < 50), the assessment of the spatial structure is simplified, relying solely on the MI to distinguish between random and clustered patterns. This simplification is necessary because the low number of samples makes it infeasible to model a reliable semivariogram, and, consequently, a trustworthy SD measure cannot be obtained. For larger datasets (n ≥ 50 samples), the framework employs a two-stage approach. First, the MI is used to determine the spatial pattern (i.e., whether it is random or clustered). Then, the second stage quantifies the degree of SD, classifying the data from very low to very high. This dual evaluation is crucial as it ensures the accurate identification of both the presence and the degree of the spatial structure in the data. Each path in the framework concludes with a primary interpolator recommendation and, where applicable, viable alternatives (defined as methods performing within a 5% accuracy threshold). This threshold acknowledges that minor differences in accuracy metrics rarely translate to practically significant effects at the field level, thus allowing the final selection to be justified by the practical goals of the application. This approach effectively demonstrates the framework's adaptability: it guides users toward deterministic methods under data-limited conditions, while pointing to more sophisticated approaches, such as RFE and RK, in conditions with sufficient samples and strong spatial structure. The outcome is an evidence-guided method selection process that replaces arbitrary decisions with a systematic approach, thereby maximizing the reliability and utility of soil maps in precision agriculture. While the multivariate framework offers the potential for high-accuracy predictions, its practical applications are conditional. Success relies on the availability of high-quality covariates, the strength and stability of their relationships with the target variable, and the technical expertise required for advanced modeling. Several studies emphasize that despite the potential of covariates to enhance the predictive performance of some methods, their performance remains strongly conditioned by the stability of these relationships (Pusch et al., 2021 , 2023 ). Moreover, the limited interpretability of some sophisticated methods, such as machine learning, can restrict their operational use (Khaledian & Miller, 2020 ). Recognizing that these conditions are not always met in practice, a simplified univariate version of the framework was developed as a more accessible alternative (Fig. 10 ). This alternative provides a streamlined decision-making path for contexts where data limitations or operational constraints restrict the use of multivariate algorithms. Consequently, the univariate framework serves as a practical solution for quick evaluations of operational scenarios where simplicity and reproducibility are prioritized over methodological complexity. Figure 11 presents the external validation of the univariate framework across independent areas, confirming its robustness and reproducibility. Each panel represents one of the sample ranges evaluated in this study (< 50, 50–100, and ≥ 100 samples) and displays the median LCCC for each spatial structure category. For datasets with less than 50 samples, only the local spatial structure (obtained with the Moran's Index) was evaluated. This simplification is methodologically necessary because the limited number of observations makes it unreliable to model the semivariogram, thereby precluding the quantification of the SD degree. Under the most limited conditions (i.e., Random pattern and < 50 samples), the TPS remained highly competitive, thereby reinforcing its role as a reliable and adaptable interpolator when sampling constraints prevail. As the number of samples increased to the 50–100 range, the local and global spatial structure became a more important factor in method selection. Even in this moderate range, when random patterns are present, the TPS remains the preferred option. When the spatial data are clustered, the methods demonstrate improved performance as the SD increases, and the differences in method accuracy become more balanced. Finally, with a large number of samples (≥ 100), the IDW method consistently outperformed both OK and TPS, reflecting its ability to effectively exploit spatial structure information when dense data are available. These results decisively demonstrate that the framework’s decision logic remains valid beyond the specific datasets used for its development, confirming its applicability as a generalizable and evidence-based tool for selecting interpolation methods in digital soil mapping. Figure 11 External validation of univariate interpolators: median LCCC by sample size and spatial structure. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; LCCC: Lin’s concordance correlation . R: Random; C: Clustered; RVL: Spatially random (Random) by Moran’s I and with very low (Very Low) spatial dependence by semivariogram modeling; RL: Random- Low; RVH: Random -Very High; CL: Clustered - Low; CM: Clustered - Moderate; CH: Clustered - High; CVH: Clustered - Very High. *Only the spatial structure scenarios with available data are displayed. Conclusions Data characteristics, notably sample size (n) and spatial structure, fundamentally influence both the performance and subsequent selection of interpolation methods. Sampling density emerges as a key practical factor linking these two characteristics, as it simultaneously governs the observed spatial structure and the total number of observations captured. However, a critical limitation exists in Precision Agriculture, particularly within small fields, where even high sampling densities may not provide enough observations to support reliable spatial interpolation. The interaction between sample size and spatial structure, therefore, defines the operational thresholds for reliable mapping. This interaction is the crucial finding, indicating that the reliability of each method depends not merely on sample size (n) or spatial structure individually, but on their combined effect. Each interpolation method has limitations specific to the underlying data characteristics. Geostatistical approaches are critically restricted by sample size (n) and spatial structure. These methods yield suitable results only in scenarios that meet defined conditions: more than 50 samples coupled with a clear spatial structure. This clear structure must be evident both locally indicated by a clustered pattern in the Moran's Index and globally with a SD exceeding 60% as quantified by the semivariogram analysis. In contrast, deterministic methods offer greater flexibility, achieving more stable predictions irrespective of the spatial structure and sample availability. Among these, the Thin Plate Spline (TPS) emerges as a highly versatile alternative, providing a practical balance between prediction accuracy and simplicity. Machine learning methods, however, face limitations that go beyond data characteristics (sample size and spatial structure). They are also restricted by the availability of covariates and the strength of their relationship with the target variable. Additionally, the implementation of these methods requires specialized technical expertise, further limiting their operational accessibility. Ultimately, the results of this study confirm that there is no universal interpolator applicable to all datasets, thereby establishing that the interplay between sample size and data spatial structure must serve as the primary guide for method selection. A decision framework was successfully developed to guide the selection of the most suitable interpolator for accurate soil mapping. Rather than prescribing a single method, the framework effectively translates data characteristics into clear, actionable recommendations, providing users with an accessible tool that demonstrably improves the reliability of soil maps in Precision Agriculture. Declarations Funding This work was supported by the São Paulo Research Foundation (FAPESP). The authors acknowledge the FAPESP financial support provided through grants No. 2024/01557-3 (L.D.B.), No. 2024/13229-0 (J.V.F.P.), and the project grant No. 2022/03160-8 (L.R.A.) and the National Council for Scientific and Technological Development (CNPq - productivity fellowship n° 306867/2022-2 to L.R.A) Author Contribution L.D.B. and L.R.A. conceived the study. L.D.B., A.L.G.O., and L.R.A. designed the methodology. L.D.B. curated the data and performed the formal analysis. L.D.B. wrote the original draft. All authors contributed to the review and editing of the manuscript. L.R.A. supervised the work and acquired the funding. Acknowledgement The authors gratefully acknowledge all members of the Interdisciplinary Group for Technology in Precision Agriculture (GITAP) for their essential contribution in data collection across the six study sites. Special thanks are extended to the producers of the Aliada, São José, Campina, Santa Fé, Ipiranga, and Boa Vista farms for their generous permission to access their properties and for providing crucial logistical support during field operations. Data Availability The datasets that support the findings of this study are deposited in the UNICAMP Research Data Repository (Repositório de Dados de Pesquisa da Unicamp) under the DOI https://doi.org/10.25824/redu/DEBGPBThe data were generated as part of L.D.B.’s master’s research and will be publicly released after the defense process. Until that time, they are available from the corresponding author upon reasonable request via email. References AbdelRahman, M. A. E., Farg, E., Saleh, A. M., Sayed, M., Abutaleb, K., Arafat, S. M., & Elsharkawy, M. M. (2022). Mapping of soils and land-related environmental attributes in modern agriculture systems using geomatics. Sustainable Water Resources Management , 8 (4), 116. https://doi.org/10.1007/s40899-022-00704-2 Aldungarova, A., Utepov, Y., Mukhamejanova, A., Tulebekova, A., Nazarova, A., Tleubayeva, A., Makasheva, I., Zhakapbayeva, G., Makhiyev, B., & Mkilima, T. (2025). Advancing Intermediate Soil Properties (ISP) Interpolation for Enhanced Geotechnical Survey Accuracy. A Review Engineering Reports , 7 (8), e70328. https://doi.org/10.1002/ENG2.70328 ;CTYPE:STRING:JOURNAL. Amaral, L. R., do, & Justina, D. D. (2019). della. Spatial dependence degree and sampling neighborhood influence on interpolation process for fertilizer prescription maps. Engenharia Agrícola , 39 (spe), 85–95. https://doi.org/10.1590/1809-4430-eng.agric.v39nep85-95/2019 Anselin, L. (1995). Local Indicators of Spatial Association—LISA. Geographical Analysis , 27 (2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x Barrena-González, J., Contador, L., J. F., & Pulido Fernández, M. (2022). Mapping Soil Properties at a Regional Scale: Assessing Deterministic vs. Geostatistical Interpolation Methods at Different Soil Depths. Sustainability (Switzerland) , 14 (16). https://doi.org/10.3390/su141610049 Betzek, N. M., de Souza, E. G., Bazzi, C. L., Schenatto, K., Gavioli, A., & Magalhães, P. S. G. (2019). Computational routines for the automatic selection of the best parameters used by interpolation methods to create thematic maps. Computers and Electronics in Agriculture , 157 (December 2018), 49–62. https://doi.org/10.1016/j.compag.2018.12.004 Bier, V. A., & de Souza, E. G. (2017). Interpolation selection index for delineation of thematic maps. Computers and Electronics in Agriculture , 136 , 202–209. https://doi.org/10.1016/j.compag.2017.03.008 Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalicchio, G., & Jones, Z. M. (2016). Mlr: Machine learning in R. Journal of Machine Learning Research , 17 , 1–5. Breiman, L. (2001). Random Forests. Machine Learning , 45 , 5–32. https://doi.org/10.1023/A:1010933404324 Chen, C., Bei, Y., Li, Y., & Zhou, W. (2022). Effect of interpolation methods on quantifying terrain surface roughness under different data densities. Geomorphology , 417 (September), 108448. https://doi.org/10.1016/j.geomorph.2022.108448 Cherubin, M. R., Damian, J. M., Tavares, T. R., Trevisan, R. G., Colaço, A. F., Eitelwein, M. T., Martello, M., Inamasu, R. Y., de Pias, O. H. C., & Molin, J. P. (2022). Precision Agriculture in Brazil: The Trajectory of 25 Years of Scientific Research. Agriculture (Switzerland) , 12 (11), 1–29. https://doi.org/10.3390/agriculture12111882 Corwin, D. L., & Scudiero, E. (2020). Field-scale apparent soil electrical conductivity. Soil Science Society of America Journal , 84 (5), 1405–1441. https://doi.org/10.1002/saj2.20153 Danoedoro, P., & Zukhrufiyati, A. (2015). Integrating spectral indices and geostatistics based onlandsat-8 imagery for surface clay content mapping in GunungKidul area, Yogyakarta, Indonesia. Asian Conference on Remote Sensing. https://www.researchgate.net/publication/302580476 De Caires, S. A., Martin, C. S., Atwell, M. A., Kaya, F., Wuddivira, G. A., & Wuddivira, M. N. (2025). Advancing soil mapping and management using geostatistics and integrated machine learning and remote sensing techniques: a synoptic review. Discover Soil , 2 (1). https://doi.org/10.1007/s44378-025-00082-z Demattê, J. A. M., Fongaro, C. T., Rizzo, R., & Safanelli, J. L. (2018). Geospatial Soil Sensing System (GEOS3): A powerful data mining procedure to retrieve soil spectral reflectance from satellite images. Remote Sensing of Environment , 212 , 161–175. https://doi.org/10.1016/j.rse.2018.04.047 Fontenelli, J. V., Adamchuk, V. I., Ferreira, M. M. C., Amaral, L. R., Guimarães, C. C. B., Demattê, J. A. M., & Magalhães, P. S. G. (2021). Evaluating the synergy of three soil spectrometers for improving the prediction and mapping of soil properties in a high anthropic management area: A case of study from Southeast Brazil. Geoderma , 402. https://doi.org/10.1016/j.geoderma.2021.115347 Gebbers, R., & Adamchuk, V. I. (2010). Precision Agriculture and Food Security. Science , 327 (5967), 828–831. https://doi.org/10.1126/science.1183899 Goovaerts, P., & Kerry, R. (2010). Using Ancillary Data to Improve Prediction of Soil and Crop Attributes in Precision Agriculture. In M. A. Oliver (Ed.), Geostatistical Applications for Precision Agriculture (pp. 167–194). Springer Netherlands. https://doi.org/10.1007/978-90-481-9133-8_7 Gräler, B., Pebesma, E., & Heuvelink, G. (2016). Spatio-temporal interpolation using gstat. R Journal , 8 (1), 204–218. https://doi.org/10.32614/RJ-2016-014 Hengl, T. (2009). A practical guide to geostatistical mapping of environmental variables . Office for Official Publications of the European Communities. Hengl, T., Heuvelink, G. B. M., & Rossiter, D. G. (2007). About regression-kriging: From equations to case studies. Computers and Geosciences , 33 (10), 1301–1315. https://doi.org/10.1016/j.cageo.2007.05.001 Ho, V. H., Morita, H., Ho, T. H., Bachofer, F., & Nguyen, T. T. (2025). Comparison of geostatistics, machine learning algorithms, and their hybrid approaches for modeling soil organic carbon density in tropical forests. Journal of Soils and Sediments , 25 (5), 1554–1577. https://doi.org/10.1007/s11368-025-04027-5 Huber, P. J., & Ronchetti, E. M. (2009). Robust Statistics . Wiley. https://doi.org/10.1002/9780470434697 Karp, F. H. S., Adamchuk, V., Dutilleul, P., & Melnitchouck, A. (2024). Comparative study of interpolation methods for low-density sampling. Precision Agriculture , 25 (6), 2776–2800. https://doi.org/10.1007/s11119-024-10141-0 Kerry, R., & Oliver, M. A. (2007). Comparing sampling needs for variograms of soil properties computed by the method of moments and residual maximum likelihood. Geoderma , 140 (4), 383–396. https://doi.org/10.1016/j.geoderma.2007.04.019 Khaledian, Y., & Miller, B. A. (2020). Selecting appropriate machine learning methods for digital soil mapping. Applied Mathematical Modelling , 81 , 401–418. https://doi.org/10.1016/j.apm.2019.12.016 Konopatzki, M. R. S., Souza, E. G., Nóbrega, L. H. P., Uribe-Opazo, M. A., & Suszek, G. (2012). Spatial variability of yield and other parameters associated with pear trees. Engenharia Agricola , 32 (2), 381–392. https://doi.org/10.1590/S0100-69162012000200018 Kravchenko, A. N. (2003). Influence of Spatial Structure on Accuracy of Interpolation Methods. Soil Science Society of America Journal , 67 (5), 1564–1571. https://doi.org/10.2136/sssaj2003.1564 Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software , 28 (5), 1–26. https://doi.org/10.18637/jss.v028.i05 Li, J., & Heap, A. D. (2011). A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. Ecological Informatics , 6 (3–4), 228–241. https://doi.org/10.1016/j.ecoinf.2010.12.003 Long, J., Liu, Y., Xing, S., Qiu, L., Huang, Q., Zhou, B., Shen, J., & Zhang, L. (2018). Effects of sampling density on interpolation accuracy for farmland soil organic matter concentration in a large region of complex topography. Ecological Indicators , 93 (May), 562–571. https://doi.org/10.1016/j.ecolind.2018.05.044 Magalhães, P. S. G., & Amaral, L. R. (2022). do. Monitoring integrated crop-livestock systems through remote sensing and precision agriculture for more sustainable production: towards low carbon agriculture (V3 ed.). Repositório de Dados de Pesquisa da Unicamp. https://doi.org/doi/10.25824/redu/CMAH9X Melo, D. D., & Amaral, L. R. (2024). do. Replication data for: Hierarchical stratification for spatial sampling and digital mapping of soil attributes (V1 ed.). Repositório de Dados de Pesquisa da Unicamp. https://doi.org/doi/10.25824/redu/8QITE4 Oliver, M. A., & Webster, R. (2014). A tutorial guide to geostatistics: Computing and modelling variograms and kriging. Catena , 113 , 56–69. https://doi.org/10.1016/j.catena.2013.09.006 Oliver, M. A., & Webster, R. (2015). Basic Steps in Geostatistics: The Variogram and Kriging. Springer International Publishing. https://doi.org/10.1007/978-3-319-15865-5 Panday, D., Ojha, R. B., Chalise, D., Das, S., & Twanabasu, B. (2019). Spatial variability of soil properties under different land use in the Dang district of Nepal. Cogent Food & Agriculture , 5 (1), 1600460. https://doi.org/10.1080/23311932.2019.1600460 Pereira, G. W., Valente, D. S. M., de Queiroz, D. M., Santos, N. T., & Fernandes-Filho, E. I. (2022). Soil mapping for precision agriculture using support vector machines combined with inverse distance weighting. Precision Agriculture , 23 (4), 1189–1204. https://doi.org/10.1007/s11119-022-09880-9 Piikki, K., Wetterlind, J., Söderström, M., & Stenberg, B. (2021). Perspectives on validation in digital soil mapping of continuous attributes—A review. Soil Use and Management , 37 (1), 7–21. https://doi.org/10.1111/sum.12694 Pusch, M., Oliveira, A. L. G., Fontenelli, J. V., & Amaral, L. R. D. (2021). Soil Properties Mapping Using Proximal and Remote Sensing As Covariate. Engenharia Agricola , 41 (6), 634–642. https://doi.org/10.1590/1809-4430-ENG.AGRIC.V41N6P634-642/2021 Pusch, M., Samuel-Rosa, A., Oliveira, A. L. G., Magalhães, P. S. G., & do Amaral, L. R. (2022). Improving soil property maps for precision agriculture in the presence of outliers using covariates. Precision Agriculture , 23 (5), 1575–1603. https://doi.org/10.1007/s11119-022-09898-z Pusch, M., Samuel-Rosa, A., Sergio Graziano Magalhães, P., & Rios, L. (2023). Covariates in sample planning optimization for digital soil fertility mapping in agricultural areas. Geoderma, 429 (October 2022). https://doi.org/10.1016/j.geoderma.2022.116252 Qu, L., Lu, H., Tian, Z., Schoorl, J. M., Huang, B., Liang, Y., Qiu, D., & Liang, Y. (2024). Spatial prediction of soil sand content at various sampling density based on geostatistical and machine learning algorithms in plain areas. Catena, 234 (October 2023), 107572. https://doi.org/10.1016/j.catena.2023.107572 Radočaj, D., Jug, I., Vukadinović, V., Jurišić, M., & Gašparović, M. (2021). The effect of soil sampling density and spatial autocorrelation on interpolation accuracy of chemical soil properties in arable cropland. Agronomy , 11 (12). https://doi.org/10.3390/agronomy11122430 Radočaj, D., Jurišić, M., & Gašparović, M. (2022). The Role of Remote Sensing Data and Methods in a Modern Approach to Fertilization in Precision Agriculture. Remote Sensing , 14 (3). https://doi.org/10.3390/rs14030778 van Raij, B., de Andrade, J. C., Cantarella, H., & Quaggio, J. A. (2001). Análise Química para Avaliação da Fertilidade de Solos Tropicais (B. van Raij, J. C. de Andrade, H. Cantarella, & J. A. Quaggio (Eds.)). Instituto Agronômico de Campinas. https://www.iac.sp.gov.br/publicacoes/arquivos/Raij_et_al_2001_Metod_Anal_IAC.pdf Rodrigues, A. C., Villa, P. M., Ferreira-Júnior, W. G., Schaefer, C. E. R. G., & Neri, A. V. (2021). Effects of topographic variability and forest attributes on fine-scale soil fertility in late-secondary succession of Atlantic Forest. Ecological Processes , 10 (1). https://doi.org/10.1186/s13717-021-00333-1 Rodrigues, M. S., Alves, D. C., De Souza, V. C., De Melo, A. C., Lima, A. M. N., & Cunha, J. C. (2018). Spatial interpolation techniques for site-specific irrigation management in a mango orchard. Comunicata Scientiae , 9 (1), 93–101. https://doi.org/10.14295/CS.v9i1.2645 Sekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M., & Bajat, B. (2020). Random forest spatial interpolation. Remote Sensing , 12 (10), 1–29. https://doi.org/10.3390/rs12101687 Sobjak, R., de Souza, E. G., Bazzi, C. L., Opazo, M. A. U., Mercante, E., & Junior, A., J (2023). Process improvement of selecting the best interpolator and its parameters to create thematic maps. Precision Agriculture , 24 (4), 1461–1496. https://doi.org/10.1007/s11119-023-09998-4 Sun, W., Zhao, Y., Huang, B., Shi, X., Landon Darilek, J., Yang, J., Wang, Z., & Zhang, B. (2012). Effect of sampling density on regional soil organic carbon estimation for cultivated soils. Journal of Plant Nutrition and Soil Science , 175 (5), 671–680. https://doi.org/10.1002/jpln.201100181 Webster, R., & Oliver, M. A. (2007). Geostatistics for Environmental Scientists. (2nd ed.). Wiley. https://doi.org/10.1002/9780470517277 Wen, L., Zhang, L., Bai, J., Wang, Y., Wei, Z., & Liu, H. (2022). Optimizing spatial interpolation method and sampling number for predicting cadmium distribution in the largest shallow lake of North China. Chemosphere , 309. https://doi.org/10.1016/j.chemosphere.2022.136789 White, G., Sun, D., & Speckman, P. (2019). Direct Sampling of Bayesian Thin-Plate Splines for Spatial Smoothing. http://arxiv.org/abs/1906.05575 Wong, D. W. S. (2017). Interpolation: Inverse-Distance Weighting. International Encyclopedia of Geography , 1–7. https://doi.org/10.1002/9781118786352.wbieg0066 Yang, L., Li, X., Shi, J., Shen, F., Qi, F., Gao, B., Chen, Z., Zhu, A. X., & Zhou, C. (2020). Evaluation of conditioned Latin hypercube sampling for soil mapping based on a machine learning method. Geoderma, 369( March), 114337. https://doi.org/10.1016/j.geoderma.2020.114337 Zanotta, D. C., Ferreira, M. P., & Zortea, M. (2019). Processamento de imagens de satélite . Oficina de Textos. Additional Declarations No competing interests reported. Supplementary Files suplementarV1.docx Cite Share Download PDF Status: Published Journal Publication published 27 Dec, 2025 Read the published version in Precision Agriculture → Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8020454","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":544083948,"identity":"3880260d-c707-4a3d-ba28-68dcca9852da","order_by":0,"name":"Laura Delgado Bejarano","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABRElEQVRIie2QMUvDQBTHXzhIlmuzXqnYr3AlUJCKfpWEQLIELQQkk6ZLugS6xskPUXAO3NAl2k2Ey1ARb8pQEKSggzFtrZwVHAXzg+M93uPHn3cANTV/kOaqpKCEVXOoV8Usn7pZSaiS4rTCXytr2Er5wndFy3qLJeT7aHTz8DwIZkTXpneLObCTpsaeJgPIT2UFe8ZlDMJQYtdoJxknrdjzExOYr2KnxxMQB1KuCp4BGJg1DB1oNyJ+TlPPLm9hVkSgx8sVlVP0wlDegF0MxwK9NqJbcjwrNor2slMhnoHKuakkjlqmpIQSl60VvDuFCB/tUdEdJkLt48wm5L5AYFLXirB3xhMqZEXX7YlSBHmnO3YQx8ER0cfuo7IM+tbVaHrNB0EuKyvKaXf7L5hWkw8Q7BYqOttWm3+26Gehpqam5v/wDoPqbZ1e6l1jAAAAAElFTkSuQmCC","orcid":"","institution":"State University of Campinas","correspondingAuthor":true,"prefix":"","firstName":"Laura","middleName":"Delgado","lastName":"Bejarano","suffix":""},{"id":544083949,"identity":"51dd2e6c-801a-43d4-9558-f248621d77ea","order_by":1,"name":"Agda Loureiro Gonçalves Oliveira","email":"","orcid":"","institution":"State University of Campinas","correspondingAuthor":false,"prefix":"","firstName":"Agda","middleName":"Loureiro Gonçalves","lastName":"Oliveira","suffix":""},{"id":544083950,"identity":"40ffe1a0-690a-4572-bd9a-e905e098fe9f","order_by":2,"name":"João Vitor Fiolo Pozzuto","email":"","orcid":"","institution":"State University of Campinas","correspondingAuthor":false,"prefix":"","firstName":"João","middleName":"Vitor Fiolo","lastName":"Pozzuto","suffix":""},{"id":544083951,"identity":"b02bfd96-3309-4676-a26e-a0f615bf1b77","order_by":3,"name":"Dario Castañeda Sánchez","email":"","orcid":"","institution":"National University of Colombia","correspondingAuthor":false,"prefix":"","firstName":"Dario","middleName":"Castañeda","lastName":"Sánchez","suffix":""},{"id":544083952,"identity":"894d33ee-7c8a-4cda-8adc-ee097de2efa1","order_by":4,"name":"Lucas Rios do Amaral","email":"","orcid":"","institution":"State University of Campinas","correspondingAuthor":false,"prefix":"","firstName":"Lucas","middleName":"Rios do","lastName":"Amaral","suffix":""}],"badges":[],"createdAt":"2025-11-03 14:53:24","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8020454/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8020454/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s11119-025-10311-8","type":"published","date":"2025-12-27T15:57:38+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":96408581,"identity":"fdda0c05-33ca-4621-9b8e-0b83f6c27b20","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":4885230,"visible":true,"origin":"","legend":"","description":"","filename":"PAgRedacaoV6.docx","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/78d1ed22f378e62747078caa.docx"},{"id":96408577,"identity":"8de19030-9067-4bc6-8ece-e9e5071904bd","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":7915,"visible":true,"origin":"","legend":"","description":"","filename":"ee0e384331e14b5fad11198243eba5c5.json","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/cd1cf14a6551200ff1df0615.json"},{"id":96454899,"identity":"5f68213d-8266-4e65-9e9a-c07fd62ade95","added_by":"auto","created_at":"2025-11-21 10:03:16","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2256283,"visible":true,"origin":"","legend":"","description":"","filename":"suplementarV1.docx","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/9f134a3739ae494c504996dd.docx"},{"id":96454955,"identity":"e4fa75e2-2f0d-464d-be45-f278948c6556","added_by":"auto","created_at":"2025-11-21 10:03:22","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":173586,"visible":true,"origin":"","legend":"","description":"","filename":"ee0e384331e14b5fad11198243eba5c51enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/96e36c5b08881c6a3ee64993.xml"},{"id":96408580,"identity":"041ec380-86ef-4b78-97fe-fbfcb367075f","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1019,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/bd2e43ad77d3425deaef4ee3.png"},{"id":96453911,"identity":"00de844a-b263-4d03-9f98-15cd4e0ded4e","added_by":"auto","created_at":"2025-11-21 10:02:03","extension":"png","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1019,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/4eb347d0a432936126929c4f.png"},{"id":96453718,"identity":"8e89efb9-7027-4674-9000-727eba341351","added_by":"auto","created_at":"2025-11-21 10:01:22","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1019,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/7a6546253dd384263cec25f7.png"},{"id":96453780,"identity":"f5148d8e-b30d-484f-8256-c7729da7f807","added_by":"auto","created_at":"2025-11-21 10:01:44","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1019,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/7ec0c8512adf7c24e90d97c2.png"},{"id":96453868,"identity":"6b7dc98b-303a-40c3-bee6-be761353180e","added_by":"auto","created_at":"2025-11-21 10:01:58","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1019,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/d3b4a5dde675635bed12007c.png"},{"id":96454605,"identity":"a6334609-d1d0-4dbf-a556-9c93ef598828","added_by":"auto","created_at":"2025-11-21 10:02:57","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":514,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/cf0dd0619fcc7c3a155acaf2.png"},{"id":96408592,"identity":"d7d422ee-6a75-41a0-8bc9-0b45c52bfe38","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":32226,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/2577193c436cd4d560331537.png"},{"id":96454903,"identity":"4604b56a-94e4-4023-b22b-f1e1ff2761f8","added_by":"auto","created_at":"2025-11-21 10:03:17","extension":"png","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":129824,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/d1c0ff4effaf3b24ef097d8b.png"},{"id":96454535,"identity":"be605898-ee67-4ffc-9e2b-8a2d694c7472","added_by":"auto","created_at":"2025-11-21 10:02:51","extension":"png","order_by":23,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":35239,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/ace4e9a9cf3cb756616f90c7.png"},{"id":96453994,"identity":"4c32fbf7-4977-471e-ba5c-aa8d72c94bf6","added_by":"auto","created_at":"2025-11-21 10:02:14","extension":"png","order_by":24,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":104511,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/f5c3035a05090c642576244b.png"},{"id":96408607,"identity":"16627443-8bf3-400e-b4f4-74eff60ad667","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":25,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":103611,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/29c8941cf3386c16ae3a17c4.png"},{"id":96454086,"identity":"f0a347a1-3269-41bc-a7b1-2d8a7729581a","added_by":"auto","created_at":"2025-11-21 10:02:19","extension":"png","order_by":26,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":87240,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/96c881d5e7d470bad68f1f19.png"},{"id":96454176,"identity":"a43d733b-4733-4205-b2ab-4df652af7726","added_by":"auto","created_at":"2025-11-21 10:02:26","extension":"png","order_by":27,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":27258,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/11ae91e0ae7b0b32c370eddc.png"},{"id":96408598,"identity":"e04f9aaf-e56f-458b-a40b-cc67cbe03a8b","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":28,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":514,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/e7e1cfc785486e3e855cdf7c.png"},{"id":96408603,"identity":"d316a154-f451-4fce-9f44-45f2560402c7","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":29,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":514,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/e02a5e82ccb126ed6ffc8c01.png"},{"id":96454573,"identity":"f03c46fd-d6e2-4721-922e-6512e7a3e678","added_by":"auto","created_at":"2025-11-21 10:02:54","extension":"png","order_by":30,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":514,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/a258a19781c6e847cc7df71f.png"},{"id":96408605,"identity":"d8fbe9e5-2fdc-4760-a5fe-c548cfa0666f","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":31,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":514,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/b34c26043080176287352eb2.png"},{"id":96408610,"identity":"fee634c2-c23c-4f85-9264-c34a74360fd9","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":32,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":48284,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/8f18b608c97b4bebbd07b08f.png"},{"id":96408608,"identity":"2ef7f480-73c4-4a05-9f57-b6fd6e6a0f0b","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":33,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":55627,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/caae8c324fcc2a3916009e15.png"},{"id":96408613,"identity":"b0e2995d-37d5-4289-96a1-51a84cd20d84","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":34,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":32392,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/023acfdb1a2a2675d1cc162d.png"},{"id":96408611,"identity":"3c3f8e64-c780-45b9-8a83-1dcb7eacc0dd","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"png","order_by":35,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":99607,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/299910f39cc6bd98ea074624.png"},{"id":96453349,"identity":"dd7077fd-91a4-4326-860a-1b591ff55cb7","added_by":"auto","created_at":"2025-11-21 09:59:24","extension":"xml","order_by":36,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":169398,"visible":true,"origin":"","legend":"","description":"","filename":"ee0e384331e14b5fad11198243eba5c51structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/6bcb2bba2e4cfdfbb8ed6e08.xml"},{"id":96408614,"identity":"f67a64f9-bc21-4feb-bf68-0a5d23a973a8","added_by":"auto","created_at":"2025-11-20 17:55:28","extension":"html","order_by":37,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":182577,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/f819b05623b0918d0ae872fc.html"},{"id":96408578,"identity":"d5ab13be-ade0-465c-81be-6bea29faba89","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":206259,"visible":true,"origin":"","legend":"\u003cp\u003eLocation of the study and external validation sites in São Paulo state, Brazil\u003c/p\u003e","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/3dbf7d229e79d48159282c63.jpeg"},{"id":96408579,"identity":"0a52e40e-472a-4e0f-9d32-6def6c6afd49","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":241692,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of soil sampling points across the study sites under different sampling densities (1, 0.5, and 0.2 samples ha⁻¹) and validation samples\u003c/p\u003e","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/4e5c5356c1e1b9f27c06ba7b.jpeg"},{"id":96453814,"identity":"ba56786a-febc-4148-a1df-1945ab3ca0b9","added_by":"auto","created_at":"2025-11-21 10:01:46","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":123391,"visible":true,"origin":"","legend":"\u003cp\u003eMedian Lin’s concordance correlation coefficient (LCCC) for the interpolation methods in scenarios with less than 50 samples. The labels for interpolators correspond to: IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support Vector Machine. The spatial structure labels combine the local spatial pattern derived from the Moran Index (MI) (Random or Clustered) and the global degree of spatial dependence (SD) obtained from the semivariogram modeling (Very Low to Very High).*Only the spatial structure scenarios with available data are displayed.\u003c/p\u003e","description":"","filename":"floatimage8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/6232b09ce051c68f491ea346.jpeg"},{"id":96453762,"identity":"27d0ea1d-abfb-4547-94f2-ec5df6ae87a8","added_by":"auto","created_at":"2025-11-21 10:01:33","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":236138,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentative interpolated maps for the low sampling scenario (\u0026lt; 50 samples), illustrating the visual characteristics of the methods under different spatial structure of the data. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support Vector Machine\u003c/p\u003e","description":"","filename":"floatimage9.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/dc34608fe5304c37acf9c80d.jpeg"},{"id":96453862,"identity":"e668c88e-8119-48b5-a8da-a76f543b3ee3","added_by":"auto","created_at":"2025-11-21 10:01:57","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":127451,"visible":true,"origin":"","legend":"\u003cp\u003eMedian Lin’s concordance correlation coefficient (LCCC) for the interpolation methods in intermediate sampling scenarios (50 to 100 samples). The labels for interpolators correspond to: IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support Vector Machine. The labels for spatial structure combine: the local spatial pattern from Moran Index (Random or Clustered) and the global degree of SD from the semivariogram modeling (Very low to Very High) *Only the spatial structure scenarios with available data are displayed.\u003c/p\u003e","description":"","filename":"floatimage10.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/59b08f5a1fbf6846bd73c644.jpeg"},{"id":96453874,"identity":"112974f1-a7dd-451f-abc8-52fde42df67d","added_by":"auto","created_at":"2025-11-21 10:01:59","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":277997,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentatives interpolated maps for the intermediate sampling scenario (50- 100 samples) illustrating the visual characteristics of the methods under different spatial structures of the data. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support vector machine\u003c/p\u003e","description":"","filename":"floatimage11.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/1d412f9c2b9aa5d4d0b74f87.jpeg"},{"id":96408584,"identity":"93fae79c-204c-4f3a-977f-b096957995e4","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":131704,"visible":true,"origin":"","legend":"\u003cp\u003eMedian Lin’s concordance correlation coefficient (LCCC) for the interpolation methods in scenarios with more than 100 samples. The labels for interpolators correspond to: IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support Vector Machine. The labels for spatial structure combine: the local spatial pattern from Moran Index (Random or Clustered) and the global degree of SD from the semivariogram modeling (Very low to Very High) *Only the spatial structure scenarios with available data are displayed.\u003c/p\u003e","description":"","filename":"floatimage12.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/5994592e57de17a2abdd2ca9.jpeg"},{"id":96408590,"identity":"7dd5567e-0b0f-44b6-9b5a-23933aa4050c","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":239549,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentative interpolated maps for the high sampling scenario (≥ 100 samples), illustrating the visual results when the data exhibits a clustered pattern and very high spatial dependence (SD ≥ 80%). IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support vector machine\u003c/p\u003e","description":"","filename":"floatimage13.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/a49928d3af2911e7ea88c6a1.jpeg"},{"id":96408588,"identity":"d446be26-d5e2-4618-afbe-7ccea24e7225","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":184111,"visible":true,"origin":"","legend":"\u003cp\u003eDecision framework based on data characteristics. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; RK: Regression kriging; RFE: Spatial Random Forest; SVM: Support vector machine; LCCC: Lin’s concordance correlation; LOOCV: Leave one out cross-validation.\u003c/p\u003e","description":"","filename":"floatimage14.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/dcea5b5c270e32d36aea93f7.jpeg"},{"id":96454852,"identity":"d3bd9be6-9cc1-452c-823e-8d4110559f6a","added_by":"auto","created_at":"2025-11-21 10:03:11","extension":"jpeg","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":156191,"visible":true,"origin":"","legend":"\u003cp\u003eDecision univariate framework based on data characteristics. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; LCCC: Lin’s concordance correlation; LOOCV: Leave one out cross-validation.\u003c/p\u003e","description":"","filename":"floatimage15.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/b1d2d7c6cf63f1ebd3481359.jpeg"},{"id":96408593,"identity":"93f4b4ce-9215-47fd-8b14-5a9e2714e084","added_by":"auto","created_at":"2025-11-20 17:55:27","extension":"jpeg","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":105935,"visible":true,"origin":"","legend":"\u003cp\u003eExternal validation of univariate interpolators: median LCCC by sample size and spatial structure. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; LCCC: Lin’s concordance correlation. R: Random; C: Clustered; RVL: Spatially random (Random) by Moran’s I and with very low (Very Low) spatial dependence by semivariogram modeling; RL: Random- Low; RVH: Random -Very High; CL: Clustered - Low; CM: Clustered - Moderate; CH: Clustered - High; CVH: Clustered - Very High. *Only the spatial structure scenarios with available data are displayed.\u003c/p\u003e","description":"","filename":"floatimage16.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/28e7af2f4a63266bce94034e.jpeg"},{"id":99172281,"identity":"1ec64cc6-8a35-43e5-ab36-13847359a54f","added_by":"auto","created_at":"2025-12-29 16:07:07","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2971176,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/80e42fcf-b37c-4efb-9fb5-eccecba7865a.pdf"},{"id":96454890,"identity":"183c9490-4726-4787-acfd-2923cd60b4af","added_by":"auto","created_at":"2025-11-21 10:03:16","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":2256283,"visible":true,"origin":"","legend":"","description":"","filename":"suplementarV1.docx","url":"https://assets-eu.researchsquare.com/files/rs-8020454/v1/4a21f17f8d993b767b3e0fa8.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Performance of interpolation methods in digital soil mapping: the influence of data characteristics","fulltext":[{"header":"Highlights","content":"\u003cp\u003e\u0026bull; Interpolator performance is critically dependent on sample size and spatial structure.\u003c/p\u003e\u003cp\u003e\u0026bull; Thin Plate Spline (TPS) provides the most stable predictions, especially in small datasets (n\u0026thinsp;\u0026lt;\u0026thinsp;50)\u003c/p\u003e\u003cp\u003e\u0026bull; No single interpolation method is universally effective for all datasets.\u003c/p\u003e\u003cp\u003e\u0026bull; A decision framework translates sample size and spatial structure into method choice\u003c/p\u003e\u003cp\u003eImpact\u003c/p\u003e\u003cp\u003eThis work provides a practical, data-driven framework that guides users in selecting the optimal interpolation method by analyzing sample size and spatial structure. By replacing the common practice of arbitrary method selection, these findings directly improve the reliability and accuracy of digital soil maps, which are fundamental tools for effective site-specific management\u003c/p\u003e"},{"header":"Introduction","content":"\u003cp\u003eSoil is a fundamental component of agricultural systems, directly influencing crop productivity, environmental sustainability, and key management decisions, such as fertilization and irrigation. This critical resource is highly heterogeneous, however, due to natural formation processes and intensive anthropogenic practices (e.g., tillage and soil conservation) (Oliver \u0026amp; Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Panday et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). This spatial and temporal heterogeneity renders the agricultural soil a complex and anisotropic domain. Consequently, recognizing and accurately quantifying this variability is crucial for guiding effective site-specific management decisions within Precision Agriculture (Cherubin et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; De Caires et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Gebbers \u0026amp; Adamchuk, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). The quantification of soil variability relies on the generation of maps that depict the spatial distribution of soil properties (Piikki et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). These maps are generated by soil sampling to acquire a discrete set of data at known locations. However, the sampling process is costly and labor intensive (Amaral \u0026amp; Justina, 2019; Radočaj et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Consequently, sampling strategies often prioritize economic constraints over the actual requirements to capture soil variability, which frequently compromises the representativeness of the collected data (Aldungarova et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Amaral \u0026amp; Justina, 2019; Cherubin et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Karp et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe need to transform discrete point data into continuous maps necessitates the use of interpolation. In practice, however, interpolation methods are often selected arbitrarily, which compromises their effectiveness in producing reliable maps that accurately represent soil variability. This gap persists because no single method is universally effective across all dataset conditions (Barrena-Gonz\u0026aacute;lez et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Karp et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Consequently, data characteristics can serve as a systematic guide for selecting the most appropriate interpolator, leveraging the specific requirements and limitations of each method (De Caires et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eVarious interpolation methods predict property values at unknown locations in precision agriculture and are classified based on the underlying analytical approach (Hengl, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Geostatistical methods are the most widely adopted in soil science, with Ordinary Kriging (OK) being the most notable example. This method models spatial dependence through the semivariogram, providing not only estimates but also an associated measure of uncertainty. Nevertheless, its performance is constrained by the quality of the fitted semivariogram, which directly depends on the sample size and the underlying spatial structure (Oliver \u0026amp; Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Deterministic methods constitute another class of interpolators, often representing simpler approaches. These approaches are based on mathematical functions that use the distance between observations, e.g., Inverse Distance Weighting (IDW), Radial Basis Functions (RBF), and Thiessen polygons (Hengl, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Wong, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). More sophisticated interpolation methods, primarily Machine Learning (ML) algorithms, have recently emerged as powerful alternatives for digital soil mapping. These approaches leverage auxiliary covariates and effectively capture complex, non-linear spatial patterns. Prominent examples include Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Networks (CNN) (Khaledian \u0026amp; Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Pereira et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Radočaj et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eInterpolator performance is influenced by a range of factors, including sampling configuration, the method\u0026rsquo;s mathematical structure, and the spatial structure of the data, among others (Li \u0026amp; Heap, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Radočaj et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Previous studies primarily focused on the effect of sampling density on mapping accuracy, highlighting that higher densities usually allow for better map quality (Long et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Sun et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Yang et al., \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, research comparing different interpolators illustrates that performance varies according to the method and data characteristics (AbdelRahman et al., 2020; Duan et al., 2020). Addressing this need, some studies have successfully combined both perspectives, testing different interpolators under varying sampling densities However, these works often rely on large datasets or regional scales (Chen et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Qu et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), a constraint that fails to reflect the limited number of samples typically available in operational agricultural fields. Crucially, in practice, even common strategies (e.g., collecting one sample per hectare) do not provide an adequate number of observations in smaller fields for the successful implementation of some interpolation models (Kerry \u0026amp; Oliver, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). Such methodological shortcomings highlight that adequate variability mapping requires adapting the interpolation method to the specific data conditions. This systematic adaptation ensures the generated maps effectively support site-specific management decisions.\u003c/p\u003e\u003cp\u003eIn contrast to the prevalent large-scale focus, recent contributions have examined interpolation specifically under low-sampling agricultural conditions. For example, Karp et al. (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) introduced variants of conventional methods, such as OK and IDW, and developed parameter optimization strategies adapted to low-sampling scenarios. Sobjak et al. (\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) exemplifies recent efforts by automating OK and IDW as a workflow based on dataset spatial dependence. Although valuable, their focus remains narrowly fixed on optimizing a limited set of conventional interpolators, overlooking other robust alternatives. More importantly, these authors do not translate their findings into a general decision framework. As a result, practitioners still lack a simple, systematic guide for navigating the initial and most critical choice of interpolation method. This leads to unreliable maps and misguided management actions, such as under or over fertilizer application, compromising the efficiency of precision agriculture (Karp et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Therefore, this study addresses the need for more precise guidance in digital soil mapping by examining the influence of dataset characteristics on interpolation performance. Consequently, the objective of this study is to evaluate whether data characteristics, such as sample size and spatial structure, affect the selection and performance of interpolation methods used in digital soil mapping and, based on this, propose and validate a framework for choosing the interpolation method.\u003c/p\u003e"},{"header":"Material and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eStudy Area\u003c/h2\u003e\u003cp\u003eThe study utilized six commercial farms located in different municipalities of S\u0026atilde;o Paulo, Brazil (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), designating three sites for framework development and the remaining three for its external validation. The first group comprised the following sites (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e): the Aliada farm, a 70 ha sugarcane field (20\u0026deg;51'26.1\" S; 47\u0026deg;57'4.21\" W); the S\u0026atilde;o Jos\u0026eacute; farm, a 107 ha grain production area (22\u0026deg;41'55.16\" S; 47\u0026deg;10'34.15\" W); and the Campina farm, a 200 ha area used for both cattle and soybean production (21\u0026deg;38'13\" S; 51\u0026deg;54'55\" W). The validation group comprised three sugarcane farms: Boa Vista (21\u0026deg;50'23\" S, 47\u0026deg;45'27\" W); Ipiranga (21\u0026deg;49'07\" S, 47\u0026deg;44'10\" W); and Santa F\u0026eacute; (21\u0026deg;38'10.32\" S; 48\u0026deg;39'1.90\" W). The K\u0026ouml;ppen classification system (Lima et al., 2023) designated the following climates for the study sites: Cwa (humid subtropical) for Aliada, Ipiranga, Boa Vista, and Santa Fe farms; Cfa (sub-humid tropical) for S\u0026atilde;o Jos\u0026eacute; farm; and Aw (tropical with a dry winter) for Campina farm. Oxisol is the predominant soil type across all areas, classified according to the United States Department of Agriculture (USDA, 2022). Oxisols are characteristic of tropical and subtropical regions and typically exhibit low natural fertility, thus necessitating fertilizer application for agricultural production (Eswaran \u0026amp; Reich, 2005).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eSoil sampling and analysis\u003c/h3\u003e\n\u003cp\u003eSoil sampling was primarily conducted using an instrumented quadricycle equipped with an auger; however, a manual hand auger was employed for the Campina and Santa F\u0026eacute; farms. At each central point, composite samples were collected by obtaining multiple subsamples within a 3\u0026ndash;5 m radius. The specific sampling characteristics for each area are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. All samples were processed in the laboratory for the analysis of agronomically relevant variables, including available potassium (K, mmol\u003csub\u003ec\u003c/sub\u003e dm\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e), available phosphorus (P, mg dm\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e), pH, base saturation (V%), cation exchange capacity (CEC, mmolc dm\u003csup\u003e\u0026minus;\u0026thinsp;3\u003c/sup\u003e), and clay content (g kg\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e), following the procedures described by Raij et al. (\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). These variables were chosen due to their tendency to form distinct spatial patterns influenced by management practices (Amaral \u0026amp; Justina, 2019).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSoil sampling characteristics for the framework development and external validation sites\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"8\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\"\u0026minus;\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSite\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCrop\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003edepth (m)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSub-samples\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eDensity (samples ha\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eRadius\u003csup\u003e1\u003c/sup\u003e (m)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eYear\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003eRef\u003csup\u003e3\u003c/sup\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAliada\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSugarcane\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.25\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2023\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e\u003cp\u003e(Melo \u0026amp; Amaral, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2024\u003c/span\u003e)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eS\u0026atilde;o Jos\u0026eacute;\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSoybean\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.20\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2023\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCampina\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eIntegrated livestock-soybean system\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.20\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2018\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e(Magalh\u0026atilde;es \u0026amp; Amaral, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2022\u003c/span\u003e)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eIpiranga\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSugarcane\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.25\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e(Pusch et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSanta F\u0026eacute;\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSugarcane\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.25\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e(Fontenelli et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBoa Vista\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSugarcane\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\"\u0026minus;\" colname=\"c3\"\u003e\u003cp\u003e0-0.25\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e2025\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"8\"\u003e\u003csup\u003e1\u003c/sup\u003e Sampling radius used to obtain each composite sample\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003csup\u003e2\u003c/sup\u003e Sampling year for each site.\u003c/p\u003e\u003cp\u003e\u003csup\u003e3\u003c/sup\u003e References indicate the original data sources or previous studies\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe original datasets for the study areas were systematically divided into three sampling densities (1, 0.5, and 0.2 samples ha\u003csup\u003e\u0026minus;\u0026thinsp;\u003cb\u003e1\u003c/b\u003e\u003c/sup\u003e), which represent distinct scenarios of sample size and spatial dependence. These densities reflect common strategies adopted in Brazilian agriculture, although they may not fully capture the spatial variability of soil properties (Cherubin et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). For each area, the non-selected samples were designated as an independent test subset (Fig.\u0026nbsp;2).\u003c/p\u003e\u003cp\u003e\u003cb\u003eFigure\u0026nbsp;2\u003c/b\u003e Distribution of soil sampling points across the study sites under different sampling densities (1, 0.5, and 0.2 samples ha⁻\u0026sup1;) and validation samples\u003c/p\u003e\n\u003ch3\u003eSpatial Structure Assessment\u003c/h3\u003e\n\u003cp\u003eThe study assessed the spatial structure of the data using a combination of spatial dependence (SD), quantified through semivariogram analysis, and the Moran index (MI). The SD index (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) quantifies the ratio between the partial sill (C\u003csub\u003e1\u003c/sub\u003e), which represents the correlated semivariance indicating spatial continuity, and the sill (C\u003csub\u003eo\u003c/sub\u003e + C\u003csub\u003e1\u003c/sub\u003e), which estimates the total variance of the random process. These parameters were obtained by fitting semivariograms using geostatistical models (spherical, exponential, and Gaussian). For datasets with less than 100 samples, the semivariogram fitting utilized the residual maximum likelihood (REML) method, as recommended by Kerry and Oliver (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). All geostatistical fitting was performed using the geoR package in R (Diggle and Ribeiro, 2007). Conversely, the study applied the method of moments (MoM) for larger datasets (Oliver \u0026amp; Webster, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2015\u003c/span\u003e) utilizing the gstat package in R (Gr\u0026auml;ler et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). Model selection was performed using leave-one-out cross-validation (LOOCV). The study prioritized models that yielded the highest Lin\u0026rsquo;s concordance coefficient (LCCC) and the lowest root mean square error (RMSE). The MI was calculated using the k-nearest neighbors (KNN) approach, considering eight neighbors per sample point (Anselin, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e1995\u003c/span\u003e). The study classified spatial patterns as clustered when the p-value was less than 0.05; otherwise, the pattern was considered random.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eClassification of spatial dependence (Konopatzki et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2012\u003c/span\u003e)\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSpatial dependence (SD)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eClass\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;20\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVery Low (VL)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[20\u0026ndash;40)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eLow (L)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[40\u0026ndash;60)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eModerate (M)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[60\u0026ndash;80)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHigh (H)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u0026ge;\u0026thinsp;80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVery High (VH)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eNotably, the MI and SD indices capture spatial patterns at distinct scales: the MI reflects local spatial autocorrelation (based on eight nearest neighbors), while the SD represents the global spatial autocorrelation derived from the semivariogram. Given their operation at these distinct scales, the joint use of both indicators provides a more comprehensive assessment of spatial structure within each dataset.\u003c/p\u003e\n\u003ch3\u003eInterpolation methods\u003c/h3\u003e\n\u003cp\u003eThe study compared six interpolation methods across the scenarios, categorized as follows: two deterministic approaches: Inverse Distance Weighting (IDW) and Thin plate spline (TPS); one geostatistical method: Ordinary Kriging (OK); two machine learning algorithms: Support Vector Machine (SVM) and Spatial Random Forest (RFE); and one hybrid method combining machine learning and geostatistics: Regression Kriging (RK).\u003c/p\u003e\u003cp\u003e\u003cem\u003eInverse Distance Weighting\u003c/em\u003e \u003cb\u003e(\u003c/b\u003e\u003cem\u003eIDW\u003c/em\u003e\u003cb\u003e)\u003c/b\u003e is an exact interpolator that estimates unknown values by assigning weights to sample points in inverse proportion to distance. The influence of nearby points is modulated by a power parameter (p). In this study, the p parameter was optimized by testing values in a sequence from 0.5 to 6 (increments of 0.5) (Betzek et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Sobjak et al., \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), combined with neighbor counts ranging from 4 to 16. The optimal combination was selected using the lowest value of the Interpolation Selection Index (ISI) (Bier and de Souza, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). The ISI is a composite metric, derived from leave-one-out cross-validation, which integrates prediction error and residual variability to assess the accuracy and stability of the interpolation results.\u003c/p\u003e\u003cp\u003e\u003cem\u003eThin Plate Spline (TPS)\u003c/em\u003e is a special case of radial basis functions (RBF) and an exact interpolator, ensuring the fitted surface passes through all observed data points. Its formulation minimizes a penalized sum of squares to balance fidelity to observations with surface smoothness, thereby generating continuous and smooth surfaces (Broomhead and Lowe, 1988; White et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Conceptually, this is equivalent to minimizing the bending energy of a thin metal sheet. The study optimized the smooth parameter ε through k-fold cross-validation (k\u0026thinsp;=\u0026thinsp;5), using the ISI criterion over a search space from 0.0001 to 0.5. Since ε\u0026thinsp;=\u0026thinsp;0.0001 consistently provided the best performance across all cases, this value was adopted as the final parameter. The TPS interpolation was implemented globally, utilizing all sampled points to fit the surface rather than a localized neighborhood approach.\u003c/p\u003e\u003cp\u003e\u003cem\u003eOrdinary Kriging (OK)\u003c/em\u003e is considered the best unbiased linear predictor, operating under the assumption of a constant but unknown mean within the local neighborhood. The method relies on modeling spatial autocorrelation using the semivariogram, making it particularly effective when data exhibits strong spatial dependence (Oliver and Webster, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). The theoretical semivariogram model selected during the SD analysis was used for the OK interpolation.\u003c/p\u003e\u003cp\u003e\u003cem\u003eSupport Vector Machine (SVM)\u003c/em\u003e is an ML algorithm that utilizes covariates to enhance spatial predictions. It projects data into a hyperspace using a radial basis kernel to effectively capture nonlinear patterns (Khaledian and Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The study tuned the hyperparameters C (error cost) and sigma (kernel parameter) via grid search and 10-fold cross-validation using the caret package in R (Kuhn, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). The (C, sigma) pair that minimized the RMSE was subsequently selected.\u003c/p\u003e\u003cp\u003e\u003cem\u003eSpatial Random Forest (RFE)\u003c/em\u003e, based on the Random Forest (RF) algorithm, combines decision trees with bootstrap aggregation (bagging) to reduce variance and increase stability (Breiman, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Khaledian and Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Spatial dependence was incorporated by including coordinates (x, y) as covariates (Hengl et al., 2018). The study tuned the hyperparameters (ntree: 500\u0026ndash;5000, mtry: 1\u0026ndash;7, and nodesize: 1\u0026ndash;50) through a grid search of 200 iterations, employing a 5-fold cross-validation. This optimization was performed using the mlr package in R (Bischl et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), with the optimal parameter set selected as the combination that minimized the RMSE.\u003c/p\u003e\u003cp\u003e\u003cem\u003eRegression Kriging (RK)\u003c/em\u003e is a hybrid method that combines regression and geostatistics (Hengl et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). In this study, the RFE algorithm modeled the regression component. Residuals were subsequently computed, modeled with a semivariogram, and interpolated using OK. The final RK prediction was obtained by adding the kriged residuals to the RFE predictions.\u003c/p\u003e\n\u003ch3\u003eCovariates\u003c/h3\u003e\n\u003cp\u003eFive covariates were used for the multivariate methods (SVM, RFE, RK): apparent electrical conductivity (ECa), elevation (DEM), and three spectral features derived from a synthetic soil image: the saturation (S) component of the RGB to HSI transform, the normalized difference clay index (NDCI), and the soil brightness index (SBI) (Supplementary Material, Fig. \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eApparent electrical conductivity (ECa) was measured at an effective depth of 0.37m using an EM38 sensor (Geonics, Ltd, Mississauga, Ontario, Canada). The sensor, towed by a quadricycle at approximately 20 km h\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e with 1 Hz frequency, generated a dense grid of measurements. For processing, the ECa data underwent filtering to remove outliers, maneuvering points, and boundary artifacts. The cleaned data was subsequently interpolated into a 10 m raster using Universal Kriging (UK).\u003c/p\u003e\u003cp\u003eThe UK method was chosen because the ECa data exhibited a spatial trend in the semivariogram analysis, allowing the UK to effectively capture this trend using coordinates (x, y). Furthermore, ECa integrates multiple soil physical and chemical properties and has proven effective in characterizing soil variability (Corwin and Scudiero, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eElevation (DEM) data was obtained from the ALOS PALSAR digital elevation model (30 m resolution) using the Google Earth Engine (GEE). The processing involved reprojecting each raster to the appropriate coordinate reference system (CRS, WGS 84 UTM 23S for all areas, except WGS 84 UTM 22S for the Campina farm). The rasters were subsequently clipped to field boundaries and resampled to 10 m using bilinear interpolation. As a key topographic feature, DEM inherently influences water redistribution and soil particle transport, effects that directly impact soil spatial variability (Pusch et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Rodrigues et al., \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe Synthetic Soil Image (SYSI) was generated following an adapted GEOS workflow (Dematt\u0026ecirc; et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) to overcome the limited bare-soil coverage typical of pasture and no-till management. The method composited Sentinel-2 time series data within the GEE, filtering for bare soil pixels (clouds\u0026thinsp;\u0026le;\u0026thinsp;10%, NDVI\u0026thinsp;\u0026lt;\u0026thinsp;0.25, and NBR 2\u0026thinsp;\u0026lt;\u0026thinsp;0.10). This 10 m composite served as the basis for three spatial covariates:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eSaturation (S) component (from HSI transform): Captures soil color information with reduced spectral redundancy (Zanotta et al., \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eNormalized Difference Clay Index (NDCI) (Eq.\u0026nbsp;1): Sensitive to soil texture, and correlated with clay content, which enhances soil mapping (Danoedoro and Zukhrufiyati, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2015\u003c/span\u003e).\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eSoil Brightness Index (SBI) (Eq.\u0026nbsp;2): Represents soil surface reflectance and brightness, influenced by soil texture, mineral composition, and moisture content, providing predictive value for soil properties (AbdelRahman et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:NDCI=\\frac{SWIR2-SWIR1}{SWIR2+SWIR1}\\:\\left(1\\right)$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:BI=\\frac{\\sqrt[2]{{R}^{2}+{G}^{2}\\:\\:}}{2}\\:\\left(2\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eTo evaluate the predictive associations supporting the multivariate interpolation methods, the study performed a Pearson correlation analysis between the soil properties and the covariates after standardizing all covariate layers (z-score). The resulting associations were classified according to the criteria presented in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eClassification of Pearson correlation strength (Evans, 1996)\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePearson Correlation |r|\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eClass\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.20\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVery Weak\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[0.20, 0.40)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWeak\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[0.40, 0.60)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eModerate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e[0.60, 0.80)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eStrong\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u0026ge;\u0026thinsp;0.80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eVery Strong\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eValidation and Framework\u003c/h2\u003e\u003cp\u003eThe study evaluated the interpolation performance using the independent test datasets from the study areas (Fig.\u0026nbsp;2), with model accuracy assessed by the Lin\u0026rsquo;s concordance coefficient (LCCC), a metric that integrates both precision and accuracy into a single measure (Khaledian and Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).The validation results were stratified into scenarios defined by sample size (\u0026lt;\u0026thinsp;50, 50\u0026ndash;100, \u0026ge;\u0026thinsp;100), reflecting geostatistical limitations, and by the spatial structure of the data (IM-SD classification). For each scenario and interpolator, the median LCCC calculated across all six soil attributes analyzed (P, K, pH, CEC, V%, and clay content) served as the representative performance metric. The median was selected as a robust measure of central tendency to reduce sensitivity to outliers (Huber and Ronchetti, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Webster and Oliver, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2007\u003c/span\u003e), mitigating the influence of extreme LCCC values that are expected when analyzing soil properties with inherently different levels of predictability.\u003c/p\u003e\u003cp\u003eTo synthesize these validation results into a practical guide, the study constructed a decision-making framework. The framework used the scenario characteristics (sample size and MI-SD classification) as inputs. The decision rule was performance-based: the interpolator yielding the highest median LCCC was selected as the primary recommendation. Furthermore, a 5% performance threshold was applied, identifying any method within this range of the top result as a suitable alternative, given that minor differences in performance metrics often lack practical significance. In parallel, a univariate framework was also developed using the same procedure. This approach, which limits the choice to simpler methods (TPS, IDW and OK), provides a more practical tool for users lacking the covariates or technical resources required for multivariate approaches.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eExternal validation\u003c/h3\u003e\n\u003cp\u003eExternal validation utilized three independent areas (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), where multiple scenarios were simulated by systematically subsetting the datasets (Supplementary material - Fig. S2). This procedure involved dividing the areas into smaller subregions and applying the same sampling densities used in the core study. These simulated scenarios were crucial for evaluating the consistency of univariate interpolation methods (TPS, IDW, and OK) across varying spatial structures and sample sizes. The validation was performed using the LOOCV approach, with performance quantified by LCCC.\u003c/p\u003e"},{"header":"Results and discussion","content":"\u003cp\u003eThe performance of the interpolation methods was fundamentally conditioned by data characteristics, particularly the interaction between sample size and spatial structure. Sampling density, therefore, emerges as a critical parameter in survey planning, governing both the total number of samples and the ability to capture spatial dependence. This is especially relevant in agricultural areas where small field sizes, even at high sampling densities, may fail to provide a sufficient number of samples for the proper application of specific methods (Kerry and Oliver, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Radočaj et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). To clearly illustrate these relationships, the following sections present the results stratified by sample-size scenario, highlighting how the performance and suitability of each method changed across varying data conditions.\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003ePerformance in scenarios with less than 50 samples\u003c/h2\u003e\u003cp\u003eFor datasets with less than 50 samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e), the median LCCC revealed severe limitations when the data exhibited random autocorrelation behavior (SD: Very Low to Very High). Conversely, for clustered patterns, all interpolation methods yielded similar performance (SD: High and Very High). In all scenarios, deterministic approaches, particularly TPS, emerged as the most reliable predictors regardless of the underlying spatial structure (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The strong performance of TPS reflects its inherent ability to generate continuous and realistic surfaces, effectively minimizing interpolation artifacts while preserving the observed data values (Webster and Oliver, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). In contrast, geostatistical models produced unstable predictions in this range. This instability is expected, as these methods typically require a minimum of 100 samples for semivariogram construction, or at least 50 samples when employing advanced techniques like REML (Kerry and Oliver, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e; Oliver and Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). This finding aligns with the literature consensus, which recommends deterministic methods for small datasets lacking robust spatial dependence where constructing a reliable semivariogram is not feasible (Kravchenko, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2003\u003c/span\u003e; Radočaj et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Rodrigues et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Nevertheless, the apparent similarity in performance in clustered scenarios must be interpreted with caution, particularly for geostatistical methods (OK), as the semivariogram fitting process is highly sensitive to the analyst's technical expertise (Amaral and Justina, 2019; Oliver and Webster, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2015\u003c/span\u003e).The limited number of lags generated under small sample sizes allows for the adjustment of multiple plausible models, often yielding diverse results. Consequently, the quality of geostatistical outputs in these scenarios fundamentally reflects the subjective modeling decisions rather than an inherently robust spatial prediction.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe visual evidence from the interpolated maps for datasets with fewer than 50 samples is consistent with the LCCC accuracy results, clearly revealing differences in spatial structure consistency among the methods (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Among the deterministic methods, TPS remained stable across all spatial structure scenarios, producing continuous surfaces that more faithfully capture the local variability of the soil properties. This strong performance stems from its nature as an exact interpolator, which preserves the values of the sample points while generating plausible predictions between observed data (Karwariya et al., 2021; White et al., \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Wong, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). In contrast, IDW generated surfaces with inconsistent continuity, displaying artifacts such as bull's-eye effects and blocky patterns, or overly smooth predictions, depending on the power optimization applied. Therefore, TPS is the preferred method, as it provides a more reliable representation of spatial variability than IDW (Rodrigues et al., \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). This superiority is particularly important in practical applications, such as fertilizer recommendations, where smooth, realistic spatial transitions are essential to avoid abrupt dosage variations in the field.\u003c/p\u003e\u003cp\u003eConversely, Machine Learning (ML) models produced overly smooth surfaces (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e) that failed to represent the variability in soil properties accurately. This deficiency is attributed to two factors: the insufficient number of samples for model training (Goovaerts and Kerry, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Sekulić et al., \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) and the weak correlation with the covariates (Kerry et al., 2024; Pusch et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Specifically, the relationship between soil properties and covariates largely lacked significant correlations in this scenario across all study areas. When correlations were present, they were generally moderate to strong but isolated to specific variables or sites (ECa or DEM in Paulinia at 0.2 samples ha\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e; supplementary material \u0026ndash; Fig. S3, S4, S6 and S9). The restricted capacity of the multivariate methods to accurately capture soil variability is explained by the combination of a small sample size and the inconsistency in predictive relationships with the covariates. Similarly, the OK results were also inconsistent, producing maps that ranged from excessively smooth surfaces to those with bull's-eye artifacts. This reduced accuracy reflects the inherent difficulty of modelling a reliable semivariogram from a limited number of samples, supporting the literature that advises against using geostatistical methods in this scenario (Oliver and Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Finally, RK achieved only partial improvements by incorporating covariates, capturing some spatial variation in specific scenarios, but its overall performance remained limited (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eAlthough the methods exhibited similar accuracy in clustered patterns, the spatial appearance of the resulting maps differed substantially (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e), ranging from overly smoothed surfaces to maps exhibiting sharper transitions, as previously discussed. This divergence underscores the critical importance of considering both interpolation accuracy and spatial pattern, since accuracy metrics alone do not guarantee a map's practical quality. For practical applications like site-specific management, an interpolated map must achieve a crucial balance between statistical accuracy and agronomic feasibility, ensuring that spatial transitions are sufficiently coherent to effectively guide field decisions (Amaral and Justina, 2019; Karp et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003ePerformance in scenarios with 50 to 100 samples\u003c/h2\u003e\u003cp\u003eIn intermediate scenarios (50\u0026ndash;100 samples), the influence of spatial structure remains evident (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e). For datasets exhibiting random patterns, all interpolation methods showed limited predictive capacity, with LCCC values consistently below 0.5. This behavior suggests that when both local and global spatial structures are weak, the intermediate sampling density (0.2 and 0.5 samples ha\u003csup\u003e\u0026minus;\u003c/sup\u003e1) may still be insufficient to capture the microvariability of the soil properties (Oliver and Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). However, TPS still outperformed the other interpolation methods, delivering the most stable results and confirming its robustness. This finding aligns with the results of Wen et al. (\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), who report that RBF methods are highly accurate when data exhibit smooth variation, a condition analogous to the low to moderate SD observed in this study. Nonetheless, when a high degree of SD is present, SVM emerges as a viable alternative. This performance is directly associated with the increased significance of correlations between the soil properties and the covariates (Supplementary material - Fig. S5, S7). The SVM exploits these associations through its kernel functions, projecting non-linear data into a higher-dimensional space to capture complex patterns by constructing an optimal hyperplane for prediction (Khaledian and Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eContrarily, with clustered data, deterministic methods presented similar results, emerging as the preferred choice and competing closely with OK (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e). However, when the spatial dependency is very high, OK outperforms all other interpolators. This superiority is expected, as kriging methods are fundamentally based on the assumption that data variation is both random and spatially dependent (Oliver and Webster, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). In this specific context, the presence of both high SD and a clustered pattern is not a limitation but rather a core prerequisite for geostatistical methods to operate effectively. However, optimizing semivariogram modeling for this data range (50\u0026ndash;100 samples) often necessitates advanced techniques, such as REML, which can be highly computationally demanding with larger datasets and requires a more advanced technical knowledge base (Kerry and Oliver, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). This requirement explains why kriging methods, despite their statistical superiority in specific conditions, are not widely implemented within standard GIS routines for practical use.\u003c/p\u003e\u003cp\u003eUnder these same conditions (Clustered - Very High SD and 50\u0026ndash;100 samples), ML algorithms, such as RFE and SVM, and hybrid approaches like RK, showed strong predictive potential (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003e). These models benefit from the coexistence of the strong spatial structure and meaningful covariate relationships, a synergy that substantially enhances their ability to capture complex spatial patterns. As noted above, the observed increase in meaningful correlation between soil properties and covariates in this sampling range supports the enhanced performance of multivariate approaches. This finding is consistent with Ho et al. (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), who demonstrated that hybrid models combining RFE and geostatistics, such as RK, can outperform both standard geostatistical and standalone ML algorithms. Therefore, RK is recommended only for datasets with more than 50 samples and with residuals that exhibit spatial dependence. These specific conditions are necessary to reliably model the residual semivariogram, thereby effectively enhancing the predictions of the base model (Hengl, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Hengl et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2007\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eAt this sample level (50\u0026ndash;100 samples), the spatial appearance of the interpolated maps clearly reflects the theoretical differences among the methods (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003e). Deterministic approaches revealed the nature of exact interpolators: IDW intensified the formation of discontinuous patterns (bull's-eyes) as more points were incorporated, while TPS consistently maintained continuous surfaces, although with more pronounced local variability. Meanwhile, OK showed an improvement in its spatial representations, yet it exhibited a characteristic smoothing effect when SD was below 60% (Oliver and Webster, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Among the ML-Hybrid methods (SVM, RFE, and RK), all of them captured general variability, but they created overly smooth patterns concerning local variations. Nevertheless, the SVM specifically exhibited patterns clearly influenced by the shapes of the covariates (Radočaj et al., 2020, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). These observed spatial differences, rather than statistical performance, emphasize the critical distinction that higher predictive accuracy does not necessarily correspond to more realistic or agronomically manageable spatial patterns.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003ePerformance in scenarios with more than 100 samples\u003c/h2\u003e\u003cp\u003eFinally, in the high sampling scenario (datasets with more than 100 samples, corresponding to \u0026ge;\u0026thinsp;1.0 samples ha\u003csup\u003e\u0026minus;\u0026thinsp;1\u003c/sup\u003e), all datasets exhibited clustered patterns (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003e). When SD was \u0026lt;\u0026thinsp;60%, the TPS consistently emerged as the most accurate method. As the SD increased to high and very high degrees, IDW and RK presented the most accurate results, respectively. However, overall performance among the methods was comparable. This similarity aligns with Barrena-Gonz\u0026aacute;lez et al. (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), who reported that under specific data conditions, the predictive accuracy of various interpolators converges, allowing for potential interchangeability. This highlights that simple methods, such as deterministic interpolators, remain highly competitive when properly optimized. Their operational simplicity positions them as efficient and practical alternatives, allowing the selection of the interpolation method to be guided by the availability of computational resources and operational ease (Li and Heap, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2011\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eDespite the increased sample size (100 samples), the spatial appearance of interpolated maps exhibiting SD\u0026thinsp;\u0026lt;\u0026thinsp;80% remained similar to the previous scenario (50\u0026ndash;100 samples). Specifically, deterministic methods continued to show a greater representation of local variability, OK still produced characteristically smoother surfaces, and the multivariate methods (SVM, RFE, and RK) generated smooth patterns influenced by specific covariate distributions. In contrast, when the data was clustered and SD was very high (\u0026ge;\u0026thinsp;80%), all interpolation methods proved competent at capturing the main spatial variability (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003e), as demonstrated by the case of clay at the S\u0026atilde;o Jos\u0026eacute; farm, where all methods clearly captured the spatial trend (highest values in the north and lowest in the west). Subtle visual differences, reflecting each algorithm's theoretical basis, were still apparent: deterministic algorithms provided greater local detail, whereas OK yielded a noticeably smoother map, thereby revealing the regional trend by smoothing local variations. Meanwhile, the Machine Learning and hybrid approaches (SVM, RFE, and RK) generated highly detailed maps, displaying complex patterns clearly influenced by the covariates. This finding highlights that, under these favorable sampling conditions, the selection of the interpolator transitions into a strategic decision guided not only by predictive accuracy but also by the map's objective and practical considerations, such as operational simplicity, information availability, and technical knowledge (Amaral \u0026amp; Justina, 2019; Hengl, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Li \u0026amp; Heap, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). Critically, all maps interpolated at this high-density level yielded similar results, which sharply contrast with the outcomes observed in scenarios with fewer samples.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003ePractical implications: A Decision-Making Framework\u003c/h2\u003e\u003cp\u003eThese differences in spatial continuity and local detail carry significant practical implications for site-specific management. For example, in the context of fertilizer application, interpolation outputs that fail to accurately represent true soil variability, either by over-smoothing or generating artificial discontinuities, can directly mislead fertilizer prescriptions (Karp et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Pusch et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Overly smoothed maps, characteristic of methods like OK in low SD scenarios, can mask critical field variability, leading to uniform applications where site-specific management is necessary, thus directly contradicting PA principles. Conversely, maps displaying abrupt and unrealistic discontinuities, often observed with optimized IDW, generate spatial variations that are difficult to implement operationally, resulting in field errors (over- or under-application). Both extremes compromise the agronomic interpretation and overall quality of the final prescription maps, ultimately reducing the efficiency and sustainability of fertilizer use.\u003c/p\u003e\u003cp\u003eThe results of this study decisively confirm that no universal interpolation method is applicable to all datasets. The optimal selection of the interpolator is dependent on the key interplay between the sample size and the intrinsic spatial structure of the data. To address the current lack of objective criteria guiding interpolation method selection in practice, these critical findings were translated and integrated into a comprehensive decision framework (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003e). This framework systematically guides the selection of the most suitable interpolator by directly linking observed data characteristics (sample size and spatial structure) to the optimal method. In doing so, it effectively fills a critical gap between empirical performance assessment and prescriptive practical applications in digital soil mapping.\u003c/p\u003e\u003cp\u003eThis proposed framework operates as a hierarchical decision tree, guiding the user through a sequence of logical nodes to determine the optimal interpolator. The starting point is the determination of the total sample size (\u003cem\u003en\u003c/em\u003e), or the number of observations. While the sampling density defines the spatial coverage and resolution, the total number of observations is the primary factor that governs the fundamental feasibility of the interpolation methods. This distinction between sample size and density is particularly relevant in agricultural fields. Even with high sampling densities, a small mapped area may result in a low total number of observations (\u003cem\u003en\u003c/em\u003e), which critically limits the applicability of specific methods, regardless of the apparent density. Therefore, this initial stratification (into \u0026lt;\u0026thinsp;50, 50\u0026ndash;100, \u0026ge; 100 samples) is fundamental, as it establishes the primary methodological condition for method selection. The framework thus ensures that subsequent recommendations are grounded in the methodological feasibility of each interpolation approach. Once the sample size (\u003cem\u003en\u003c/em\u003e) is defined, the next node for consideration is the spatial structure of the data, with the assessment process adapted to the constraints of each data category. For small datasets (n\u0026thinsp;\u0026lt;\u0026thinsp;50), the assessment of the spatial structure is simplified, relying solely on the MI to distinguish between random and clustered patterns. This simplification is necessary because the low number of samples makes it infeasible to model a reliable semivariogram, and, consequently, a trustworthy SD measure cannot be obtained. For larger datasets (n\u0026thinsp;\u0026ge;\u0026thinsp;50 samples), the framework employs a two-stage approach. First, the MI is used to determine the spatial pattern (i.e., whether it is random or clustered). Then, the second stage quantifies the degree of SD, classifying the data from very low to very high. This dual evaluation is crucial as it ensures the accurate identification of both the presence and the degree of the spatial structure in the data.\u003c/p\u003e\u003cp\u003eEach path in the framework concludes with a primary interpolator recommendation and, where applicable, viable alternatives (defined as methods performing within a 5% accuracy threshold). This threshold acknowledges that minor differences in accuracy metrics rarely translate to practically significant effects at the field level, thus allowing the final selection to be justified by the practical goals of the application. This approach effectively demonstrates the framework's adaptability: it guides users toward deterministic methods under data-limited conditions, while pointing to more sophisticated approaches, such as RFE and RK, in conditions with sufficient samples and strong spatial structure. The outcome is an evidence-guided method selection process that replaces arbitrary decisions with a systematic approach, thereby maximizing the reliability and utility of soil maps in precision agriculture.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eWhile the multivariate framework offers the potential for high-accuracy predictions, its practical applications are conditional. Success relies on the availability of high-quality covariates, the strength and stability of their relationships with the target variable, and the technical expertise required for advanced modeling. Several studies emphasize that despite the potential of covariates to enhance the predictive performance of some methods, their performance remains strongly conditioned by the stability of these relationships (Pusch et al., \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2021\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Moreover, the limited interpretability of some sophisticated methods, such as machine learning, can restrict their operational use (Khaledian \u0026amp; Miller, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Recognizing that these conditions are not always met in practice, a simplified univariate version of the framework was developed as a more accessible alternative (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003e). This alternative provides a streamlined decision-making path for contexts where data limitations or operational constraints restrict the use of multivariate algorithms. Consequently, the univariate framework serves as a practical solution for quick evaluations of operational scenarios where simplicity and reproducibility are prioritized over methodological complexity.\u003c/p\u003e\u003cp\u003eFigure 11 presents the external validation of the univariate framework across independent areas, confirming its robustness and reproducibility. Each panel represents one of the sample ranges evaluated in this study (\u0026lt;\u0026thinsp;50, 50\u0026ndash;100, and \u0026ge;\u0026thinsp;100 samples) and displays the median LCCC for each spatial structure category. For datasets with less than 50 samples, only the local spatial structure (obtained with the Moran's Index) was evaluated. This simplification is methodologically necessary because the limited number of observations makes it unreliable to model the semivariogram, thereby precluding the quantification of the SD degree. Under the most limited conditions (i.e., Random pattern and \u0026lt;\u0026thinsp;50 samples), the TPS remained highly competitive, thereby reinforcing its role as a reliable and adaptable interpolator when sampling constraints prevail. As the number of samples increased to the 50\u0026ndash;100 range, the local and global spatial structure became a more important factor in method selection. Even in this moderate range, when random patterns are present, the TPS remains the preferred option. When the spatial data are clustered, the methods demonstrate improved performance as the SD increases, and the differences in method accuracy become more balanced. Finally, with a large number of samples (\u0026ge;\u0026thinsp;100), the IDW method consistently outperformed both OK and TPS, reflecting its ability to effectively exploit spatial structure information when dense data are available. These results decisively demonstrate that the framework\u0026rsquo;s decision logic remains valid beyond the specific datasets used for its development, confirming its applicability as a generalizable and evidence-based tool for selecting interpolation methods in digital soil mapping.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003eFigure\u0026nbsp;11\u003c/b\u003e External validation of univariate interpolators: median LCCC by sample size and spatial structure. IDW: Inverse distance weighting; TPS: Thin plate spline; OK: Ordinary kriging; LCCC: Lin\u0026rsquo;s concordance correlation\u003c/p\u003e\u003cp\u003e. R: Random; C: Clustered; RVL: Spatially random (Random) by Moran\u0026rsquo;s I and with very low (Very Low) spatial dependence by semivariogram modeling; RL: Random- Low; RVH: Random -Very High; CL: Clustered - Low; CM: Clustered - Moderate; CH: Clustered - High; CVH: Clustered - Very High. *Only the spatial structure scenarios with available data are displayed.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003eData characteristics, notably sample size (n) and spatial structure, fundamentally influence both the performance and subsequent selection of interpolation methods. Sampling density emerges as a key practical factor linking these two characteristics, as it simultaneously governs the observed spatial structure and the total number of observations captured. However, a critical limitation exists in Precision Agriculture, particularly within small fields, where even high sampling densities may not provide enough observations to support reliable spatial interpolation. The interaction between sample size and spatial structure, therefore, defines the operational thresholds for reliable mapping. This interaction is the crucial finding, indicating that the reliability of each method depends not merely on sample size (n) or spatial structure individually, but on their combined effect.\u003c/p\u003e\u003cp\u003eEach interpolation method has limitations specific to the underlying data characteristics. Geostatistical approaches are critically restricted by sample size (n) and spatial structure. These methods yield suitable results only in scenarios that meet defined conditions: more than 50 samples coupled with a clear spatial structure. This clear structure must be evident both locally indicated by a clustered pattern in the Moran's Index and globally with a SD exceeding 60% as quantified by the semivariogram analysis. In contrast, deterministic methods offer greater flexibility, achieving more stable predictions irrespective of the spatial structure and sample availability. Among these, the Thin Plate Spline (TPS) emerges as a highly versatile alternative, providing a practical balance between prediction accuracy and simplicity. Machine learning methods, however, face limitations that go beyond data characteristics (sample size and spatial structure). They are also restricted by the availability of covariates and the strength of their relationship with the target variable. Additionally, the implementation of these methods requires specialized technical expertise, further limiting their operational accessibility.\u003c/p\u003e\u003cp\u003eUltimately, the results of this study confirm that there is no universal interpolator applicable to all datasets, thereby establishing that the interplay between sample size and data spatial structure must serve as the primary guide for method selection. A decision framework was successfully developed to guide the selection of the most suitable interpolator for accurate soil mapping. Rather than prescribing a single method, the framework effectively translates data characteristics into clear, actionable recommendations, providing users with an accessible tool that demonstrably improves the reliability of soil maps in Precision Agriculture.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis work was supported by the S\u0026atilde;o Paulo Research Foundation (FAPESP). The authors acknowledge the FAPESP financial support provided through grants No. 2024/01557-3 (L.D.B.), No. 2024/13229-0 (J.V.F.P.), and the project grant No. 2022/03160-8 (L.R.A.) and the National Council for Scientific and Technological Development (CNPq - productivity fellowship n\u0026deg; 306867/2022-2 to L.R.A)\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eL.D.B. and L.R.A. conceived the study. L.D.B., A.L.G.O., and L.R.A. designed the methodology. L.D.B. curated the data and performed the formal analysis. L.D.B. wrote the original draft. All authors contributed to the review and editing of the manuscript. L.R.A. supervised the work and acquired the funding.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors gratefully acknowledge all members of the Interdisciplinary Group for Technology in Precision Agriculture (GITAP) for their essential contribution in data collection across the six study sites. Special thanks are extended to the producers of the Aliada, S\u0026atilde;o Jos\u0026eacute;, Campina, Santa F\u0026eacute;, Ipiranga, and Boa Vista farms for their generous permission to access their properties and for providing crucial logistical support during field operations.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets that support the findings of this study are deposited in the UNICAMP Research Data Repository (Reposit\u0026oacute;rio de Dados de Pesquisa da Unicamp) under the DOI https://doi.org/10.25824/redu/DEBGPBThe data were generated as part of L.D.B.\u0026rsquo;s master\u0026rsquo;s research and will be publicly released after the defense process. Until that time, they are available from the corresponding author upon reasonable request via email.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAbdelRahman, M. A. E., Farg, E., Saleh, A. M., Sayed, M., Abutaleb, K., Arafat, S. M., \u0026amp; Elsharkawy, M. M. (2022). Mapping of soils and land-related environmental attributes in modern agriculture systems using geomatics. \u003cem\u003eSustainable Water Resources Management\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(4), 116. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s40899-022-00704-2\u003c/span\u003e\u003cspan address=\"10.1007/s40899-022-00704-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAldungarova, A., Utepov, Y., Mukhamejanova, A., Tulebekova, A., Nazarova, A., Tleubayeva, A., Makasheva, I., Zhakapbayeva, G., Makhiyev, B., \u0026amp; Mkilima, T. (2025). Advancing Intermediate Soil Properties (ISP) Interpolation for Enhanced Geotechnical Survey Accuracy. \u003cem\u003eA Review Engineering Reports\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(8), e70328. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/ENG2.70328\u003c/span\u003e\u003cspan address=\"10.1002/ENG2.70328\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e;CTYPE:STRING:JOURNAL.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmaral, L. R., do, \u0026amp; Justina, D. D. (2019). della. Spatial dependence degree and sampling neighborhood influence on interpolation process for fertilizer prescription maps. \u003cem\u003eEngenharia Agr\u0026iacute;cola\u003c/em\u003e, \u003cem\u003e39\u003c/em\u003e(spe), 85\u0026ndash;95. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1590/1809-4430-eng.agric.v39nep85-95/2019\u003c/span\u003e\u003cspan address=\"10.1590/1809-4430-eng.agric.v39nep85-95/2019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAnselin, L. (1995). Local Indicators of Spatial Association\u0026mdash;LISA. \u003cem\u003eGeographical Analysis\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e(2), 93\u0026ndash;115. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1538-4632.1995.tb00338.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1538-4632.1995.tb00338.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBarrena-Gonz\u0026aacute;lez, J., Contador, L., J. F., \u0026amp; Pulido Fern\u0026aacute;ndez, M. (2022). Mapping Soil Properties at a Regional Scale: Assessing Deterministic vs. Geostatistical Interpolation Methods at Different Soil Depths. \u003cem\u003eSustainability (Switzerland)\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(16). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/su141610049\u003c/span\u003e\u003cspan address=\"10.3390/su141610049\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBetzek, N. M., de Souza, E. G., Bazzi, C. L., Schenatto, K., Gavioli, A., \u0026amp; Magalh\u0026atilde;es, P. S. G. (2019). Computational routines for the automatic selection of the best parameters used by interpolation methods to create thematic maps. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, \u003cem\u003e157\u003c/em\u003e(December 2018), 49\u0026ndash;62. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compag.2018.12.004\u003c/span\u003e\u003cspan address=\"10.1016/j.compag.2018.12.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBier, V. A., \u0026amp; de Souza, E. G. (2017). Interpolation selection index for delineation of thematic maps. \u003cem\u003eComputers and Electronics in Agriculture\u003c/em\u003e, \u003cem\u003e136\u003c/em\u003e, 202\u0026ndash;209. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compag.2017.03.008\u003c/span\u003e\u003cspan address=\"10.1016/j.compag.2017.03.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalicchio, G., \u0026amp; Jones, Z. M. (2016). Mlr: Machine learning in R. \u003cem\u003eJournal of Machine Learning Research\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e, 1\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBreiman, L. (2001). Random Forests. \u003cem\u003eMachine Learning\u003c/em\u003e, \u003cem\u003e45\u003c/em\u003e, 5\u0026ndash;32. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1023/A:1010933404324\u003c/span\u003e\u003cspan address=\"10.1023/A:1010933404324\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen, C., Bei, Y., Li, Y., \u0026amp; Zhou, W. (2022). Effect of interpolation methods on quantifying terrain surface roughness under different data densities. \u003cem\u003eGeomorphology\u003c/em\u003e, \u003cem\u003e417\u003c/em\u003e(September), 108448. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.geomorph.2022.108448\u003c/span\u003e\u003cspan address=\"10.1016/j.geomorph.2022.108448\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCherubin, M. R., Damian, J. M., Tavares, T. R., Trevisan, R. G., Cola\u0026ccedil;o, A. F., Eitelwein, M. T., Martello, M., Inamasu, R. Y., de Pias, O. H. C., \u0026amp; Molin, J. P. (2022). Precision Agriculture in Brazil: The Trajectory of 25 Years of Scientific Research. \u003cem\u003eAgriculture (Switzerland)\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(11), 1\u0026ndash;29. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/agriculture12111882\u003c/span\u003e\u003cspan address=\"10.3390/agriculture12111882\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCorwin, D. L., \u0026amp; Scudiero, E. (2020). Field-scale apparent soil electrical conductivity. \u003cem\u003eSoil Science Society of America Journal\u003c/em\u003e, \u003cem\u003e84\u003c/em\u003e(5), 1405\u0026ndash;1441. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/saj2.20153\u003c/span\u003e\u003cspan address=\"10.1002/saj2.20153\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDanoedoro, P., \u0026amp; Zukhrufiyati, A. (2015). Integrating spectral indices and geostatistics based onlandsat-8 imagery for surface clay content mapping in GunungKidul area, Yogyakarta, Indonesia. Asian Conference on Remote Sensing. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.researchgate.net/publication/302580476\u003c/span\u003e\u003cspan address=\"https://www.researchgate.net/publication/302580476\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Caires, S. A., Martin, C. S., Atwell, M. A., Kaya, F., Wuddivira, G. A., \u0026amp; Wuddivira, M. N. (2025). Advancing soil mapping and management using geostatistics and integrated machine learning and remote sensing techniques: a synoptic review. \u003cem\u003eDiscover Soil\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s44378-025-00082-z\u003c/span\u003e\u003cspan address=\"10.1007/s44378-025-00082-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDematt\u0026ecirc;, J. A. M., Fongaro, C. T., Rizzo, R., \u0026amp; Safanelli, J. L. (2018). Geospatial Soil Sensing System (GEOS3): A powerful data mining procedure to retrieve soil spectral reflectance from satellite images. \u003cem\u003eRemote Sensing of Environment\u003c/em\u003e, \u003cem\u003e212\u003c/em\u003e, 161\u0026ndash;175. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.rse.2018.04.047\u003c/span\u003e\u003cspan address=\"10.1016/j.rse.2018.04.047\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFontenelli, J. V., Adamchuk, V. I., Ferreira, M. M. C., Amaral, L. R., Guimar\u0026atilde;es, C. C. B., Dematt\u0026ecirc;, J. A. M., \u0026amp; Magalh\u0026atilde;es, P. S. G. (2021). Evaluating the synergy of three soil spectrometers for improving the prediction and mapping of soil properties in a high anthropic management area: A case of study from Southeast Brazil. \u003cem\u003eGeoderma\u003c/em\u003e, 402. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.geoderma.2021.115347\u003c/span\u003e\u003cspan address=\"10.1016/j.geoderma.2021.115347\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGebbers, R., \u0026amp; Adamchuk, V. I. (2010). Precision Agriculture and Food Security. \u003cem\u003eScience\u003c/em\u003e, \u003cem\u003e327\u003c/em\u003e(5967), 828\u0026ndash;831. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1126/science.1183899\u003c/span\u003e\u003cspan address=\"10.1126/science.1183899\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGoovaerts, P., \u0026amp; Kerry, R. (2010). Using Ancillary Data to Improve Prediction of Soil and Crop Attributes in Precision Agriculture. In M. A. Oliver (Ed.), \u003cem\u003eGeostatistical Applications for Precision Agriculture\u003c/em\u003e (pp. 167\u0026ndash;194). Springer Netherlands. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-90-481-9133-8_7\u003c/span\u003e\u003cspan address=\"10.1007/978-90-481-9133-8_7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGr\u0026auml;ler, B., Pebesma, E., \u0026amp; Heuvelink, G. (2016). Spatio-temporal interpolation using gstat. \u003cem\u003eR Journal\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(1), 204\u0026ndash;218. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32614/RJ-2016-014\u003c/span\u003e\u003cspan address=\"10.32614/RJ-2016-014\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHengl, T. (2009). \u003cem\u003eA practical guide to geostatistical mapping of environmental variables\u003c/em\u003e. Office for Official Publications of the European Communities.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHengl, T., Heuvelink, G. B. M., \u0026amp; Rossiter, D. G. (2007). About regression-kriging: From equations to case studies. \u003cem\u003eComputers and Geosciences\u003c/em\u003e, \u003cem\u003e33\u003c/em\u003e(10), 1301\u0026ndash;1315. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cageo.2007.05.001\u003c/span\u003e\u003cspan address=\"10.1016/j.cageo.2007.05.001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHo, V. H., Morita, H., Ho, T. H., Bachofer, F., \u0026amp; Nguyen, T. T. (2025). Comparison of geostatistics, machine learning algorithms, and their hybrid approaches for modeling soil organic carbon density in tropical forests. \u003cem\u003eJournal of Soils and Sediments\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(5), 1554\u0026ndash;1577. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11368-025-04027-5\u003c/span\u003e\u003cspan address=\"10.1007/s11368-025-04027-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuber, P. J., \u0026amp; Ronchetti, E. M. (2009). \u003cem\u003eRobust Statistics\u003c/em\u003e. Wiley. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/9780470434697\u003c/span\u003e\u003cspan address=\"10.1002/9780470434697\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKarp, F. H. S., Adamchuk, V., Dutilleul, P., \u0026amp; Melnitchouck, A. (2024). Comparative study of interpolation methods for low-density sampling. \u003cem\u003ePrecision Agriculture\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(6), 2776\u0026ndash;2800. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11119-024-10141-0\u003c/span\u003e\u003cspan address=\"10.1007/s11119-024-10141-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKerry, R., \u0026amp; Oliver, M. A. (2007). Comparing sampling needs for variograms of soil properties computed by the method of moments and residual maximum likelihood. \u003cem\u003eGeoderma\u003c/em\u003e, \u003cem\u003e140\u003c/em\u003e(4), 383\u0026ndash;396. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.geoderma.2007.04.019\u003c/span\u003e\u003cspan address=\"10.1016/j.geoderma.2007.04.019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKhaledian, Y., \u0026amp; Miller, B. A. (2020). Selecting appropriate machine learning methods for digital soil mapping. \u003cem\u003eApplied Mathematical Modelling\u003c/em\u003e, \u003cem\u003e81\u003c/em\u003e, 401\u0026ndash;418. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.apm.2019.12.016\u003c/span\u003e\u003cspan address=\"10.1016/j.apm.2019.12.016\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKonopatzki, M. R. S., Souza, E. G., N\u0026oacute;brega, L. H. P., Uribe-Opazo, M. A., \u0026amp; Suszek, G. (2012). Spatial variability of yield and other parameters associated with pear trees. \u003cem\u003eEngenharia Agricola\u003c/em\u003e, \u003cem\u003e32\u003c/em\u003e(2), 381\u0026ndash;392. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1590/S0100-69162012000200018\u003c/span\u003e\u003cspan address=\"10.1590/S0100-69162012000200018\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKravchenko, A. N. (2003). Influence of Spatial Structure on Accuracy of Interpolation Methods. \u003cem\u003eSoil Science Society of America Journal\u003c/em\u003e, \u003cem\u003e67\u003c/em\u003e(5), 1564\u0026ndash;1571. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2136/sssaj2003.1564\u003c/span\u003e\u003cspan address=\"10.2136/sssaj2003.1564\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKuhn, M. (2008). Building predictive models in R using the caret package. \u003cem\u003eJournal of Statistical Software\u003c/em\u003e, \u003cem\u003e28\u003c/em\u003e(5), 1\u0026ndash;26. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18637/jss.v028.i05\u003c/span\u003e\u003cspan address=\"10.18637/jss.v028.i05\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, J., \u0026amp; Heap, A. D. (2011). A review of comparative studies of spatial interpolation methods in environmental sciences: Performance and impact factors. \u003cem\u003eEcological Informatics\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e(3\u0026ndash;4), 228\u0026ndash;241. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ecoinf.2010.12.003\u003c/span\u003e\u003cspan address=\"10.1016/j.ecoinf.2010.12.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLong, J., Liu, Y., Xing, S., Qiu, L., Huang, Q., Zhou, B., Shen, J., \u0026amp; Zhang, L. (2018). Effects of sampling density on interpolation accuracy for farmland soil organic matter concentration in a large region of complex topography. \u003cem\u003eEcological Indicators\u003c/em\u003e, \u003cem\u003e93\u003c/em\u003e(May), 562\u0026ndash;571. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ecolind.2018.05.044\u003c/span\u003e\u003cspan address=\"10.1016/j.ecolind.2018.05.044\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMagalh\u0026atilde;es, P. S. G., \u0026amp; Amaral, L. R. (2022). do. Monitoring integrated crop-livestock systems through remote sensing and precision agriculture for more sustainable production: towards low carbon agriculture (V3 ed.). Reposit\u0026oacute;rio de Dados de Pesquisa da Unicamp. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/doi/10.25824/redu/CMAH9X\u003c/span\u003e\u003cspan address=\"doi/10.25824/redu/CMAH9X\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMelo, D. D., \u0026amp; Amaral, L. R. (2024). do. Replication data for: Hierarchical stratification for spatial sampling and digital mapping of soil attributes (V1 ed.). Reposit\u0026oacute;rio de Dados de Pesquisa da Unicamp. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/doi/10.25824/redu/8QITE4\u003c/span\u003e\u003cspan address=\"doi/10.25824/redu/8QITE4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOliver, M. A., \u0026amp; Webster, R. (2014). A tutorial guide to geostatistics: Computing and modelling variograms and kriging. \u003cem\u003eCatena\u003c/em\u003e, \u003cem\u003e113\u003c/em\u003e, 56\u0026ndash;69. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.catena.2013.09.006\u003c/span\u003e\u003cspan address=\"10.1016/j.catena.2013.09.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOliver, M. A., \u0026amp; Webster, R. (2015). Basic Steps in Geostatistics: The Variogram and Kriging. Springer International Publishing. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-319-15865-5\u003c/span\u003e\u003cspan address=\"10.1007/978-3-319-15865-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePanday, D., Ojha, R. B., Chalise, D., Das, S., \u0026amp; Twanabasu, B. (2019). Spatial variability of soil properties under different land use in the Dang district of Nepal. \u003cem\u003eCogent Food \u0026amp; Agriculture\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e(1), 1600460. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/23311932.2019.1600460\u003c/span\u003e\u003cspan address=\"10.1080/23311932.2019.1600460\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePereira, G. W., Valente, D. S. M., de Queiroz, D. M., Santos, N. T., \u0026amp; Fernandes-Filho, E. I. (2022). Soil mapping for precision agriculture using support vector machines combined with inverse distance weighting. \u003cem\u003ePrecision Agriculture\u003c/em\u003e, \u003cem\u003e23\u003c/em\u003e(4), 1189\u0026ndash;1204. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11119-022-09880-9\u003c/span\u003e\u003cspan address=\"10.1007/s11119-022-09880-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePiikki, K., Wetterlind, J., S\u0026ouml;derstr\u0026ouml;m, M., \u0026amp; Stenberg, B. (2021). Perspectives on validation in digital soil mapping of continuous attributes\u0026mdash;A review. \u003cem\u003eSoil Use and Management\u003c/em\u003e, \u003cem\u003e37\u003c/em\u003e(1), 7\u0026ndash;21. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/sum.12694\u003c/span\u003e\u003cspan address=\"10.1111/sum.12694\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePusch, M., Oliveira, A. L. G., Fontenelli, J. V., \u0026amp; Amaral, L. R. D. (2021). Soil Properties Mapping Using Proximal and Remote Sensing As Covariate. \u003cem\u003eEngenharia Agricola\u003c/em\u003e, \u003cem\u003e41\u003c/em\u003e(6), 634\u0026ndash;642. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1590/1809-4430-ENG.AGRIC.V41N6P634-642/2021\u003c/span\u003e\u003cspan address=\"10.1590/1809-4430-ENG.AGRIC.V41N6P634-642/2021\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePusch, M., Samuel-Rosa, A., Oliveira, A. L. G., Magalh\u0026atilde;es, P. S. G., \u0026amp; do Amaral, L. R. (2022). Improving soil property maps for precision agriculture in the presence of outliers using covariates. \u003cem\u003ePrecision Agriculture\u003c/em\u003e, \u003cem\u003e23\u003c/em\u003e(5), 1575\u0026ndash;1603. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11119-022-09898-z\u003c/span\u003e\u003cspan address=\"10.1007/s11119-022-09898-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePusch, M., Samuel-Rosa, A., Sergio Graziano Magalh\u0026atilde;es, P., \u0026amp; Rios, L. (2023). Covariates in sample planning optimization for digital soil fertility mapping in agricultural areas. \u003cem\u003eGeoderma, 429\u003c/em\u003e(October 2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.geoderma.2022.116252\u003c/span\u003e\u003cspan address=\"10.1016/j.geoderma.2022.116252\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eQu, L., Lu, H., Tian, Z., Schoorl, J. M., Huang, B., Liang, Y., Qiu, D., \u0026amp; Liang, Y. (2024). Spatial prediction of soil sand content at various sampling density based on geostatistical and machine learning algorithms in plain areas. \u003cem\u003eCatena, 234\u003c/em\u003e(October 2023), 107572. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.catena.2023.107572\u003c/span\u003e\u003cspan address=\"10.1016/j.catena.2023.107572\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRadočaj, D., Jug, I., Vukadinović, V., Jurišić, M., \u0026amp; Gašparović, M. (2021). The effect of soil sampling density and spatial autocorrelation on interpolation accuracy of chemical soil properties in arable cropland. \u003cem\u003eAgronomy\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(12). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/agronomy11122430\u003c/span\u003e\u003cspan address=\"10.3390/agronomy11122430\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRadočaj, D., Jurišić, M., \u0026amp; Gašparović, M. (2022). The Role of Remote Sensing Data and Methods in a Modern Approach to Fertilization in Precision Agriculture. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(3). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs14030778\u003c/span\u003e\u003cspan address=\"10.3390/rs14030778\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003evan Raij, B., de Andrade, J. C., Cantarella, H., \u0026amp; Quaggio, J. A. (2001). An\u0026aacute;lise Qu\u0026iacute;mica para Avalia\u0026ccedil;\u0026atilde;o da Fertilidade de Solos Tropicais (B. van Raij, J. C. de Andrade, H. Cantarella, \u0026amp; J. A. Quaggio (Eds.)). Instituto Agron\u0026ocirc;mico de Campinas. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.iac.sp.gov.br/publicacoes/arquivos/Raij_et_al_2001_Metod_Anal_IAC.pdf\u003c/span\u003e\u003cspan address=\"https://www.iac.sp.gov.br/publicacoes/arquivos/Raij_et_al_2001_Metod_Anal_IAC.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRodrigues, A. C., Villa, P. M., Ferreira-J\u0026uacute;nior, W. G., Schaefer, C. E. R. G., \u0026amp; Neri, A. V. (2021). Effects of topographic variability and forest attributes on fine-scale soil fertility in late-secondary succession of Atlantic Forest. \u003cem\u003eEcological Processes\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s13717-021-00333-1\u003c/span\u003e\u003cspan address=\"10.1186/s13717-021-00333-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRodrigues, M. S., Alves, D. C., De Souza, V. C., De Melo, A. C., Lima, A. M. N., \u0026amp; Cunha, J. C. (2018). Spatial interpolation techniques for site-specific irrigation management in a mango orchard. \u003cem\u003eComunicata Scientiae\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(1), 93\u0026ndash;101. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.14295/CS.v9i1.2645\u003c/span\u003e\u003cspan address=\"10.14295/CS.v9i1.2645\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M., \u0026amp; Bajat, B. (2020). Random forest spatial interpolation. \u003cem\u003eRemote Sensing\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e(10), 1\u0026ndash;29. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/rs12101687\u003c/span\u003e\u003cspan address=\"10.3390/rs12101687\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSobjak, R., de Souza, E. G., Bazzi, C. L., Opazo, M. A. U., Mercante, E., \u0026amp; Junior, A., J (2023). Process improvement of selecting the best interpolator and its parameters to create thematic maps. \u003cem\u003ePrecision Agriculture\u003c/em\u003e, \u003cem\u003e24\u003c/em\u003e(4), 1461\u0026ndash;1496. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11119-023-09998-4\u003c/span\u003e\u003cspan address=\"10.1007/s11119-023-09998-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSun, W., Zhao, Y., Huang, B., Shi, X., Landon Darilek, J., Yang, J., Wang, Z., \u0026amp; Zhang, B. (2012). Effect of sampling density on regional soil organic carbon estimation for cultivated soils. \u003cem\u003eJournal of Plant Nutrition and Soil Science\u003c/em\u003e, \u003cem\u003e175\u003c/em\u003e(5), 671\u0026ndash;680. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/jpln.201100181\u003c/span\u003e\u003cspan address=\"10.1002/jpln.201100181\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWebster, R., \u0026amp; Oliver, M. A. (2007). Geostatistics for Environmental Scientists. (2nd ed.). Wiley. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/9780470517277\u003c/span\u003e\u003cspan address=\"10.1002/9780470517277\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWen, L., Zhang, L., Bai, J., Wang, Y., Wei, Z., \u0026amp; Liu, H. (2022). Optimizing spatial interpolation method and sampling number for predicting cadmium distribution in the largest shallow lake of North China. \u003cem\u003eChemosphere\u003c/em\u003e, 309. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.chemosphere.2022.136789\u003c/span\u003e\u003cspan address=\"10.1016/j.chemosphere.2022.136789\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWhite, G., Sun, D., \u0026amp; Speckman, P. (2019). Direct Sampling of Bayesian Thin-Plate Splines for Spatial Smoothing. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://arxiv.org/abs/1906.05575\u003c/span\u003e\u003cspan address=\"http://arxiv.org/abs/1906.05575\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWong, D. W. S. (2017). Interpolation: Inverse-Distance Weighting. \u003cem\u003eInternational Encyclopedia of Geography\u003c/em\u003e, 1\u0026ndash;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/9781118786352.wbieg0066\u003c/span\u003e\u003cspan address=\"10.1002/9781118786352.wbieg0066\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYang, L., Li, X., Shi, J., Shen, F., Qi, F., Gao, B., Chen, Z., Zhu, A. X., \u0026amp; Zhou, C. (2020). Evaluation of conditioned Latin hypercube sampling for soil mapping based on a machine learning method. \u003cem\u003eGeoderma, 369(\u003c/em\u003eMarch), 114337. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.geoderma.2020.114337\u003c/span\u003e\u003cspan address=\"10.1016/j.geoderma.2020.114337\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZanotta, D. C., Ferreira, M. P., \u0026amp; Zortea, M. (2019). \u003cem\u003eProcessamento de imagens de sat\u0026eacute;lite\u003c/em\u003e. Oficina de Textos.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Soil variability, sampling density, spatial structure, decision framework, map accuracy, spatial prediction","lastPublishedDoi":"10.21203/rs.3.rs-8020454/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8020454/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cdiv id=\"ASec1\" class=\"AbstractSection\"\u003e\u003cdiv class=\"Heading\"\u003ePurpose\u003c/div\u003e\u003cp\u003eThe selection of interpolation methods in digital soil mapping lacks a systematic approach, reducing map accuracy. This study aimed to evaluate whether data characteristics, such as sample size and spatial structure, influence the selection and performance of interpolation methods.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"ASec2\" class=\"AbstractSection\"\u003e\u003cdiv class=\"Heading\"\u003eMethods\u003c/div\u003e\u003cp\u003eSix interpolation methods were evaluated across datasets representing three typical sampling density scenarios in Brazilian agriculture. Spatial structure was characterized using Moran\u0026rsquo;s index and the spatial dependence index derived from geostatistical semivariograms. Interpolation was performed, and the accuracy was assessed using test datasets and Lin\u0026rsquo;s concordance correlation coefficient. Consequently, two decision frameworks (multivariate and univariate) were developed to guide method selection. The univariate framework was then validated to assess its robustness.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"ASec3\" class=\"AbstractSection\"\u003e\u003cdiv class=\"Heading\"\u003eResults\u003c/div\u003e\u003cp\u003eFor small datasets (n\u0026thinsp;\u0026lt;\u0026thinsp;50), deterministic methods, particularly Thin Plate Spline (TPS), consistently provided the most stable predictions. In contrast, the performance of the geostatistical and machine learning methods improved with increasing sample size and stronger spatial structure. In the largest datasets (n\u0026thinsp;\u0026ge;\u0026thinsp;100), most methods became competitive, shifting the primary selection criteria towards factors such as operational simplicity. These findings were synthesized into decision frameworks to guide optimal interpolator selection.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"ASec4\" class=\"AbstractSection\"\u003e\u003cdiv class=\"Heading\"\u003eConclusion\u003c/div\u003e\u003cp\u003eInterpolation performance is critically dependent on underlying data attributes (sample size and spatial structure). No universal interpolator exists for all datasets. Deterministic methods, specifically the TPS, demonstrated superior flexibility across diverse scenarios. A data-driven decision framework was developed in this study translating these key data attributes into clear, actionable recommendations, thereby providing users with an accessible tool to demonstrably improve the reliability of soil maps.\u003c/p\u003e\u003c/div\u003e","manuscriptTitle":"Performance of interpolation methods in digital soil mapping: the influence of data characteristics","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-20 17:55:22","doi":"10.21203/rs.3.rs-8020454/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c975faea-e70c-4255-b475-7e2e14f792f6","owner":[],"postedDate":"November 20th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-12-29T16:00:45+00:00","versionOfRecord":{"articleIdentity":"rs-8020454","link":"https://doi.org/10.1007/s11119-025-10311-8","journal":{"identity":"precision-agriculture","isVorOnly":false,"title":"Precision Agriculture"},"publishedOn":"2025-12-27 15:57:38","publishedOnDateReadable":"December 27th, 2025"},"versionCreatedAt":"2025-11-20 17:55:22","video":"","vorDoi":"10.1007/s11119-025-10311-8","vorDoiUrl":"https://doi.org/10.1007/s11119-025-10311-8","workflowStages":[]},"version":"v1","identity":"rs-8020454","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8020454","identity":"rs-8020454","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: preprint-html ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00