A hybrid deep clustering and machine learning-based decision support framework for modeling crop-livestock environmental suitability and spatial discrepancies

doi:10.21203/rs.3.rs-9196881/v1

2026 · doi:10.21203/rs.3.rs-9196881/v1

preprint OA: closed


Research Article

JingHua Wu, ZhuoCheng Xie

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9196881/v1. This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 8

Abstract

The development of China’s National Modern Agricultural Industrial Parks (NMAIPs) has provided valuable experience to guide regional agricultural structural adjustment.
To systematically analyze and scale up the successful practices of crop–livestock spatial layouts, this study examines 335 NMAIPs established between 2017 and 2024. Based on seven natural environmental variables, a deep clustering model (VAE-GMM) was applied to classify the parks into representative environmental types. This classification establishes a standardized spatial reference frame. Concurrently, a LightGBM multi-label classifier was used to predict the theoretical suitability of various crop–livestock spatial configurations. The study also introduces a spatial discrepancy (Gap) metric to evaluate industrial expansion potential. This metric is calculated as the difference between the model-predicted theoretical suitability proportion and the actual occurrence frequency within each environmental cluster. The results show that the parks can be grouped into five distinct environmental types with clear regional spatial patterns. The LightGBM prediction achieved a micro-average AUC of 0.75, effectively capturing natural constraints. Furthermore, the discrepancy analysis reveals a structural divergence between environmental suitability and real-world agricultural allocation. Quantifying this divergence highlights environmentally suitable yet underrepresented industries. By treating existing parks as reference cases under specific environmental baselines, this data-driven framework provides objective, transferable decision support for industrial selection and spatial planning in newly established agricultural parks.
Keywords: National Modern Agricultural Industrial Parks; Crop–livestock spatial layouts; VAE-GMM; LightGBM; Decision support

1 Introduction

The profound transformation of global agriculture and increasing environmental changes have established Agricultural Industrial Parks (AIPs) as important platforms for promoting agricultural development through coordinated spatial planning and industrial organization (J. Wang, 2022). NMAIPs serve as national-level demonstration platforms in China that integrate advanced production factors and core technologies, playing a pivotal role in promoting regional sustainable development and agricultural industrial upgrading (Ling et al., 2023). The core value of these parks lies in achieving an efficient allocation of production factors through scientifically designed crop-livestock spatial layouts. This is closely linked to regional food security and also contributes to improving agroecological sustainability (Xiaoling Li, Huang, and Liu, 2025). Natural conditions constitute a key component of agricultural production factors and provide the fundamental basis for crop-livestock spatial layouts. Under heterogeneous environmental conditions, data-driven decision support systems (DSS) are increasingly recognized as effective tools for supporting precise planning and scientific decision-making in agricultural industrial parks (Ikendi, Lyons, and Pathak, 2025). Environmental baseline data in large-scale agricultural decision systems typically present numerous high-dimensional, sparse, and long-tailed discrete features. As a result, traditional linear dimensionality reduction and conventional clustering methods struggle to effectively capture the complex nonlinear interactions within such data (Y. Wang et al., 2025).
In recent years, gradient boosting tree algorithms, represented by LightGBM, have demonstrated high computational efficiency in modeling high-dimensional agricultural features (Ke et al., 2017). Moreover, these methods exhibit strong robustness and stability in predicting crop–livestock layouts under extreme environmental conditions (Moharana, Yadav, Malav, Biswas, and Patil, 2025). In parallel, deep learning techniques have shown significant advantages in extracting spatial characteristics of land use (Dinh et al., 2025) and have been widely applied to model nonlinear relationships in complex ecological systems (Jabed and Murad, 2024). Nevertheless, how to effectively integrate the strengths of deep generative models in handling high-dimensional sparse features with efficient predictive algorithms to construct a comprehensive agricultural spatial decision support framework remains an open research question.

To answer this question, this study proposes a deep learning–based framework for agricultural spatial knowledge discovery and decision support. Using a large-scale dataset of 335 National Modern Agricultural Industrial Parks established between 2017 and 2024, the proposed framework integrates VAE-GMM and LightGBM to analyze crop-livestock spatial layouts under complex environmental conditions. The main contributions of this study are summarized as follows:

Environmental Baseline Modeling: A deep clustering model (VAE-GMM) is applied to process high-dimensional agricultural environmental features, objectively classifying the NMAIPs into standardized macro-environmental baselines.

Data-Driven Suitability Prediction: A multi-label predictive model (LightGBM) is constructed to evaluate the nonlinear mapping relationships between natural conditions and composite crop-livestock spatial layouts.
Spatial Discrepancy Quantification: A spatial discrepancy metric is introduced to quantify the difference between model-predicted environmental suitability and actual industrial distribution, providing objective decision support for new park planning.

The remainder of this paper is organized as follows. Section 2 reviews related work in agricultural spatial mapping and decision support. Section 3 presents the dataset and the proposed VAE-GMM and LightGBM framework. Section 4 reports the empirical results and discusses the system applications in knowledge extraction and transferable decision-making. Finally, Section 5 concludes the study.

2 Related Work

2.1 Macro-Level Evaluation of NMAIPs

Current research on NMAIPs primarily focuses on macroeconomic evaluation and the analysis of regional development characteristics. Researchers have predominantly used the entropy-weight TOPSIS and obstacle degree models to evaluate the agglomeration effects of these parks (Zhao, Zhu, Ma, Li, and Tang, 2024), employed standard deviational ellipses and DEA-SBM models to analyze their spatial distribution characteristics and operational efficiency (Zhou, Chen, Han, Ling, and Li, 2025), and empirically examined the role of park construction in promoting rural industrial integration (X. Sun, Mei, and Yang, 2024). Furthermore, some scholars have summarized the planning and construction models of parks across different regions (W. Wang, Lv, Yang, Zhou, and Chang, 2020) or explored the environmental effects of park construction concerning resource utilization efficiency (S. Li, Wu, Yu, and Chen, 2023). However, the existing literature largely treats the parks as aggregate statistical evaluation units. This makes it difficult to characterize the micro-level adaptation logic between the internal natural environment and specific crop-livestock spatial layouts, which in turn restricts the precise translation and application of successful layout experiences in newly established projects.
To address this, this study shifts the focus to the micro-level natural environmental baseline. By constructing a data-driven spatial analysis framework, it systematically mines the industrial adaptation patterns of national-level parks, aiming to provide precise planning references for new projects.

2.2 Clustering of Complex Agricultural Environmental Data

Analyzing the mapping relationship between crop-livestock spatial layouts and geographical environments is fundamental to revealing the production layout logic of NMAIPs. Current related research primarily focuses on identifying environmental impact factors and analyzing the spatial suitability of crop-livestock spatial layouts (Long Wang et al., 2024). In agricultural practice, natural factors such as climate, soil, and topography play a fundamental constraining role in layout decisions (Nde, Fendji, Yenke, and Schöning, 2024). For specific agricultural products, existing studies have used ecological niche models such as MaxEnt to predict their suitable planting areas (HengYu, Xi, XiaoMao, Lin, and JiaQi, 2024). Regarding data mining and analytical methods, early approaches mostly relied on Spatial Multi-Criteria Decision Analysis (MCDA) or expert experience via the Analytic Hierarchy Process (AHP) for agricultural evaluation (Agrawal, Govil, and Kumar, 2025; Akpoti, Kabo-bah, and Zwart, 2019). These methods rely on expert experience to assign weights to indicators and primarily process a limited number of continuous variables, and they perform well in small- to medium-scale agricultural zoning. Nevertheless, when applied to the complex, high-dimensional agricultural data of NMAIPs, these methods are constrained by their dependence on limited continuous variables or static weights and struggle to capture the nonlinear interactions among environmental factors (Mugiyo et al., 2021). Furthermore, traditional methods are highly susceptible to subjective weighting biases.
These restrictions hinder the effective application of complex, high-dimensional agricultural environmental data in macro-level industrial layouts. They highlight the need for a data-driven architecture capable of automatically extracting deep nonlinear manifold representations and overcoming the limitations of linear dimensionality reduction and hard clustering (Guo, Fan, Amayri, and Bouguila, 2025). In response to this need, this study introduces a deep clustering framework integrating Variational Autoencoders and Gaussian Mixture Models (VAE-GMM), aiming to achieve effective dimensionality reduction and objective pattern classification of complex agricultural environmental features.

2.3 Deep Learning and Hybrid Modeling in Agricultural Spatial Planning

With the popularization of data mining and artificial intelligence technologies, data-driven machine learning models are increasingly applied in intelligent land suitability assessments (Taghizadeh-Mehrjardi, Nabiollahi, Rasoli, Kerry, and Scholten, 2020). Compared to traditional statistical models, machine learning demonstrates a significant accuracy advantage when processing multi-source heterogeneous agricultural data (Rani, Mishra, Kataria, Mallik, and Qin, 2023), and its potential in agricultural management and intelligent decision-making has been confirmed by numerous studies (Benos et al., 2021). However, when dealing with complex agricultural environmental data characterized by high dimensionality and strong spatial heterogeneity, traditional machine learning is often constrained by cumbersome manual feature engineering and faces severe performance bottlenecks (Frimpong et al., 2025). Consequently, deep learning architectures have been widely introduced into modern agricultural decision systems (He, Li, and Jin, 2025).
Nevertheless, when processing the high-dimensional, sparse composite crop–livestock data of industrial parks, there is an urgent need for an algorithm that combines high computational efficiency with robustness in multi-label classification prediction (Nirmaladevi and Jagatheswari, 2025). More importantly, it is often difficult to directly translate the "theoretical suitability probabilities" of crops output by existing studies into macro-planning directives. In agricultural practice, crop–livestock selection is constrained not only by natural resources but also strongly regulated by price support policies (Yang et al., 2023) and farmers' decision-making behaviors (Xue Li, Yuan, and Han, 2019). The influence of anthropogenic (or socio-economic) factors leads to an objective discrepancy between the actual spatial distribution of crops and their theoretical zones of natural suitability (Guan et al., 2025). Therefore, constructing a hybrid model capable of both simplifying complex high-dimensional features and providing clear, quantified decision rules by evaluating the difference between theoretical predictions and actual distributions has become a key direction for deepening research in this field.

3 Methodology

Figure 1 illustrates the overall workflow of the proposed decision support framework for crop-livestock layouts in National Modern Agricultural Industrial Parks. This workflow begins with the preparation and preprocessing of multi-source datasets, where heterogeneous data encompassing natural conditions and crop-livestock information are standardized and reconstructed into tensors. This step eliminates the effects of different physical units and ensures computational compatibility with downstream deep learning models. Once the data are prepared, a hybrid deep learning architecture integrating a Variational Autoencoder (VAE) and a Gaussian Mixture Model (GMM) is introduced.
This phase aims to extract the latent manifold representations of high-dimensional sparse environmental data and, through objective probabilistic clustering, classify the natural environmental baselines of national industrial parks into typical pattern categories. Building on this unsupervised environmental pattern recognition, the framework further integrates a LightGBM classifier for supervised multi-label suitability learning, thereby deriving the theoretical suitability probabilities of various agricultural industries under specific environments. Finally, to transform the prediction results into reusable macro-decision rules, this study calculates the spatial development discrepancy (Gap) by integrating the predicted probabilities with actual distribution frequencies. Under unified experimental settings, the overall performance of the model is validated through multiple evaluation metrics, thereby quantifying the industrial development potential across different environmental baselines. This section details the specific implementation of each phase in the aforementioned workflow.

3.1 Data Acquisition and Description

This study focuses on China's National Modern Agricultural Industrial Parks (NMAIPs) announced between 2017 and 2024, selecting 335 valid park records as the research sample. Data were primarily acquired from specialized agricultural databases, public government platforms, and relevant information websites. Information regarding crop-livestock layouts was obtained by integrating multi-source records from the China Institute of High-Tech, Sohu, and official Chinese government portals.
In terms of industrial classification, referring to the categorization standards in the China Agricultural Statistical Yearbook and considering the actual crop-livestock planning of each park, the agricultural industries were aggregated into 11 categories: cereals, vegetables, fruits, livestock, aquatic products, tea, medicinal herbs, oil crops, edible fungi, flowers, and specialty crops (referring to regional signature products not included in the aforementioned categories). Climate data, including mean annual temperature, average air humidity, and annual precipitation, were mainly sourced from the China Meteorological Data Service Centre. Topographic data were retrieved via Google Maps, encompassing spatial information such as elevation, longitude, latitude, and terrain details. Soil type data were obtained from platforms such as the Chinese Soil Database and classified according to the Chinese Soil Taxonomy.

3.2 Data Preprocessing

To satisfy the matrix computation requirements of the VAE-GMM clustering and LightGBM prediction models, the raw features of the 335 sampled parks were subjected to outlier removal and numerical reconstruction. The reconstructed baseline data were partitioned into a natural environmental feature matrix (input $X \in \mathbb{R}^{335 \times 50}$) and a crop-livestock industrial label matrix (output $Y \in \mathbb{R}^{335 \times 11}$). Continuous natural environmental features (temperature, precipitation, elevation, and humidity) were standardized using StandardScaler to eliminate inconsistencies in units and scales. As presented in Table 1, these processed features were transformed into standard normal tensors with a mean of 0 and a standard deviation of 1. Furthermore, the skewness metrics indicate that the raw environmental data exhibited highly nonlinear characteristics deviating significantly from a normal distribution (e.g., the skewness of elevation reached 2.50).
This pronounced non-normality supports the use of a deep generative model (VAE) in subsequent phases to extract latent representations.

Table 1 Numerical distribution and standardized tensor mapping of continuous environmental features. Mean, SD, Min, and Max are physical metrics of the raw data; standardized range and skewness are tensor metrics.

| Variable | Mean | SD | Min | Max | Standardized range | Skewness |
| --- | --- | --- | --- | --- | --- | --- |
| Avg Annual Temp (°C) | 13.87 | 5.11 | 1.3 | 26 | [-2.46, 2.38] | -0.24 |
| Annual Precipitation (mm) | 827.07 | 512.14 | 3 | 2300 | [-1.61, 2.88] | 0.37 |
| Elevation (m) | 607.42 | 791.49 | 2 | 4500 | [-0.77, 4.93] | 2.50 |
| Avg Air Humidity (%) | 67.52 | 10.77 | 40 | 85 | [-2.56, 1.63] | -0.72 |

For discrete categorical features (climate types, topography, soil, and crop-livestock industries), this study employed One-Hot encoding and Multi-Label Binarization (MLB) to map them into high-dimensional sparse matrices. Features such as topography and soil possess multi-label attributes (i.e., the sum of activation probabilities across sub-features does not equal 100%), and the resulting encoded matrices are high-dimensional (e.g., soil types expand to 35 dimensions) with long-tailed distributions, so the data presentation in this section is truncated to maintain visual focus. As presented in Table 2, only the core feature columns with the highest activation probabilities ($p$) within each macro-category are listed. For conciseness, the remaining low-frequency features are omitted from the table, though they were fully incorporated into the model for actual computations. Additionally, Table 2 reports the tensor feature variance ($p(1-p)$) for each column. This metric quantifies the information content under a binomial distribution; feature columns with higher variances provide more informative node-splitting references for the supervised prediction model.
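As a quick check on the two transformations described above, the following minimal Python sketch recomputes one standardized bound from the Table 1 summary statistics, and the activation probability and binomial variance from one Table 2 activation count. The function names are illustrative and not taken from the authors' code. The sketch plugs in the reported mean and SD directly rather than refitting a StandardScaler, so small rounding differences from the tabulated ranges are possible.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Standardize one value with a given mean and standard deviation."""
    return (value - mean) / sd


def activation_stats(count: int, n_samples: int) -> tuple[float, float]:
    """Return activation probability p and binomial variance p * (1 - p)."""
    p = count / n_samples
    return p, p * (1.0 - p)


# Lower bound of mean annual temperature, using the Table 1 statistics.
temp_low = z_score(1.3, mean=13.87, sd=5.11)

# Subtropical monsoon climate column: 159 activations among 335 parks (Table 2).
p, var = activation_stats(159, 335)

print(f"z(min temp) = {temp_low:.2f}")
print(f"p = {p:.2%}, p(1-p) = {var:.4f}")
```

Because $p(1-p)$ peaks at $p = 0.5$, columns that are active in roughly half of the parks carry the largest variance, while rare long-tail columns contribute very little.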
Table 2 Sparsity and feature variance evaluation of categorical feature matrices

| Matrix | Macro-feature category | Expanded dimensions | Core tensor column | Activation count | Activation probability ($p$) | Tensor feature variance ($p(1-p)$) |
| --- | --- | --- | --- | --- | --- | --- |
| Feature Matrix X | Climate Type | 6 | Subtropical monsoon climate | 159 | 47.46% | 0.2494 |
| Feature Matrix X | Climate Type | 6 | Temperate monsoon climate | 94 | 28.06% | 0.2019 |
| Feature Matrix X | Climate Type | 6 | Temperate continental climate | 63 | 18.81% | 0.1527 |
| Feature Matrix X | Topography | 5 | Plain | 179 | 53.43% | 0.2488 |
| Feature Matrix X | Topography | 5 | Hilly | 155 | 46.27% | 0.2486 |
| Feature Matrix X | Topography | 5 | Mountainous | 99 | 29.55% | 0.2082 |
| Feature Matrix X | Soil Type | 35 | Paddy soil | 140 | 41.79% | 0.2433 |
| Feature Matrix X | Soil Type | 35 | Yellow soil | 97 | 28.96% | 0.2057 |
| Feature Matrix X | Soil Type | 35 | Cinnamon soil | 92 | 27.46% | 0.1992 |
| Feature Matrix X | Soil Type | 35 | Red soil | 86 | 25.67% | 0.1908 |
| Label Matrix Y | Crop-Livestock Industry | 11 | Cereals | 112 | 33.43% | 0.2226 |
| Label Matrix Y | Crop-Livestock Industry | 11 | Livestock | 84 | 25.07% | 0.1879 |
| Label Matrix Y | Crop-Livestock Industry | 11 | Fruits | 68 | 20.30% | 0.1618 |
| Label Matrix Y | Crop-Livestock Industry | 11 | Vegetables | 59 | 17.61% | 0.1451 |

3.3 VAE-GMM Clustering for High-Dimensional Environmental Data

The reconstructed natural environmental matrix ($X \in \mathbb{R}^{335 \times 50}$) derived from feature engineering integrates continuous meteorological values with sparse, binarized soil and topographic tensors. Traditional linear dimensionality reduction techniques (e.g., PCA) and hard clustering algorithms struggle to effectively process such complex data structures (Jabed and Murad, 2024). Consequently, this study constructs a hybrid deep learning framework integrating a Variational Autoencoder (VAE) and a Gaussian Mixture Model (GMM) to cascade nonlinear dimensionality reduction with probabilistic clustering. This cascaded computation constitutes the core methodology for parsing complex agricultural environmental baselines. To ensure experimental reproducibility, all algorithms were implemented in a Python environment using the PyTorch and Scikit-learn frameworks, with a uniformly fixed global random seed (Random Seed = 42).
Prior to deep feature extraction, the 50-dimensional environmental feature matrix $X$ was normalized using a MinMaxScaler to align with the numerical boundaries of the activation functions in the deep neural network. Subsequently, the normalized tensors were fed into the VAE model. To enhance nonlinear representation capabilities, both the Encoder and Decoder were configured as two-layer fully connected networks using ReLU activation functions. The specific hyperparameter configurations for the experiment are detailed in Table 3.

Table 3 Core hyperparameter configurations of the VAE model

| Module | Key parameters | Setting |
| --- | --- | --- |
| Architecture | Hidden layers | Two-layer fully connected (64–64) |
| Activation | Activation function | ReLU |
| Latent Space | $Z_{dim}$ | 10 |
| Optimizer | Algorithm | Adam |
| Optimizer | Learning rate | $1 \times 10^{-3}$ |
| Training | Batch size / Epochs | 32 / 150 |
| Loss | $L_{VAE}$ | Mean Squared Error (MSE) + KL divergence |

The encoder compresses and maps the high-dimensional sparse inputs into a continuous latent space, outputting a 10-dimensional latent feature vector (Latent Vector $Z$) via the reparameterization trick. During the training phase, the model undergoes 150 epochs of iteration, jointly minimizing a loss function composed of the reconstruction error (Mean Squared Error, MSE) and the Kullback-Leibler (KL) divergence, until the network achieves stable convergence. Upon the completion of training, this study extracts the 10-dimensional latent features $Z$ output by the encoder. This step filters out the sparse noise inherent in the original matrix and distills low-dimensional environmental manifold features with robust representation capabilities, serving as the standardized input for downstream clustering. Following the acquisition of the dimensionally reduced features, probabilistic clustering experiments of the natural environment were conducted based on the GMM and the BIC.
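For reference, the reparameterization step and the VAE loss described above can be sketched in plain Python for a single sample. This is a minimal illustration, not the authors' PyTorch implementation. The function names are hypothetical, the reconstruction term is taken as the mean squared error over features, the KL term is the standard closed form for a diagonal Gaussian posterior against a standard normal prior, and the two terms are summed with equal weight, which the text does not specify.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction MSE plus KL divergence (equal weighting is an assumption)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + kl_to_standard_normal(mu, log_var)


rng = random.Random(42)        # seed 42, the value reported in the text
mu = [0.0] * 10                # 10-dimensional latent space, as in Table 3
log_var = [0.0] * 10
z = reparameterize(mu, log_var, rng)

x = [0.0, 1.0, 0.5, 0.0]       # toy normalized input, not real park data
x_hat = [0.1, 0.9, 0.5, 0.0]   # toy reconstruction
loss = vae_loss(x, x_hat, mu, log_var)
```

Sampling through `mu` and `log_var` rather than drawing `z` directly is what keeps the sampling step differentiable with respect to the encoder outputs when the same logic is written in an autodiff framework.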
Given the inherent continuity and transitional nature of the spatial distribution of agricultural natural resources, employing GMM for soft clustering more accurately delineates the fuzzy boundaries of ecological management zones compared to traditional hard clustering (J. Sun, Arellano, Wang, and Mouazen, 2025). This study inputs the 10-dimensional latent features $Z$ extracted by the VAE into the GMM model for probabilistic fitting. To objectively determine the optimal number of environmental classifications and preclude the interference of subjective experience, the Bayesian Information Criterion (BIC) is used as the model selection criterion; its complexity penalty balances the model's fitting accuracy against its number of parameters (Nalisnick, Hertel, and Smyth, 2016). During the cluster optimization process, GMMs are fitted with the Expectation-Maximization (EM) algorithm for each candidate number of clusters in the range $K \in [2, 10]$. The number of clusters $K$ at which the BIC score reaches its minimum over this range is selected, thereby achieving a data-driven and objective classification of the natural environmental patterns of industrial parks nationwide.

3.4 LightGBM-Based Multi-Label Suitability Prediction

This study employs LightGBM to predict crop-livestock suitability. LightGBM is a highly efficient machine learning framework based on gradient boosting decision trees, capable of rapidly processing large-scale data while maintaining robust model performance (Yadav, Jadhav, Kakade, Pangare, and Bhutali, 2025). Considering that the actual layouts of agricultural parks frequently manifest as composite crop-livestock layouts, the prediction task in this study is formulated as a multi-label classification problem. During the experimental design phase, the model uses the clustering results derived from the VAE-GMM as environmental baseline patterns, alongside the reconstructed standardized features, as inputs.
Given the significant distributional disparities among different agricultural industries within the samples (e.g., an abundance of cereal samples versus a scarcity of flowers and specialty crops), the model activated the Class Weight "Balanced" strategy. This approach assigns higher misclassification penalties to minority classes, thereby enhancing the model's recognition and predictive capabilities for small-sample specialty crops. To facilitate multi-label prediction, this study adopted the One-Vs-Rest (OvR) strategy to wrap the LightGBM classifier, transforming the 11 industrial target categories into 11 independent binary classification sub-tasks. To ensure the consistency and reproducibility of the experimental results, the global random seed was uniformly fixed at 42. The specific core hyperparameter configurations are detailed in Table 4.

Table 4 Core hyperparameter configurations of the LightGBM prediction model

| Module | Key parameters | Setting |
| --- | --- | --- |
| Estimator | LightGBM | LGBM Classifier |
| Strategy | One-Vs-Rest (OvR) | Enabled |
| Tree Setup | Number of weak learners | 200 |
| Optimization | Learning rate | 0.05 |
| Distribution | Class weight | Balanced |

In the model evaluation and validation phase, to guard against overfitting and ensure the robustness of the predictive assessment, this study employs a 5-fold cross-validation strategy for global model training. The experiment randomly shuffles the 335 samples and divides them into five subsets of equal size, iteratively conducting training and probability prediction. For any given park sample, the model ultimately outputs a continuous theoretical suitability probability across the 11 industrial categories, yielding the matrix $P \in \mathbb{R}^{335 \times 11}$. To translate the continuous predicted probabilities into actionable agricultural planning decisions and classification labels, this study establishes a decision threshold of 0.3 based on a priori testing.
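The class-weighting and thresholding steps can be illustrated with a small sketch. It assumes the "Balanced" option follows the scikit-learn convention, weight = n_samples / (n_classes × class count), applied separately to each binary One-Vs-Rest sub-task. The helper names and the probability values are hypothetical, while the cereal count (112 of 335 parks) comes from Table 2.

```python
def balanced_weights(n_positive: int, n_samples: int) -> dict:
    """Balanced class weights for one binary sub-task (scikit-learn convention)."""
    n_negative = n_samples - n_positive
    return {
        1: n_samples / (2 * n_positive),
        0: n_samples / (2 * n_negative),
    }


def to_labels(probabilities: dict, threshold: float = 0.3) -> set:
    """Keep every industry whose predicted probability reaches the threshold."""
    return {name for name, p in probabilities.items() if p >= threshold}


# Cereals: 112 positive parks out of 335 (Table 2).
cereal_weights = balanced_weights(n_positive=112, n_samples=335)

# Hypothetical out-of-fold probabilities for a single park.
park_probs = {"cereals": 0.62, "tea": 0.30, "flowers": 0.12}
suitable = to_labels(park_probs)
```

Under this convention the rarer an industry is, the larger the weight on its positive samples, and a probability exactly equal to 0.3 counts as suitable.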
Specifically, when the predicted probability of a given park environment for a specific industry reaches $p \ge 0.3$, the experiment determines that this natural baseline exhibits crop-livestock suitability for that industry. The rationale behind this threshold setting is to maximize the exploration of the environment's potential carrying capacity while maintaining precision. This provides a quantitative foundation for the subsequent calculation of the spatial development discrepancy (Gap) between the predicted suitability probabilities and the actual distribution frequencies.

3.5 Model Validation and Potential Evaluation

To verify the reliability of the analytical framework integrating VAE-GMM and LightGBM in parsing agricultural layout patterns, and to demonstrate its decision support capability for newly established parks, this study formulated analytical strategies for model validation and potential evaluation. The existing layouts of national-level parks reflect highly efficient production patterns under specific natural conditions, thus serving as reference standards under corresponding environmental baselines (Pilevar, Matinfar, Sohrabi, and Sarmadian, 2020). The fundamental logic is that if the model can accurately reconstruct the typical crop-livestock layouts of existing NMAIPs based on natural environmental features, it demonstrates that the model has effectively captured the matching patterns between the environment and agricultural industries. Consequently, it can serve as an empirical reference model to provide a transferable objective basis for the planning of newly established regional industrial parks.
Regarding algorithm performance evaluation, given that the crop-livestock prediction in this study constitutes a multi-label classification task with sample imbalances across categories, the experiment employs the Micro-average Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC), alongside the Weighted F1-score, as the core evaluation metrics. By globally aggregating the true positive and false positive samples of all categories, the micro-average strategy reflects the overall predictive efficacy of the model in the multi-label task. In the ecological rationality validation phase, using the environmental categories delineated by VAE-GMM as the baseline, this study uses the predictive match rate of the LightGBM model under each category for cross-validation. The existing crop-livestock structures of the parks are treated as the reference standard for model validity and the comparison is visualized with radar charts. The predictive match rate quantifies the degree of overlap between the model-predicted industrial sets and the actual crop-livestock layouts:

$$\text{Match\_rate} = \frac{\left| S_{pred} \cap S_{true} \right|}{\left| S_{true} \right|}$$ (1)

In Eq. 1, $S_{pred}$ represents the suitable crop-livestock categories predicted by the model based on natural conditions, and $S_{true}$ denotes the actual industrial set selected by the park. By comparing the match rate and environmental features across different clustering patterns, it can be verified whether the model successfully captures the constraints and adaptability of the natural environment regarding agricultural layouts.
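Eq. 1 can be written as a short set operation. The sketch below is illustrative only: the function name and the example industry sets are hypothetical, and how per-park values are aggregated within a cluster is not specified in the text, so only the per-park computation is shown.

```python
def match_rate(predicted: set, actual: set) -> float:
    """Share of a park's actual industries recovered by the prediction (Eq. 1)."""
    if not actual:
        raise ValueError("S_true must contain at least one industry")
    return len(predicted & actual) / len(actual)


s_pred = {"cereals", "vegetables", "livestock"}   # hypothetical prediction
s_true = {"cereals", "fruits"}                    # hypothetical park layout
rate = match_rate(s_pred, s_true)                 # 1 of 2 actual industries matched
```

Because the denominator is $|S_{true}|$, the measure behaves like a per-park recall: additional predicted industries that the park does not host leave the value unchanged.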
In the industrial potential evaluation phase, existing studies indicate that due to factors such as market fluctuations, policy interventions, or farmers' preferences, a significant objective discrepancy often exists between the actual spatial distribution of agricultural industries and their purely naturally suitable areas (X. Wang et al., 2025). To quantify the degree of deviation between the theoretical suitability and the actual distribution status under specific natural environmental conditions (Zhang, Hong, Sun, Hao, and Ai, 2025), and to translate this into concrete planning references for newly established parks, this study defines an industrial discrepancy index (Gap). This index is calculated as the difference between the theoretical suitability proportion derived from the LightGBM model and the actual occurrence frequency of the specific industry among existing parks within the same clustering pattern. The specific calculation formula is as follows:

$$\text{Gap} = P_{pred} - P_{true}$$ (2)

In Eq. 2, $P_{pred}$ represents the theoretical suitability proportion (i.e., the ratio of environments evaluated as suitable by the LightGBM model under the 0.3 threshold), and $P_{true}$ denotes the actual occurrence frequency of that specific industry among existing parks within the same environmental cluster. Conceptually, $P_{pred}$ answers the environmental question of "what proportion of areas can optimally support this crop," while $P_{true}$ reflects the socio-economic reality of "what is already widely deployed." Therefore, the Gap index serves as a direct measurement of "untapped spatial potential" for decision support. When the Gap value is significantly greater than zero, it signifies that the specific industry aligns with the current environmental conditions but maintains a relatively low actual distribution proportion in existing parks.
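The Gap index for one industry within one environmental cluster can be sketched as follows. This is a minimal reading of the definition above, not the authors' code: the function name, argument layout, and toy values are hypothetical, and the 0.3 threshold is the one reported in the text.

```python
def cluster_gap(probabilities, actual, cluster_ids, cluster, threshold=0.3):
    """Gap = P_pred - P_true for one industry within one environmental cluster.

    probabilities: predicted suitability probability per park
    actual:        1 if the park actually hosts the industry, else 0
    cluster_ids:   environmental cluster label per park
    """
    members = [i for i, c in enumerate(cluster_ids) if c == cluster]
    if not members:
        raise ValueError(f"no parks in cluster {cluster}")
    # Share of parks in the cluster judged suitable under the threshold.
    p_pred = sum(probabilities[i] >= threshold for i in members) / len(members)
    # Share of parks in the cluster that actually host the industry.
    p_true = sum(actual[i] for i in members) / len(members)
    return p_pred - p_true


# Toy data for one industry across six parks (hypothetical values).
probs = [0.70, 0.35, 0.10, 0.45, 0.20, 0.90]
hosts = [1, 0, 0, 0, 0, 1]
clusters = [0, 0, 0, 0, 1, 1]

gap_c0 = cluster_gap(probs, hosts, clusters, cluster=0)
gap_c1 = cluster_gap(probs, hosts, clusters, cluster=1)
```

In this toy case, three of the four parks in cluster 0 are judged suitable but only one hosts the industry, giving a positive Gap, whereas in cluster 1 the two proportions coincide and the Gap is zero.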
This study uses bubble charts to display the Gap values across the clustering patterns, in order to identify suitable industries under different environmental conditions and to provide a quantitative reference for industrial selection in newly established parks.
4 Results and Discussion
4.1 Identification of Environmental Baselines
To determine the optimal number of clusters for the natural environment of National Modern Agricultural Industrial Parks, this study evaluated the clustering results of the VAE-GMM model using the Bayesian Information Criterion (BIC). As illustrated in Fig. 2, when the number of clusters ($K$) was tested within the range of 2 to 10, the BIC score first decreased and then increased. At $K=5$, the BIC score reached its global minimum (-19,884). Under the BIC, a lower score indicates a better balance between goodness of fit and model complexity. This study therefore takes 5 as the optimal number of clusters and partitions the natural environmental features of the sampled parks into five distinct patterns, which serve as the baseline for subsequent analysis. This data-driven clustering mitigates the subjective biases of traditional spatial evaluation methods that rely on manual weighting (Guo et al., 2025). At the same time, extracting five typical baselines from the complex nationwide agricultural environment reflects ecological similarity across administrative boundaries. This provides a quantifiable classification basis for exploring the layout patterns of agricultural industries under similar natural conditions and for making planning decisions for newly established industrial parks.
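For reference, the selection rule described above can be written out using the standard definition of the BIC. Here $n$ is the number of sampled parks, $\hat{L}_K$ is the maximized likelihood of the $K$-component mixture, and $p_K$ is its number of free parameters. The expression for $p_K$ assumes a full-covariance Gaussian mixture fitted in a $d$-dimensional latent space, which this section does not state explicitly.

```latex
\mathrm{BIC}(K) = p_K \ln n \;-\; 2 \ln \hat{L}_K ,
\qquad
K^{*} = \operatorname*{arg\,min}_{K \in \{2,\dots,10\}} \mathrm{BIC}(K),
\qquad
p_K = (K-1) + Kd + K\,\frac{d(d+1)}{2}
```

The first term grows with $K$ and penalizes complexity, while the second shrinks as the fit improves, which produces the decrease-then-increase shape reported for Fig. 2 and the minimum at $K^{*}=5$.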
With the number of clusters fixed at five, this study used the t-SNE algorithm to reduce and visualize the latent environmental features extracted by the VAE-GMM model (Fig. 3). The scatter plot shows that the five natural environmental patterns form relatively independent clusters in the two-dimensional embedding of the latent features. Cluster boundaries are distinct, with little overlap among the 95% confidence regions. This visualization indicates that the model can distinguish the variation in complex natural environmental factors among the sampled parks, and it supports classifying the natural environments of the national industrial parks into five types. The separation in feature space also supports the feature extraction capability of deep generative models when processing high-dimensional and sparse agricultural data. By filtering redundant information out of the original environmental matrix, the model transforms multidimensional geographical elements into structured macro-environmental baselines. This clustering partition establishes a spatial reference frame for evaluating agricultural natural suitability and provides the analytical framework for the subsequent predictive cross-validation and the quantification of industrial expansion discrepancies (Gap) under specific environmental patterns.
Summary statistics of the natural conditions in each clustering pattern reveal clear numerical differentiation (Fig. 4). Based on the distribution ranges and medians of the core environmental factors, this study defines the five patterns as follows:
Cluster 0 (Low-Altitude Hot and Humid Type): Characterized by high mean annual temperature (approx. 17°C), humidity (approx. 76%), and precipitation (900–1500 mm), with altitudes generally below 500 m.
Cluster 1 (Lowland Cool and Dry Type): Characterized by low altitudes (approx. 100 m), with mean annual temperature (approx. 13°C) and precipitation (500–600 mm) in the lower ranges.
Cluster 2 (Temperate and Moist Plain Type): Features the lowest altitudes (approx. 50 m), a mean annual temperature of approx. 16°C, and precipitation between 800 and 1300 mm.
Cluster 3 (High-Rainfall Warm and Humid Type): Exhibits the highest precipitation (approx. 1500 mm) and a high mean annual temperature (approx. 17.5°C).
Cluster 4 (High-Altitude Cold and Arid Type): Displays the highest average altitude (approx. 850 m), while the mean annual temperature (approx. 9°C), humidity (approx. 55%), and precipitation (approx. 450 mm) are the lowest among all categories.
Mapping these clusters and their environmental attributes onto geographic coordinates (Fig. 5) shows that the natural environmental patterns are strongly agglomerated by region. Specifically, the Low-Altitude Hot and Humid Type (Cluster 0) is widely distributed across the southwest inland and regions south of the Yangtze River; the Lowland Cool and Dry Type (Cluster 1) is highly concentrated in the Northeast Plain; the Temperate and Moist Plain Type (Cluster 2) is primarily aggregated in central and eastern regions such as the North China Plain and the Huang-Huai-Hai Plain; the High-Rainfall Warm and Humid Type (Cluster 3) is concentrated along the southeast coast and South China; and the High-Altitude Cold and Arid Type (Cluster 4) is extensively distributed throughout the northwest inland and western high-altitude fringe areas.
These spatial distribution patterns align with the distribution of China’s macro-geographical climatic zones (Gong et al., 2024), which supports the ecological rationality of the unsupervised VAE-GMM clustering results from a geographical perspective. Crucially, this result condenses 335 geographically dispersed industrial parks into five standardized natural environmental baselines. Because the classification is independent of administrative divisions, parks located in different provinces or cities but sharing similar environmental characteristics can be compared within the same reference frame. This provides the structured prerequisites for the subsequent quantification of agricultural industrial structure discrepancies (Gap) under specific environments and for identifying reference cases for newly established parks based on similar natural conditions.
4.2 Multi-Label Suitability Prediction and Ecological Rationality
The identified environmental clusters not only characterize agroecological heterogeneity but also provide a structured basis for the subsequent prediction of crop-livestock spatial layouts. To evaluate the performance of the model on the multi-label crop-livestock suitability prediction task, this study plotted the Receiver Operating Characteristic (ROC) curve on the test set (Fig. 6). The overall micro-average Area Under the Curve (AUC) is 0.75. Per-category performance varied considerably: Tea achieved the highest AUC (0.86), followed by Aquatic products (0.77) and Oil crops (0.77); Cereals and Specialty crops both yielded an AUC of 0.67; and Flowers had the lowest AUC (0.37).
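To make the micro-average concrete, the sketch below pools every (park, category) cell into a single binary problem and computes the AUC as the probability that a positive cell is scored above a negative one. This is a dependency-free illustration on invented data, not the study's evaluation code. The pooling is the same idea that scikit-learn exposes as `average="micro"` in `roc_auc_score`.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def micro_average_auc(label_matrix, score_matrix):
    """Pool every (sample, category) cell into one binary problem, then score it."""
    flat_labels = [y for row in label_matrix for y in row]
    flat_scores = [s for row in score_matrix for s in row]
    return auc(flat_labels, flat_scores)


# Invented data: three parks (rows) and two categories (columns).
toy_labels = [[1, 0], [0, 1], [1, 0]]
toy_scores = [[0.9, 0.2], [0.4, 0.7], [0.3, 0.5]]

micro = micro_average_auc(toy_labels, toy_scores)
per_category = [
    auc([row[j] for row in toy_labels], [row[j] for row in toy_scores])
    for j in range(2)
]
print(f"micro-average AUC: {micro:.3f}")  # 7 of 9 positive-negative pairs ordered correctly -> 0.778
print(per_category)                       # [0.5, 1.0]
```

The toy output shows how a pooled score can sit between very different per-category values, which is the pattern reported above (0.75 overall, with categories ranging from 0.37 to 0.86).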
These evaluation metrics suggest that the tree-based LightGBM algorithm can parse high-dimensional heterogeneous features and maintain reasonable predictive performance for the majority of industry categories. This provides an algorithmic foundation for the subsequent generation of agricultural spatial planning recommendations. To further evaluate the model's alignment with actual layouts, Table 5 shows that although foundational industries achieve high layout hit rates (e.g., 73.21% for cereals and 59.52% for livestock), secondary and specialized categories exhibit a pronounced discrepancy between their prediction AUC and their actual layout hit rates. For instance, environmentally sensitive crops such as Tea and Oil Crops show strong classification capability (AUC > 0.75) despite lower hit rates (48.15% and 7.69%, respectively), whereas Flowers present an anomaly with an AUC of 0.372 and a 0.00% hit rate.
Table 5 Multi-label prediction performance and layout hit rate across crop-livestock categories
| Crop-Livestock Category | Actual Distribution | Prediction AUC | Layout Hit Rate |
| --- | --- | --- | --- |
| Cereals | 112 | 0.674 | 73.21% |
| Livestock | 84 | 0.577 | 59.52% |
| Fruits | 68 | 0.568 | 47.06% |
| Vegetables | 59 | 0.585 | 45.76% |
| Specialty Crops | 39 | 0.67 | 25.64% |
| Aquatic Products | 27 | 0.771 | 18.52% |
| Tea | 27 | 0.858 | 48.15% |
| Medicinal Herbs | 17 | 0.6 | 11.76% |
| Oil Crops | 13 | 0.77 | 7.69% |
| Edible Fungi | 13 | 0.529 | 7.69% |
| Flowers | 8 | 0.372 | 0.00% |
These numerical divergences suggest a mismatch between theoretical natural suitability and actual agricultural layouts. Computationally, the high AUC yet low hit rate for minority crops (e.g., oil crops) can be attributed to threshold compression caused by class imbalance: predicted probabilities for rare categories stay low in absolute terms, so few parks pass the fixed decision threshold even when positives are ranked above negatives. Practically, this divergence indicates that actual spatial deployment is associated with broader agricultural dynamics, such as the spatial regulation of cropping patterns (Dai et al., 2025).
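The percentages in Table 5 can be converted back into whole-park counts with a few lines of arithmetic. The sketch below assumes that "Actual Distribution" is the number of parks hosting each category and that the hit rate is hits divided by that number; neither definition is spelled out in this section. The names `TABLE_5`, `implied_hits`, and `pooled_hit_rate` are illustrative.

```python
# (category, actual parks, hit rate in %) copied from Table 5
TABLE_5 = [
    ("Cereals", 112, 73.21),
    ("Livestock", 84, 59.52),
    ("Fruits", 68, 47.06),
    ("Vegetables", 59, 45.76),
    ("Specialty Crops", 39, 25.64),
    ("Aquatic Products", 27, 18.52),
    ("Tea", 27, 48.15),
    ("Medicinal Herbs", 17, 11.76),
    ("Oil Crops", 13, 7.69),
    ("Edible Fungi", 13, 7.69),
    ("Flowers", 8, 0.00),
]


def implied_hits(actual, hit_rate_pct):
    """Whole number of correctly predicted parks implied by a percentage."""
    return round(actual * hit_rate_pct / 100)


def pooled_hit_rate(rows):
    """Hit rate over several categories, weighting each by its park count."""
    hits = sum(implied_hits(n, pct) for _, n, pct in rows)
    total = sum(n for _, n, _ in rows)
    return hits / total


for name, n, pct in TABLE_5:
    print(f"{name:<17}{implied_hits(n, pct):>3} of {n}")
print(f"four largest categories: {pooled_hit_rate(TABLE_5[:4]):.1%}")  # 191 of 323 -> 59.1%
print(f"all categories:          {pooled_hit_rate(TABLE_5):.1%}")      # 223 of 467 -> 47.8%
```

Under these assumptions, Oil Crops and Edible Fungi each correspond to a single correctly predicted park out of 13, so their hit rates move in large steps. The four largest categories pool to about 59%, which may be the basis of the roughly 60% figure quoted for major industries.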
Additionally, the anomaly observed in categories such as flowers (AUC < 0.5, hit rate = 0%, based on only 8 parks) reflects the limitations of purely environment-driven evaluations, as the distribution of high-value crops is often shaped by economic trade-offs and intensive management practices rather than solely by natural climatic conditions (Wu, Li, Deng, and Zhao, 2025). Consequently, the overall hit rate of approximately 60% for major industries implies that natural baselines continue to shape the majority of current layouts, while the missed proportions likely correspond to these complex socio-economic interventions. Crucially, this discrepancy is not merely a statistical artifact; it reflects a structural divergence between environmentally driven suitability and real-world agricultural allocation. This divergence between theoretical potential and current distribution quantifies the margin for spatial optimization. Before exploring this spatial discrepancy (Gap) in Section 4.3, it is first necessary to show how the predicted probabilities are distributed across the natural environmental baselines.
To delineate the physical applicability of the predictive model across different natural carrying capacity intervals, this study constructed multi-dimensional environmental feature radar charts (Fig. 7). The polygonal profiles of the natural environmental types are clearly differentiated and span the environmental gradient from extremely constrained (e.g., High-Altitude Cold and Arid Type, Cluster 4) to climatically favorable (e.g., High-Rainfall Warm and Humid Type, Cluster 3), while the Temperate and Moist Plain Type (Cluster 2) and the remaining types form transitional intervals.
This clear delineation of the physical space indicates that each clustering pattern is distinct in its climatic and topographic features, thereby providing a reliable physical baseline for the downstream analysis of industrial environmental adaptability. Building upon this, the heatmap of predicted crop-livestock suitability (Fig. 8) further illustrates the constraints that natural conditions impose on the different industrial types. Overall, the theoretical suitability of each industry is resource-oriented and aggregates by environmental adaptation. In the High-Altitude Cold and Arid Type (Cluster 4) and Lowland Cool and Dry Type (Cluster 1), where environmental constraints are relatively severe, highly suitable industries are concentrated in hardy livestock and cereal crops. From a macro-ecological perspective, high altitudes and cool climates restrict the growth cycles of most thermophilic cash crops; however, the vast natural resource base provides a reasonable carrying capacity for modern livestock husbandry (L. Wang, Xiao, Kong, Wu, and Ouyang, 2022). Conversely, in the Temperate and Moist Plain Type (Cluster 2) and Low-Altitude Hot and Humid Type (Cluster 0), which possess superior hydrothermal resources, suitable industries are more diversified, with cereals, fruits, and vegetables all exhibiting high suitability probabilities. These baseline patterns correspond to China's traditional major agricultural production areas, where the abundant combination of light, temperature, and water is sufficient to support the combined cultivation of multiple high-value-added crops (Zhang et al., 2025). Furthermore, the High-Rainfall Warm and Humid Type (Cluster 3) is particularly suitable for crops that favor warm and moist conditions, such as tea, which is consistent with the spatial concentration of the tea industry in the hilly regions of Southern China (Zhu et al., 2025).
The industrial aggregation characteristics shown in the predictive heatmap are not random and are consistent with macro-agroecological patterns. This provides a data-level view of how different natural environmental baselines relate to composite crop-livestock layouts and indicates that the model has encoded agricultural domain knowledge into quantitative probabilistic indicators. This theoretical suitability baseline, which aligns with realistic ecological logic, further corroborates the foundational constraining role of natural conditions in macro-agricultural spatial layouts (Nde et al., 2024). It also establishes the premise for the subsequent analysis of industrial expansion potential and spatial discrepancies (Gap) under specific environments.
4.3 Spatial Discrepancies and Decision Support
Motivated by the identified mismatch between environmental suitability and actual agricultural layouts, this study proposes a spatial discrepancy (Gap) metric to quantify this deviation explicitly. To transform continuous suitability probabilities into macro-level planning references, this study evaluated the industrial expansion potential under each natural environmental baseline by calculating the Gap between the model-predicted probability and the actual distribution frequency within the same clustering pattern (Fig. 9). The bubble chart shows that the high-potential industries under each pattern correspond directly to its natural environmental characteristics. In the High-Altitude Cold and Arid Type (Cluster 4) and Lowland Cool and Dry Type (Cluster 1), where climatic conditions are relatively constrained, the scope for expansion is concentrated in livestock husbandry and cereal crops.
Conversely, in the Temperate and Moist Plain Type (Cluster 2) and Low-Altitude Hot and Humid Type (Cluster 0), characterized by superior hydrothermal conditions, fruits and vegetables exhibit more prominent development potential. Furthermore, the High-Rainfall Warm and Humid Type (Cluster 3) shows considerable room for extending industrial layouts in specific categories such as tea. This spatial differentiation of industrial potential reveals the discrepancy between theoretical industrial suitability zones and actual crop-livestock distributions. In agricultural practice, such deviations are typically driven by non-natural interventions (X. Wang et al., 2025), such as agricultural price support policies (Xue Li et al., 2019) and the micro-level decision-making behaviors of farmers (Yang et al., 2023). Consequently, high Gap values denote advantageous industries that possess high natural suitability but currently have relatively low distribution proportions. With this index, agricultural spatial assessment is no longer confined to standalone probability predictions; it also reveals the margin for industrial spatial expansion. Building upon this, the analytical framework established in this study provides a standardized decision-making workflow for newly developed agricultural industrial parks: given the natural environmental parameters of a target region, the system maps them to the corresponding baseline environmental type, extracts the relevant Gap indices, and outputs a prioritized crop-livestock recommendation list. This provides a data-driven reference for the preliminary planning and industrial selection of new agricultural industrial parks across regions.
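The decision workflow described above (map a site to a baseline type, look up its Gap indices, return a ranked list) can be sketched as a single function. Everything here is hypothetical: the function `recommend`, the stand-in cluster assignment `toy_assign`, the altitude cut-off, and the Gap values in `toy_gaps` are placeholders for illustration and are not results from the study. Keeping only positive Gap values is likewise an assumption about how the list would be prioritized.

```python
def recommend(env_params, assign_cluster, gap_table, top_n=3):
    """Rank industries for a new park by the Gap of its environmental cluster.

    env_params     : dict of natural-environment measurements for the site
    assign_cluster : callable mapping env_params to a cluster id (stand-in for
                     the trained VAE-GMM assignment step)
    gap_table      : {cluster_id: {industry: gap_value}}
    """
    cluster_id = assign_cluster(env_params)
    gaps = gap_table[cluster_id]
    # Keep only industries whose modeled suitability exceeds current uptake.
    candidates = [(name, g) for name, g in gaps.items() if g > 0]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return cluster_id, candidates[:top_n]


# Placeholder assignment rule and Gap values, invented for this example only.
def toy_assign(env):
    return 4 if env["altitude_m"] > 800 else 2


toy_gaps = {
    2: {"Fruits": 0.25, "Vegetables": 0.18, "Cereals": -0.05},
    4: {"Livestock": 0.30, "Cereals": 0.12, "Tea": -0.02},
}

cluster, ranked = recommend({"altitude_m": 900}, toy_assign, toy_gaps, top_n=2)
print(cluster, ranked)  # 4 [('Livestock', 0.3), ('Cereals', 0.12)]
```

Passing the cluster assignment in as a callable keeps the ranking logic independent of the clustering model, so the same function would work with any assignment step that returns one of the five baseline types.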
5 Conclusions
This study constructed a data-driven modeling and decision-support framework to analyze the spatial adaptation between crop-livestock layouts and natural environments in China's NMAIPs. By integrating a deep clustering model (VAE-GMM) with a multi-label classifier (LightGBM), the framework addresses the high-dimensional and nonlinear characteristics of agricultural environmental data. The modeling results partition the complex national agricultural environment into five typical macro-baselines. Based on these baselines, the predictive model captures the theoretical suitability of various crop-livestock layouts (micro-average AUC = 0.75). Furthermore, by calculating the spatial discrepancy (Gap) between the model-predicted suitability and the actual distribution frequencies, this study quantifies the divergence between natural environmental capacity and current agricultural allocation. The Gap metric turns standalone probability predictions into a quantifiable index of industrial expansion potential and identifies environmentally suitable but underrepresented industries. Overall, this research distills historical spatial layouts into a transferable modeling workflow, providing a standardized reference for the planning of newly established agricultural parks. Future research will aim to incorporate socio-economic variables and high-resolution remote sensing data to further enhance the spatial modeling of dynamic agricultural systems.
Declarations
Acknowledgements
The authors would like to thank the supporting institution for providing the computational resources and research environment required for this study. This research did not receive any specific grant from funding agencies.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethical Statement
This study does not involve any research on human or animal subjects performed by any of the authors. Therefore, ethical approval was not required.
References
Agrawal, N., Govil, H., and Kumar, T. (2025). Agricultural land suitability classification and crop suggestion using machine learning and spatial multicriteria decision analysis in semi-arid ecosystem. Environment, Development and Sustainability, 27(6), 13689-13726.
Akpoti, K., Kabo-bah, A. T., and Zwart, S. J. (2019). Agricultural land suitability analysis: State-of-the-art and outlooks for integration of climate change analysis. Agricultural Systems, 173, 172-208.
Benos, L., Tagarakis, A. C., Dolias, G., Berruto, R., Kateris, D., and Bochtis, D. (2021). Machine learning in agriculture: A comprehensive updated review. Sensors, 21(11), 3758.
Dai, Z.-Z., Duan, J.-J., Liang, H.-Y., Zhu, Z.-Y., Feng, Y.-Z., and Wang, X. (2025). Spatial optimization of cropping patterns of staple crops to enhance supply–demand balance in China. Journal of Rural Studies, 120, 103869.
Dinh, T. D., Théau, J., Pham, T. T. H., Varin, M., Marchal, J., and Genest, M.-A. (2025). Deep learning applied to urban agriculture: spatial-temporal changes of agricultural land in a rapidly urbanizing Southeast Asian city. European Journal of Remote Sensing, 58(1), 2572109.
Frimpong, S. A., Han, M., Zheng, W., Li, X., Akpaku, E., and Obeng, A. P. (2025). Machine and deep learning in agricultural engineering: A comprehensive survey and meta-analysis of techniques, applications, and challenges. Computers, 14(10), 438.
Gong, L., Liao, Y., Han, Z., Jiang, L., Liu, D., and Li, X. (2024). The Effects of Global Warming on Agroclimatic Regions in China: Past and Future. Agronomy, 14(2), 293.
Guan, Q., Tang, J., Davis, K. F., Kong, M., Feng, L., Shi, K., and Schurgers, G. (2025). Improving future agricultural sustainability by optimizing crop distributions in China. PNAS Nexus, 4(1), pgae562.
Guo, J., Fan, W., Amayri, M., and Bouguila, N. (2025). Deep clustering analysis via variational autoencoder with Gamma mixture latent embeddings. Neural Networks, 183, 106979.
He, T., Li, M., and Jin, D. (2025). Deep learning-based time series prediction for precision field crop protection. Frontiers in Plant Science, 16, 1575796.
HengYu, Z., Xi, G., XiaoMao, L., Lin, C., and JiaQi, B. (2024). Evaluation of Planting Suitability of Geographical Indication Agricultural Products Based on Ecological Niche Model: The Case of Purple-Skinned Garlic in Shanggao County (in Chinese). Journal of Integrative Agriculture, 57(18), 3586-3600.
Ikendi, S., Lyons, A., and Pathak, T. B. (2025). Advancing Decision Support for Climate Adaptation in Agriculture and Natural Resources. Frontiers in Environmental Science, 13, 1605176.
Jabed, M. A., and Murad, M. A. A. (2024). Crop yield prediction in agriculture: A comprehensive review of machine learning and deep learning approaches, with insights for future research and sustainability. Heliyon, 10(24).
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
Li, S., Wu, Y., Yu, Q., and Chen, X. (2023). National Agricultural Science and Technology Parks in China: Distribution Characteristics, Innovation Efficiency, and Influencing Factors. Agriculture, 13(7). doi:10.3390/agriculture13071459
Li, X., Huang, W., and Liu, J. (2025). The Impact of China’s National Modern Agricultural Industrial Parks on Fertilizer Use from the Perspective of Food Security. Sustainability, 17(24), 11227.
Li, X., Yuan, Q., and Han, Y. (2019). Analysis on the influence mechanism of price support policies on grain planting area in China: Based on wheat provincial-level panel data (in Chinese). Chinese Journal of Agricultural Resources and Regional Planning, 40(01), 89-96.
Ling, L., Chen, X., Wu, Y., Li, S., Wei, J., and Zhou, Q. (2023). National Modern Agricultural Industrial Parks: Development Characteristics, Regional Differences, and Experience Inspiration: Case Study of 200 NMAIPs in China. Agronomy, 13(3), 653.
Moharana, P. C., Yadav, B., Malav, L. C., Biswas, H., and Patil, N. G. (2025). Machine Learning-Based Crop Suitability Prediction: An Emerging Technique for Sustainable Agricultural Production in the Desert Region of India. Communications in Soil Science and Plant Analysis, 56(3), 376-395.
Mugiyo, H., Chimonyo, V. G., Sibanda, M., Kunz, R., Masemola, C. R., Modi, A. T., and Mabhaudhi, T. (2021). Evaluation of land suitability methods with reference to neglected and underutilised crop species: A scoping review. Land, 10(2), 125.
Nalisnick, E., Hertel, L., and Smyth, P. (2016). Approximate inference for deep latent Gaussian mixtures. Paper presented at the NIPS Workshop on Bayesian Deep Learning.
Nde, R. K., Fendji, J. L. E. K., Yenke, B. O., and Schöning, J. (2024). Crop selection. Smart Agricultural Technology, 9. doi:10.1016/j.atech.2024.100602
Nirmaladevi, S., and Jagatheswari, S. (2025). A Data-Driven Machine Learning Framework for Predicting Total Agricultural Food Grain Yield. Results in Engineering, 106790.
Pilevar, A. R., Matinfar, H. R., Sohrabi, A., and Sarmadian, F. (2020). Integrated fuzzy, AHP and GIS techniques for land suitability assessment in semi-arid regions for wheat and maize farming. Ecological Indicators, 110, 105887.
Rani, S., Mishra, A. K., Kataria, A., Mallik, S., and Qin, H. (2023). Machine learning-based optimal crop selection system in smart agriculture. Scientific Reports, 13(1), 15997.
Sun, J., Arellano, M. V., Wang, Y., and Mouazen, A. M. (2025). Optimizing management zone delineation technique for high-dimensional and large-volume datasets in precision agriculture. Precision Agriculture, 26(6), 93.
Sun, X., Mei, Y., and Yang, X. (2024). Does the construction of modern agricultural parks promote rural industrial integration? Empirical evidence from 8325 agricultural parks across China (in Chinese). China Rural Survey, 3, 39-61.
Taghizadeh-Mehrjardi, R., Nabiollahi, K., Rasoli, L., Kerry, R., and Scholten, T. (2020). Land suitability assessment and agricultural production sustainability using machine learning models. Agronomy, 10(4), 573.
Wang, J. (2022). Drivers of the sustainable development of agro-industrial parks: Evidence from Jiangsu Province, China. SAGE Open, 12(4), 21582440221144415.
Wang, L., He, Y., You, F., Han, S., Wang, X., Chen, H., ... Feng, A. (2024). Analysis on evaluation scale and method of crop planting suitability (in Chinese). Chinese Journal of Agricultural Resources and Regional Planning, 45(09), 214-221.
Wang, L., Xiao, Y., Kong, L., Wu, B., and Ouyang, Z. (2022). Spatiotemporal patterns and early-warning of grassland carrying capacity in the Qinghai-Tibet Plateau. Acta Ecol. Sin, 42(16), 6684-6694.
Wang, W., Lv, J., Yang, X., Zhou, Z., and Chang, Z. (2020). Research on construction mode and key technology of modern agricultural industrial park (in Chinese). Journal of Chinese Agricultural Mechanization, 41, 210-216.
Wang, X., Zhao, H., Zhao, G., Qu, X., Cao, C., Qian, J., ... Han, H. (2025). High-Resolution Crop Mapping and Suitability Assessment in China’s Three Northeastern Provinces (2000–2023): Implications for Optimizing Crop Layout. Agronomy, 15(11), 2587.
Wang, Y., Yuan, Y., Yuan, F., Liu, X., Tian, Y., Zhu, Y., ... Cao, Q. (2025). Optimizing management zone delineation through advanced dimensionality reduction models and clustering algorithms. Precision Agriculture, 26(4), 68.
Wu, H., Li, Z., Deng, X., and Zhao, Z. (2025). Enhancing agricultural sustainability: Optimizing crop planting structures and spatial layouts within the water-land-energy-economy-environment-food nexus. Geography and Sustainability, 6(3), 100258.
Yadav, S., Jadhav, T. D., Kakade, O. S., Pangare, P. V., and Bhutali, P. S. (2025). A Machine Learning-Based Dynamic Model for Crop Suitability Using Rainfall and Soil Parameters. IJSAT-International Journal on Science and Technology, 16(2).
Yang, J., Liu, H., Wang, X., Long, Y., Shi, H., Yang, P., ... Sun, J. (2023). Influence of social and economic factors on farmers' planting decision behavior: Pixel-scale simulation with the agent-based model (in Chinese). Chinese Journal of Agricultural Resources and Regional Planning, 44(03), 186-196.
Zhang, Z., Hong, Q., Sun, Y., Hao, J., and Ai, D. (2025). Assessing the Alignment Between Naturally Adaptive Grain Crop Planting Patterns and Staple Food Security in China. Foods, 14(22), 3870.
Zhao, H., Zhu, M., Ma, Z., Li, L., and Tang, H. (2024). Study on agglomeration effect of modern agricultural industrial parks: An empirical analysis based on eight parks in Beijing (in Chinese). Chinese Journal of Agricultural Resources and Regional Planning, 45(04), 178-189.
Zhou, Q., Chen, X., Han, X., Ling, L., and Li, S. (2025). North China Plain National Modern Agricultural Industrial Park: Regional characteristics, efficiency evaluation, and suggestions for countermeasures (in Chinese). Chinese Journal of Agricultural Resources and Regional Planning, 46(01), 190-201.
Zhu, Q., Shi, Y., Yu, Y., Wang, X., Tang, Y., Ren, L., and Lou, Y. (2025). Impact of Future Climate Change on the Climatic Suitability of Tea Planting on Hainan Island, China. Agronomy, 15(9), 2196.
Additional Declarations
No competing interests reported.
Status: Under Review
Reviews received at journal: 02 May, 2026
Reviews received at journal: 08 Apr, 2026
Reviewers agreed at journal: 27 Mar, 2026
Reviewers agreed at journal: 26 Mar, 2026
Reviewers invited by journal: 26 Mar, 2026
Editor assigned by journal: 23 Mar, 2026
Submission checks completed at journal: 23 Mar, 2026
First submitted to journal: 23 Mar, 2026
Author affiliations: JingHua Wu (corresponding author) and ZhuoCheng Xie, Sichuan Agricultural University.
Figure 1 Overall workflow of the proposed data-driven decision support framework for crop-livestock spatial layouts
Figure 2 Evaluation of the optimal number of environmental clusters based on the Bayesian Information Criterion (BIC)
Figure 3 Two-dimensional t-SNE visualization of the latent environmental features extracted by the VAE-GMM deep clustering model
Figure 4 Statistical distribution of core natural environmental factors across the five clustering patterns
Figure 5 Spatial distribution mapping of the five natural environmental patterns of National Modern Agricultural Industrial Parks across China
Figure 6 Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the LightGBM multi-label crop-livestock suitability prediction
Figure 7 Multi-dimensional environmental feature radar charts illustrating the physical applicability across the five baseline clustering patterns
Figure 8 Heatmap of the model-predicted theoretical crop-livestock suitability probabilities across different natural environmental baselines
Figure 9 Bubble chart quantifying the spatial discrepancy (Gap) and industrial expansion potential across various environmental patterns
The core value of these parks lies in achieving an efficient allocation of production factors through scientifically designed crop-livestock spatial layouts. This is closely linked to regional food security and also contributes to improving agroecological sustainability(Xiaoling Li, Huang, and Liu, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Natural conditions constitute a key component of agricultural production factors and provide the fundamental basis for crop-livestock spatial layouts. Under heterogeneous environmental conditions, data-driven decision support systems (DSS) are increasingly recognized as effective tools for supporting precise planning and scientific decision-making in agricultural industrial parks(Ikendi, Lyons, and Pathak, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eEnvironmental baseline data in large-scale agricultural decision systems typically present numerous high-dimensional, sparse, and long-tailed discrete features. As a result, traditional linear dimensionality reduction and conventional clustering methods struggle to effectively capture the complex nonlinear interactions within such data(Y. Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). In recent years, gradient boosting tree algorithms, represented by LightGBM, have demonstrated high computational efficiency in modeling high-dimensional agricultural features(Ke et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Moreover, these methods exhibit strong robustness and stability in predicting crop\u0026ndash;livestock layouts under extreme environmental conditions(Moharana, Yadav, Malav, Biswas, and Patil, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
In parallel, deep learning techniques have shown significant advantages in extracting spatial characteristics of land use(Dinh et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) and have been widely applied to model nonlinear relationships in complex ecological systems(Jabed and Murad, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Nevertheless, how to effectively integrate the strengths of deep generative models in handling high-dimensional sparse features with efficient predictive algorithms to construct a comprehensive agricultural spatial decision support framework remains an open research question.\u003c/p\u003e \u003cp\u003eTo answer this question, this study proposes a deep learning\u0026ndash;based framework for agricultural spatial knowledge discovery and decision support. Using a large-scale dataset of 335 National Modern Agricultural Industrial Parks established between 2017 and 2024, the proposed framework integrates VAE-GMM and LightGBM to analyze crop-livestock spatial layouts under complex environmental conditions. 
The main contributions of this study are summarized as follows:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEnvironmental Baseline Modeling\u003c/b\u003e: A deep clustering model (VAE-GMM) is applied to process high-dimensional agricultural environmental features, objectively classifying the NMAIPs into standardized macro-environmental baselines.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eData-Driven Suitability Prediction\u003c/b\u003e: A multi-label predictive model (LightGBM) is constructed to evaluate the nonlinear mapping relationships between natural conditions and composite crop-livestock spatial layouts.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eSpatial Discrepancy Quantification\u003c/b\u003e: A spatial discrepancy metric is introduced to quantify the difference between model-predicted environmental suitability and actual industrial distribution, providing objective decision support for new park planning.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe remainder of this paper is organized as follows. Section \u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003e2\u003c/span\u003e reviews related work in agricultural spatial mapping and decision support. Section \u003cspan refid=\"Sec6\" class=\"InternalRef\"\u003e3\u003c/span\u003e presents the dataset and the proposed VAE-GMM and LightGBM framework. Section \u003cspan refid=\"Sec12\" class=\"InternalRef\"\u003e4\u003c/span\u003e reports the empirical results and discusses the system applications in knowledge extraction and transferable decision-making. 
Finally, Section \u003cspan refid=\"Sec16\" class=\"InternalRef\"\u003e5\u003c/span\u003e concludes the study.\u003c/p\u003e"},{"header":"2 Related Work","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Macro-Level Evaluation of NMAIPs\u003c/h2\u003e \u003cp\u003eCurrent research on NMAIPs primarily focuses on macroeconomic evaluation and the analysis of regional development characteristics. Researchers have predominantly used the entropy-weight TOPSIS and obstacle degree models to evaluate the agglomeration effects of these parks(Zhao, Zhu, Ma, Li, and Tang, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), employed standard deviational ellipses and DEA-SBM models to analyze their spatial distribution characteristics and operational efficiency(Zhou, Chen, Han, Ling, and Li, \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), and empirically examined the role of park construction in promoting rural industrial integration(X. Sun, Mei, and Yang, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Furthermore, some scholars have summarized the planning and construction models of parks across different regions(W. Wang, Lv, Yang, Zhou, and Chang, 2020) or explored the environmental effects of park construction concerning resource utilization efficiency(S. Li, Wu, Yu, and Chen, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eHowever, the existing literature largely treats the parks as aggregate statistical evaluation units, making it difficult to characterize the micro-level adaptation logic between the internal natural environment and specific crop-livestock spatial layouts. This limitation restricts the precise translation and application of successful layout experiences in newly established projects. 
To address this, this study shifts the focus to the micro-level natural environmental baseline. By constructing a data-driven spatial analysis framework, it systematically mines the industrial adaptation patterns of national-level parks, aiming to provide precise planning references for new projects.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Clustering of Complex Agricultural Environmental Data\u003c/h2\u003e \u003cp\u003eAnalyzing the mapping relationship between crop-livestock spatial layouts and geographical environments is fundamental to revealing the production layout logic of NMAIPs. Current related research primarily focuses on identifying environmental impact factors and analyzing the spatial suitability of crop-livestock spatial layouts(Long Wang et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). In agricultural practice, natural factors such as climate, soil, and topography exert a fundamental constraining role in layout decisions(Nde, Fendji, Yenke, and Sch\u0026ouml;ning, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). For specific agricultural products, existing studies have utilized ecological niche models such as MaxEnt to predict their suitable planting areas(HengYu, Xi, XiaoMao, Lin, and JiaQi, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
Regarding data mining and analytical methods, early approaches mostly relied on Spatial Multi-Criteria Decision Analysis (MCDA) or expert experience via the Analytic Hierarchy Process (AHP) for agricultural evaluation(Agrawal, Govil, and Kumar, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Akpoti, Kabo-bah, and Zwart, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). These methods rely on expert experience to assign weights to indicators and primarily process a limited number of continuous variables, and they perform well in small- to medium-scale agricultural zoning.\u003c/p\u003e \u003cp\u003eNevertheless, when applied to the complex, high-dimensional agricultural data of NMAIPs, these methods\u0026mdash;constrained by their dependence on limited continuous variables or static weights\u0026mdash;struggle to capture the nonlinear interactions among environmental factors(Mugiyo et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Furthermore, traditional methods are highly susceptible to subjective weighting biases. These restrictions hinder the effective application of complex, high-dimensional agricultural environmental data in macro-level industrial layouts, thereby highlighting the need for a data-driven architecture capable of automatically extracting deep nonlinear manifold representations and overcoming the limitations of linear dimensionality reduction and hard clustering(Guo, Fan, Amayri, and Bouguila, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
In response to this need, this study introduces a deep clustering framework integrating Variational Autoencoders and Gaussian Mixture Models (VAE-GMM), aiming to achieve effective dimensionality reduction and objective pattern classification of complex agricultural environmental features.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Deep Learning and Hybrid Modeling in Agricultural Spatial Planning\u003c/h2\u003e \u003cp\u003eWith the popularization of data mining and artificial intelligence technologies, data-driven machine learning models are increasingly applied in intelligent land suitability assessments(Taghizadeh-Mehrjardi, Nabiollahi, Rasoli, Kerry, and Scholten, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Compared to traditional statistical models, machine learning demonstrates a significant accuracy advantage when processing multi-source heterogeneous agricultural data(Rani, Mishra, Kataria, Mallik, and Qin, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), and its immense potential in agricultural management and intelligent decision-making has been confirmed by numerous studies(Benos et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). However, when dealing with complex agricultural environmental data characterized by high dimensionality and strong spatial heterogeneity, traditional machine learning is often constrained by cumbersome manual feature engineering and faces severe performance bottlenecks(Frimpong et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Consequently, deep learning architectures have been widely introduced into modern agricultural decision systems(He, Li, and Jin, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
Nevertheless, when processing the high-dimensional, sparse composite crop\u0026ndash;livestock data of industrial parks, there is a need for an algorithm that combines high computational efficiency with robustness in multi-label classification(Nirmaladevi and Jagatheswari, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMore importantly, it is often difficult to directly translate the \"theoretical suitability probabilities\" of crops produced by existing studies into macro-planning directives. In agricultural practice, crop\u0026ndash;livestock selection is not only constrained by natural resources but also strongly regulated by price support policies(Yang et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) and farmers' decision-making behaviors(Xue Li, Yuan, and Han, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). The influence of anthropogenic (or socio-economic) factors leads to an objective discrepancy between the actual spatial distribution of crops and their theoretical zones of natural suitability(Guan et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Therefore, constructing a hybrid model capable of both simplifying complex high-dimensional features and providing clear, quantified decision rules by evaluating the difference between theoretical predictions and actual distributions has become a key direction for deepening research in this field.\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Methodology","content":"\u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates the overall workflow of the proposed decision support framework for crop-livestock layouts in National Modern Agricultural Industrial Parks. 
This workflow begins with the preparation and preprocessing of multi-source datasets, where heterogeneous data encompassing natural conditions and crop-livestock information are standardized and reconstructed into tensors. This step eliminates the effects of different physical units and ensures computational compatibility with downstream deep learning models. Once the data are prepared, a hybrid deep learning architecture integrating a Variational Autoencoder (VAE) and a Gaussian Mixture Model (GMM) is introduced. This phase aims to extract the latent manifold representations of high-dimensional sparse environmental data and, through probabilistic clustering, classify the natural environmental baselines of national industrial parks into typical pattern categories. Building on this unsupervised environmental pattern recognition, the framework further integrates a LightGBM classifier to perform supervised multi-label suitability learning, thereby deriving the theoretical suitability probabilities of various agricultural industries under specific environments. Finally, to transform the prediction results into reusable macro-decision rules, this study calculates the spatial development discrepancy (Gap) by integrating the predicted probabilities with actual distribution frequencies. Under unified experimental settings, the overall performance of the model is validated through multiple evaluation metrics, thereby quantifying the industrial development potential across different environmental baselines. This section details the specific implementation of each phase in the aforementioned workflow.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Data Acquisition and Description\u003c/h2\u003e \u003cp\u003eThis study focuses on China's National Modern Agricultural Industrial Parks (NMAIPs) announced between 2017 and 2024, selecting 335 valid park records as the research sample. 
Data were primarily acquired from specialized agricultural databases, public government platforms, and relevant information websites. Information regarding crop-livestock layouts was obtained by integrating multi-source records from the China Institute of High-Tech, Sohu, and official Chinese government portals. In terms of industrial classification, referring to the categorization standards in the China Agricultural Statistical Yearbook and considering the actual crop-livestock planning of each park, the agricultural industries were aggregated into 11 categories: cereals, vegetables, fruits, livestock, aquatic products, tea, medicinal herbs, oil crops, edible fungi, flowers, and specialty crops (referring to regional signature products not included in the aforementioned categories). Climate data, including mean annual temperature, average air humidity, and annual precipitation, were mainly sourced from the China Meteorological Data Service Centre. Topographic data were retrieved via Google Maps, encompassing spatial information such as elevation, longitude, latitude, and terrain details. Soil type data were obtained from platforms such as the Chinese Soil Database and classified according to the Chinese Soil Taxonomy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Data Preprocessing\u003c/h2\u003e \u003cp\u003eTo satisfy the rigorous matrix computation requirements of the VAE-GMM clustering and LightGBM prediction models, the raw features of the 335 sampled parks were subjected to outlier removal and numerical reconstruction. 
The reconstructed baseline data were partitioned into a natural environmental feature matrix (input \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\:\\text{X}\\in\\:{R}^{335\\times\\:50}\$\u003c/span\u003e\u003c/span\u003e) and a crop-livestock industrial label matrix (output \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{Y}\\in\\:{R}^{335\\times\\:11}\$\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eContinuous natural environmental features\u0026mdash;including temperature, precipitation, elevation, and humidity\u0026mdash;were standardized using StandardScaler to eliminate inconsistencies in units and scales. As presented in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, these processed features were transformed into standardized tensors with a mean of 0 and a standard deviation of 1. Furthermore, the skewness values indicate that some raw environmental features deviate markedly from a normal distribution (e.g., the skewness of elevation reached 2.50). 
This pronounced non-normality strongly corroborates the necessity of employing a deep generative model (VAE) in subsequent phases to extract latent representations.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNumerical distribution and standardized tensor mapping of continuous environmental features\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c5\" namest=\"c1\"\u003e \u003cp\u003ePhysical Metrics (Raw Data)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003eTensor Metrics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMin\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMax\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eStandardized Range\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSkewness\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAvg Annual Temp (\u0026deg;C)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e13.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u0026minus;\u0026thinsp;2.46,2.38]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-0.24\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAnnual Precipitation (mm)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e827.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e512.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u0026minus;\u0026thinsp;1.61,2.88]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eElevation (m)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e607.42\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e791.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u0026minus;\u0026thinsp;0.77,4.93]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e2.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAvg Air Humidity (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e67.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e10.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e[\u0026minus;\u0026thinsp;2.56,1.63]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-0.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFor discrete categorical features\u0026mdash;such as climate types, topography, soil, and crop-livestock industries\u0026mdash;this study employed One-Hot encoding and Multi-Label Binarization (MLB) to map them into high-dimensional sparse matrices. Given that features like topography and soil possess multi-label attributes (i.e., the sum of activation probabilities across sub-features does not equal 100%), and the resulting encoded matrices are high-dimensional (e.g., soil types expanding to 35 dimensions) with long-tailed distributions, the data presentation in this section was truncated to maintain visual focus. 
As presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, only the core feature columns with the highest activation probabilities (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\$\u003c/span\u003e\u003c/span\u003e) within each macro-category are listed. For the sake of conciseness, the remaining low-frequency features are omitted from the table, though they were fully incorporated into the model for actual computations. Additionally, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e reports the tensor feature variance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\\left(1-p\\right)\$\u003c/span\u003e\u003c/span\u003e) for each column. This metric is the variance of a binary (Bernoulli) indicator with activation probability \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\$\u003c/span\u003e\u003c/span\u003e and peaks at 0.25 when the column is active in half of the parks; feature columns with higher variances provide more significant node-splitting references for the supervised prediction model.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSparsity and feature variance evaluation of categorical feature matrices\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" 
colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMatrix\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMacro-Feature Category\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExpanded Dimensions\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCore Tensor Column\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eActivation Count\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eActivation Probability (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\$\u003c/span\u003e\u003c/span\u003e)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eTensor Feature Variance (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\\left(1-p\\right)\$\u003c/span\u003e\u003c/span\u003e)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"9\" rowspan=\"10\"\u003e \u003cp\u003eFeature Matrix X\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eClimate Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e6 Dims\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSubtropical monsoon climate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e159\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e47.46%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2494\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTemperate monsoon climate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e28.06%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2019\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTemperate continental climate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e18.81%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1527\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eTopography\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e5 Dims\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePlain\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e179\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e53.43%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2488\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eHilly\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e155\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e46.27%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2486\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMountainous\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e29.55%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2082\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eSoil Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e35 Dims\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePaddy soil\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e140\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e41.79%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2433\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYellow soil\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e28.96%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2057\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCinnamon soil\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e92\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e27.46%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1992\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRed soil\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e25.67%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1908\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eLabel Matrix Y\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eCrop-Livestock Industry\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e11 Dims\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCereals\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e33.43%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.2226\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLivestock\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e25.07%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1879\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eFruits\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e20.30%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1618\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eVegetables\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e17.61%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.1451\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3 VAE-GMM Clustering for High-Dimensional Environmental Data\u003c/h2\u003e \u003cp\u003eThe reconstructed natural environmental matrix (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{X}\\in\\:{R}^{335\\times\\:50}\$\u003c/span\u003e\u003c/span\u003e) derived from feature engineering integrates continuous meteorological values with sparse, binarized soil and topographic tensors. Traditional linear dimensionality reduction techniques (e.g., PCA) and hard clustering algorithms struggle to effectively process such complex data structures(Jabed and Murad, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Consequently, this study constructs a hybrid deep learning framework integrating a Variational Autoencoder (VAE) and a Gaussian Mixture Model (GMM) to cascade nonlinear dimensionality reduction with probabilistic clustering. 
This cascaded computation constitutes the core methodology for parsing complex agricultural environmental baselines. To ensure experimental reproducibility, all algorithms were implemented in a Python environment utilizing the PyTorch and Scikit-learn frameworks, with a uniformly fixed global random seed (Random Seed\u0026thinsp;=\u0026thinsp;42).\u003c/p\u003e \u003cp\u003ePrior to deep feature extraction, the 50-dimensional environmental feature matrix \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\text{X}\$\u003c/span\u003e\u003c/span\u003e was normalized using a MinMaxScaler to align with the numerical boundaries of the activation functions in the deep neural network. Subsequently, the normalized tensors were fed into the VAE model. To enhance nonlinear representation capabilities, both the Encoder and Decoder were configured as two-layer fully connected networks utilizing ReLU activation functions. The specific hyperparameter configurations for the experiment are detailed in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCore hyperparameter configurations of the VAE model\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModule\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKey 
Parameters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSetting\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eArchitecture\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHidden Layers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTwo-layer fully connected (64\u0026ndash;64)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eActivation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eActivation function\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eReLU\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLatent Space\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{Z}_{dim}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eOptimizer\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlgorithm\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAdam\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLearning rate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u0026times;10\u003csup\u003e\u0026minus;3\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eTraining\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBatch size / Epochs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e32 / 150\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLoss\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{L}_{VAE}\$\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMean Squared Error (MSE)\u0026thinsp;+\u0026thinsp;KL divergence\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe encoder compresses the high-dimensional sparse inputs into a continuous latent space and, via the reparameterization trick, outputs a 10-dimensional latent vector \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e. The model is trained for 150 epochs, jointly minimizing a loss function composed of the reconstruction error (Mean Squared Error, MSE) and the Kullback-Leibler (KL) divergence, by which point the network has converged stably. After training, this study extracts the 10-dimensional latent features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e produced by the encoder. 
This step filters out much of the sparse noise inherent in the original matrix and distills low-dimensional environmental manifold features, which serve as the standardized input for downstream clustering.\u003c/p\u003e \u003cp\u003eOnce the dimensionally reduced features were obtained, probabilistic clustering of the natural environment was performed with a GMM, with the number of components selected by BIC. Because the spatial distribution of agricultural natural resources is inherently continuous and transitional, soft clustering with a GMM delineates the fuzzy boundaries of ecological management zones more accurately than traditional hard clustering (J. Sun, Arellano, Wang, and Mouazen, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). The 10-dimensional latent features \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:Z\$\u003c/span\u003e\u003c/span\u003e extracted by the VAE are fed into the GMM for probabilistic fitting. To determine the optimal number of environmental classes objectively, without relying on subjective experience, the Bayesian Information Criterion (BIC) is used as the model selection criterion; its penalty term balances the model's fitting accuracy against its complexity (Nalisnick, Hertel, and Smyth, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). During cluster optimization, a GMM is fitted with the Expectation-Maximization (EM) algorithm for each candidate number of clusters \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:K\\in\\:\\left[2,10\\right]\$\u003c/span\u003e\u003c/span\u003e. 
Ultimately, the number of clusters \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:K\$\u003c/span\u003e\u003c/span\u003e at which the BIC score reaches its minimum over the tested range is selected, yielding a data-driven, objective classification of the natural environmental patterns of industrial parks nationwide.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.4 LightGBM-Based Multi-Label Suitability Prediction\u003c/h2\u003e \u003cp\u003eThis study employs LightGBM to predict crop-livestock suitability. LightGBM is a highly efficient machine learning framework based on gradient boosting decision trees, capable of rapidly processing large-scale data while maintaining robust model performance (Yadav, Jadhav, Kakade, Pangare, and Bhutali, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Because agricultural parks frequently combine several crop and livestock industries, the prediction task in this study is formulated as a multi-label classification problem.\u003c/p\u003e \u003cp\u003eIn the experimental design, the model takes as inputs the clustering results derived from the VAE-GMM, which serve as environmental baseline patterns, together with the reconstructed standardized features. Given the significant distributional disparities among different agricultural industries within the samples (e.g., an abundance of cereal samples versus a scarcity of flowers and specialty crops), the class weight option of the model was set to \"Balanced\". 
This approach assigns higher misclassification penalties to minority classes, thereby enhancing the model's recognition and predictive capabilities for small-sample specialty crops.\u003c/p\u003e \u003cp\u003eTo facilitate multi-label prediction, this study adopted the One-Vs-Rest (OvR) strategy to encapsulate the LightGBM classifier, effectively transforming the 11 industrial target categories into 11 independent binary classification sub-tasks for joint resolution. To ensure the consistency and reproducibility of the experimental results, the global random seed was uniformly fixed at 42. The specific core hyperparameter configurations are detailed in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCore hyperparameter configurations of the LightGBM prediction model\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModule\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKey Parameters\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSetting\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEstimator\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eLightGBM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLGBM Classifier\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStrategy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOne-Vs-Rest(OvR)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEnabled\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTree Setup\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of weak learners\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e200\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOptimization\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLearning Rate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDistribution\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClass Weight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBalanced\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eIn the model evaluation and validation phase, to prevent overfitting and ensure the robustness of the predictive assessment, this study employs a 5-fold cross-validation strategy for global model training. The experiment randomly shuffles the 335 samples and divides them equally into five subsets, iteratively conducting training and probability prediction. 
For any given park sample, the model ultimately outputs a continuous theoretical suitability probability matrix across the 11 industrial categories (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:P\\in\\:{R}^{335\\times\\:11}\$\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo translate the continuous predicted probabilities into actionable agricultural planning decisions and classification labels, this study establishes a decision threshold of 0.3 based on a priori testing. Specifically, when the predicted probability of a given park environment for a specific industry reaches \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:p\\:\\ge\\:0.3\$\u003c/span\u003e\u003c/span\u003e, the experiment determines that this natural baseline exhibits crop-livestock suitability for that industry. The experimental rationale behind this threshold setting is to maximize the exploration of the environment's potential carrying capacity while maintaining precision. This provides a quantitative foundation for the subsequent calculation of the spatial development discrepancy (Gap) between the predicted suitability probabilities and the actual distribution frequencies.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Model Validation and Potential Evaluation\u003c/h2\u003e \u003cp\u003eTo verify the reliability of the analytical framework integrating VAE-GMM and LightGBM in parsing agricultural layout patterns, and to demonstrate its decision support capability for newly established parks, this study formulated analytical strategies for model validation and Potential Evaluation. 
The existing layouts of national-level parks reflect highly efficient production patterns under specific natural conditions and thus serve as reference standards under the corresponding environmental baselines (Pilevar, Matinfar, Sohrabi, and Sarmadian, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The underlying logic is that if the model can accurately reconstruct the typical crop-livestock layouts of existing NMAIPs from natural environmental features alone, then it has effectively captured the matching patterns between the environment and agricultural industries. Consequently, it can serve as an empirical reference model that provides a transferable, objective basis for planning newly established regional industrial parks.\u003c/p\u003e \u003cp\u003eFor algorithm performance evaluation, because the crop-livestock prediction in this study is a multi-label classification task with sample imbalances across categories, the experiment employs the micro-average Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC), alongside the weighted F1-score, as the core evaluation metrics. By globally aggregating the true positive and false positive samples of all categories, the micro-average strategy reflects the overall predictive efficacy of the model on the multi-label task.\u003c/p\u003e \u003cp\u003eTo validate the ecological rationality of the model, this study takes the environmental categories delineated by VAE-GMM as the baseline and uses the predictive match rate of the LightGBM model within each category for cross-validation. 
The existing crop-livestock structures of the parks serve as the standard for verifying model validity, and the results are presented as radar charts. The predictive match rate quantifies the degree of overlap between the model-predicted industrial set and the actual crop-livestock layout:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:\\text{Match rate}=\\frac{\\left|{S}_{pred}\\cap\\:{S}_{true}\\right|}{\\left|{S}_{true}\\right|}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn Eq.\u0026nbsp;\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{S}_{pred}\$\u003c/span\u003e\u003c/span\u003e represents the set of suitable crop-livestock categories predicted by the model based on natural conditions, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{S}_{true}\$\u003c/span\u003e\u003c/span\u003e denotes the actual industrial set selected by the park. The match rate is therefore the share of a park's actual industries that the model also predicts as suitable. Comparing the match rate and environmental features across the clustering patterns makes it possible to verify whether the model captures the constraints that the natural environment places on agricultural layouts and the adaptability it allows.\u003c/p\u003e \u003cp\u003eIn the industrial Potential Evaluation phase, existing studies indicate that, owing to factors such as market fluctuations, policy interventions, or farmers' preferences, a significant discrepancy often exists between the actual spatial distribution of agricultural industries and the areas that are naturally suitable for them (X. Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
To quantify the deviation between theoretical suitability and actual distribution under specific natural environmental conditions (Zhang, Hong, Sun, Hao, and Ai, \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), and to translate it into concrete planning references for newly established parks, this study defines an industrial discrepancy index (Gap). This index is calculated as the difference between the theoretical suitability proportion derived from the LightGBM model outputs and the actual occurrence frequency of the specific industry among existing parks within the same clustering pattern. The calculation formula is as follows:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:\\text{Gap}={P}_{\\text{pred}}-{P}_{\\text{true}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn Eq.\u0026nbsp;\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{P}_{pred}\$\u003c/span\u003e\u003c/span\u003e represents the theoretical suitability proportion (i.e., the ratio of environments evaluated as suitable by the LightGBM model under the 0.3 threshold), and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{P}_{true}\$\u003c/span\u003e\u003c/span\u003e denotes the actual occurrence frequency of that specific industry among existing parks within the same environmental cluster. 
Conceptually, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{P}_{pred}\$\u003c/span\u003e\u003c/span\u003e answers the environmental question of \"what proportion of areas can optimally support this crop,\" while \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{P}_{true}\$\u003c/span\u003e\u003c/span\u003e reflects the socio-economic reality of \"what is already widely deployed.\" Therefore, the Gap index serves as a direct measurement of \"untapped spatial potential\" for decision support. When the Gap value is significantly greater than zero, it signifies that the specific industry aligns with the current environmental conditions but maintains a relatively low actual distribution proportion in existing parks. This study utilizes bubble charts to display the Gap values across various clustering patterns, aiming to identify suitable industries under different environmental conditions and provide a quantitative reference for industrial selection in newly established parks.\u003c/p\u003e \u003c/div\u003e"},{"header":"4 Results and Discussion","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Identification of Environmental Baselines\u003c/h2\u003e \u003cp\u003eTo determine the optimal number of clusters for the natural environment of National Modern Agricultural Industrial Parks, this study evaluated the clustering results of the VAE-GMM model using the Bayesian Information Criterion (BIC). As illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, when the number of clusters (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:K\$\u003c/span\u003e\u003c/span\u003e) was tested within the range of 2 to 10, the BIC score exhibited an overall trend of decreasing first and then increasing. 
At \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:K=5\$\u003c/span\u003e\u003c/span\u003e, the BIC score reached its global minimum (-19,884). According to the BIC criterion, a lower score indicates that the model has achieved a superior balance between data fitting accuracy and model complexity. Therefore, this study identifies 5 as the optimal number of clusters, objectively partitioning the natural environmental features of the sampled parks into five distinct patterns as the baseline for subsequent analysis.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis data-driven clustering result effectively mitigates the subjective biases inherent in traditional spatial evaluation methods that rely on manual weighting(Guo et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Simultaneously, extracting five typical baselines from the complex nationwide agricultural environment objectively reflects ecological similarity across administrative boundaries. This provides a quantifiable classification basis for exploring the layout patterns of agricultural industries under similar natural conditions and for making planning decisions for newly established industrial parks.\u003c/p\u003e \u003cp\u003eBased on the determination of five clusters, this study utilized the t-SNE algorithm to perform dimensionality reduction and visualization of the latent environmental features extracted by the VAE-GMM model (as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The scatter distribution results demonstrate that the five natural environmental patterns form relatively independent clusters within the two-dimensional latent feature space. The boundaries of each cluster are distinct, with low overlap among the 95% confidence regions. 
This visualization outcome indicates that the model can effectively distinguish data variances in complex natural environmental factors among the sampled parks, statistically confirming the rationality of classifying the national industrial parks' natural environments into five types.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe separation results in the feature space validate the feature extraction efficacy of deep generative models when processing high-dimensional and sparse agricultural data. By filtering out redundant information from the original environmental matrix, the model successfully transforms multidimensional geographical elements into structured macro-environmental baselines. This objective clustering partition not only establishes a spatial reference frame for evaluating agricultural natural suitability but also provides a core analytical framework for subsequent predictive cross-validation and the quantification of industrial expansion discrepancies (Gap) under specific environmental patterns.\u003c/p\u003e \u003cp\u003eStatistical results of the natural conditions for each clustering pattern reveal significant numerical differentiation (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Based on the distribution ranges and medians of core environmental factors, this study defines the five patterns as follows:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCluster 0 (Low-Altitude Hot and Humid Type)\u003c/b\u003e: Characterized by high mean annual temperature (approx. 17\u0026deg;C), humidity (approx. 76%), and precipitation (900\u0026ndash;1500 mm), with altitudes generally below 500 m.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCluster 1 (Lowland Cool and Dry Type)\u003c/b\u003e: Characterized by low altitudes (approx. 100 m), with mean annual temperature (approx. 
13\u0026deg;C) and precipitation (500\u0026ndash;600 mm) situated in lower intervals.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCluster 2 (Temperate and Moist Plain Type)\u003c/b\u003e: Features the lowest altitudes (approx. 50 m), a mean annual temperature of approx. 16\u0026deg;C, and precipitation ranging between 800 and 1300 mm.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCluster 3 (High-Rainfall Warm and Humid Type)\u003c/b\u003e: Exhibits the highest precipitation (approx. 1500 mm) and a high mean annual temperature (approx. 17.5\u0026deg;C).\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCluster 4 (High-Altitude Cold and Arid Type)\u003c/b\u003e: Displays the highest average altitude (approx. 850 m), while the mean annual temperature (approx. 9\u0026deg;C), humidity (approx. 55%), and precipitation (approx. 450 mm) are the lowest among all categories.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMapping the aforementioned clustering results with defined environmental attributes onto geospatial coordinates (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e) reveals that the various natural environmental patterns exhibit significant regional agglomeration characteristics in their macro-distribution. 
Specifically, the Low-Altitude Hot and Humid Type (Cluster 0) is widely distributed across the southwest inland and regions south of the Yangtze River; the Lowland Cool and Dry Type (Cluster 1) is highly concentrated in the Northeast Plain; the Temperate and Moist Plain Type (Cluster 2) is primarily aggregated in central and eastern regions such as the North China Plain and the Huang-Huai-Hai Plain; the High-Rainfall Warm and Humid Type (Cluster 3) is concentrated along the southeast coast and South China; and the High-Altitude Cold and Arid Type (Cluster 4) is extensively distributed throughout the northwest inland and western high-altitude fringe areas. These spatial distribution patterns align with the actual distribution of China\u0026rsquo;s macro-geographical climatic zones (Gong et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), which corroborates, from a geographical perspective, the ecological rationality of the unsupervised clustering results produced by the deep generative model (VAE-GMM).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eCrucially, this result consolidates 335 geographically dispersed industrial parks into five standardized natural environmental baselines. This classification, which is independent of administrative divisions, allows parks located in different provinces or cities but sharing similar environmental characteristics to be compared within the same reference frame. 
This provides the structured prerequisites for the subsequent quantification of agricultural industrial structure discrepancies (Gap) under specific environments and for identifying reference cases for newly established parks based on similar natural conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Multi-Label Suitability Prediction and Ecological Rationality\u003c/h2\u003e \u003cp\u003eThese identified environmental clusters not only characterize agroecological heterogeneity but also provide a structured basis for subsequent prediction of crop-livestock spatial layouts. To evaluate the performance of the model on the multi-label crop-livestock suitability prediction task, this study plotted the Receiver Operating Characteristic (ROC) curve based on the test set (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e). The results indicate an overall micro-average Area Under the Curve (AUC) value of 0.75. Across the individual industrial categories, predictive performance varied considerably: Tea achieved the highest AUC (0.86), followed by Aquatic products (0.77) and Oil crops (0.77); Cereals and Specialty crops both yielded an AUC of 0.67; whereas the Flowers category had the lowest AUC (0.37), below the 0.5 level expected of a random classifier.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThese evaluation metrics indicate that the tree-based LightGBM algorithm can parse high-dimensional heterogeneous features and maintain reasonable discriminative ability for most industrial categories. 
This establishes a reliable algorithmic foundation for the subsequent generation of agricultural spatial planning recommendations.\u003c/p\u003e \u003cp\u003eTo further quantitatively evaluate the model's alignment with actual layouts, Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e demonstrates that although foundational industries achieve robust layout hit rates (e.g., 73.21% for cereals and 59.52% for livestock), secondary and specialized categories exhibit a pronounced discrepancy between their Prediction AUC and actual layout hit rates. For instance, environmentally sensitive crops like Tea and Oil Crops maintain excellent classification capability (AUC\u0026thinsp;\u0026gt;\u0026thinsp;0.75) despite relatively low hit rates, whereas Flowers present an anomaly with an AUC of 0.372 and a 0.00% hit rate.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eMulti-label prediction performance and layout hit rate across crop-livestock categories\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCrop-Livestock Category\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eActual Distribution\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003ePrediction AUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLayout Hit Rate\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCereals\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.674\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e73.21%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLivestock\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.577\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e59.52%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFruits\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.568\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e47.06%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVegetables\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.585\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e45.76%\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpecialty Crops\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25.64%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAquatic Products\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.771\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18.52%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTea\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.858\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e48.15%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMedicinal Herbs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e11.76%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOil Crops\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e13\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.77\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.69%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEdible Fungi\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.529\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.69%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFlowers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.372\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.00%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThese numerical divergences suggest a mismatch between theoretical natural suitability and actual agricultural layouts. Computationally, the high AUC yet low hit rate for minority crops (e.g., oil crops) can be attributed to threshold compression caused by class imbalance: AUC measures how well positives are ranked above negatives independently of any decision threshold, so a rare class can be ranked well even when its predicted probabilities are too low to register as hits. Practically, this divergence indicates that actual spatial deployment is associated with broader agricultural dynamics, such as the spatial regulation of cropping patterns (Dai et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
Additionally, the anomaly observed in categories such as flowers (AUC\u0026thinsp;\u0026lt;\u0026thinsp;0.5, hit rate\u0026thinsp;=\u0026thinsp;0%) reflects the limitations of purely environment-driven evaluations, as the distribution of high-value crops is often shaped by economic trade-offs and intensive management practices rather than solely by natural climatic conditions (Wu et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Consequently, the macroscopic hit rate of approximately 60% for major industries implies that natural baselines continue to shape the majority of current layouts, while the missed cases likely correspond to these complex socio-economic interventions. This discrepancy is therefore not merely a statistical artifact; it also reflects a structural divergence between environmentally driven suitability and real-world agricultural allocation. Ultimately, this divergence between theoretical potential and current distribution quantifies the margin for spatial optimization. Before exploring this spatial discrepancy (Gap) in Section \u003cspan refid=\"Sec15\" class=\"InternalRef\"\u003e4.3\u003c/span\u003e, it is first necessary to visualize how these predicted probabilities are distributed across the natural environmental baselines.\u003c/p\u003e \u003cp\u003eFurthermore, to delineate the physical applicability of the predictive model across various natural carrying capacity intervals, this study constructed multi-dimensional environmental feature radar charts (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e). 
The results indicate that the polygonal profiles of each natural environmental type exhibit significant feature differentiation, completely covering the natural environmental gradient from extremely constrained (e.g., High-Altitude Cold and Arid Type, Cluster 4) to climatically favorable (e.g., High-Rainfall Warm and Humid Type, Cluster 3), while the Temperate and Moist Plain Type (Cluster 2) and others constitute objective transitional intervals. This clear delineation of the physical space confirms that each clustering pattern possesses objective independence in terms of climatic and topographic features, thereby providing a reliable physical baseline for the downstream analysis of industrial environmental adaptability.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBuilding upon this, the heatmap of predicted crop-livestock suitability (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e) further elucidates the constraint patterns imposed by natural conditions on various industrial types. Overall, the theoretical suitability of each industry exhibits prominent resource-oriented and adaptive aggregation characteristics. In the High-Altitude Cold and Arid Type (Cluster 4) and Lowland Cool and Dry Type (Cluster 1), where environmental constraints are relatively severe, highly suitable industries are concentrated in hardy livestock and cereal crops. 
From a macro-ecological perspective, high altitudes and cool climates restrict the growth cycles of most thermophilic cash crops; however, the vast natural resource baseline provides a reasonable carrying capacity for modern livestock husbandry (L. Wang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eConversely, in the Temperate and Moist Plain Type (Cluster 2) and Low-Altitude Hot and Humid Type (Cluster 0), which possess superior hydrothermal resources, the distribution of suitable industries tends to be diversified, with cereals, fruits, and vegetables all exhibiting high suitability probabilities. These baseline patterns correspond to China's traditional major agricultural production areas, where the abundant combination of light, temperature, and water is sufficient to support the composite cultivation of multiple high-value-added crops (Zhang et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Furthermore, the High-Rainfall Warm and Humid Type (Cluster 3) demonstrates a distinctive suitability for crops that favor warm, moist conditions, such as tea, which is highly consistent with the spatial concentration of the tea industry in the hilly regions of Southern China (Zhu et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe industrial aggregation characteristics presented by the predictive heatmap are clearly non-random and consistent with established macro-agroecological patterns. This not only provides a data-level analysis of the adaptation mechanisms between different natural environmental baselines and composite crop-livestock layouts but also suggests that the model has encoded agronomic knowledge as quantitative probabilistic indicators. 
This theoretical suitability baseline, aligning with realistic ecological logic, further corroborates the foundational constraining role of natural conditions in macro-agricultural spatial layouts (Nde et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Moreover, it establishes a solid logical premise for the subsequent analysis of industrial expansion potential and spatial discrepancies (Gap) under specific environments.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Spatial Discrepancies and Decision Support\u003c/h2\u003e \u003cp\u003eMotivated by the identified mismatch between environmental suitability and actual agricultural layouts, this study proposes a spatial discrepancy (Gap) metric to explicitly quantify this deviation. To transform continuous predicted suitability probabilities into macro-level planning references, this study conducted a quantitative evaluation of the industrial expansion potential under each natural environmental baseline by calculating the Gap, defined as the difference between the model-predicted probability and the actual distribution frequency within the same clustering pattern, so that a high Gap marks an industry that is environmentally suitable but underrepresented (Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e). The bubble chart distribution reveals that high-potential industries under each pattern directly correspond to their respective natural environmental characteristics.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn the High-Altitude Cold and Arid Type (Cluster 4) and Lowland Cool and Dry Type (Cluster 1), where climatic conditions are relatively constrained, the scope for expansion is primarily concentrated in livestock husbandry and cereal crops. Conversely, in the Temperate and Moist Plain Type (Cluster 2) and Low-Altitude Hot and Humid Type (Cluster 0), characterized by superior hydrothermal conditions, fruits and vegetables exhibit more prominent development potential. 
Furthermore, the High-Rainfall Warm and Humid Type (Cluster 3) shows notable potential for expanding specific industrial categories such as tea.\u003c/p\u003e \u003cp\u003eThe spatial differentiation of the aforementioned industrial potential reveals the discrepancy between theoretical industrial suitability zones and actual crop-livestock distributions. In agricultural practice, such deviations are typically driven by non-natural interventions (X. Wang et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), such as agricultural price support policies (Xue Li et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) and the micro-level decision-making behaviors of farmers (Yang et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eConsequently, high Gap values denote advantageous industries that possess high natural suitability but currently maintain relatively low distribution proportions. This index extends agricultural spatial assessment beyond standalone probability predictions by revealing the margin for industrial spatial expansion.\u003c/p\u003e \u003cp\u003eBuilding upon this, the analytical framework established in this study provides a standardized decision-making workflow for newly developed agricultural industrial parks: by inputting the natural environmental parameters of a target region, the system maps them to the corresponding baseline environmental type, extracts the relevant Gap indices, and ultimately outputs a prioritized crop-livestock recommendation list. 
This provides an objective, data-driven reference for the preliminary planning and industrial selection of new agricultural industrial parks across regions.\u003c/p\u003e \u003c/div\u003e"},{"header":"5 Conclusions","content":"\u003cp\u003eThis study constructed a data-driven modeling and decision-support framework to analyze the spatial adaptation between crop-livestock layouts and natural environments in China's NMAIPs. By integrating a deep clustering model (VAE-GMM) with a multi-label classifier (LightGBM), the framework addresses the high-dimensional and nonlinear characteristics of agricultural environmental data.\u003c/p\u003e \u003cp\u003eThe modeling results partition the complex national agricultural environment into five typical macro-baselines. Based on these baselines, the predictive model captures the theoretical suitability of various crop-livestock layouts with moderate overall discrimination (micro-average AUC\u0026thinsp;=\u0026thinsp;0.75). Furthermore, by calculating the spatial discrepancy (Gap) between the model-predicted suitability and the actual distribution frequencies, this study quantifies the divergence between natural environmental capacity and current agricultural allocation.\u003c/p\u003e \u003cp\u003eThis Gap metric transforms standalone probability predictions into a quantifiable index of industrial expansion potential, identifying environmentally suitable but underrepresented industries. Overall, this research distills historical spatial layouts into a transferable modeling workflow, providing a standardized reference for the planning of newly established agricultural parks. 
Future research will aim to incorporate socio-economic variables and high-resolution remote sensing data to further enhance the spatial modeling of dynamic agricultural systems.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to thank the supporting institution for providing the computational resources and research environment required for this study. This research did not receive any specific grant from funding agencies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of Competing Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study did not involve human or animal subjects. Therefore, ethical approval was not required.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAgrawal, N., Govil, H., and Kumar, T. (2025). Agricultural land suitability classification and crop suggestion using machine learning and spatial multicriteria decision analysis in semi-arid ecosystem. \u003cem\u003eEnvironment, Development and Sustainability, 27\u003c/em\u003e(6), 13689-13726.\u003c/li\u003e\n \u003cli\u003eAkpoti, K., Kabo-bah, A. T., and Zwart, S. J. (2019). Agricultural land suitability analysis: State-of-the-art and outlooks for integration of climate change analysis. 
\u003cem\u003eAgricultural Systems, 173\u003c/em\u003e, 172-208.\u003c/li\u003e\n \u003cli\u003eBenos, L., Tagarakis, A. C., Dolias, G., Berruto, R., Kateris, D., and Bochtis, D. (2021). Machine learning in agriculture: A comprehensive updated review. \u003cem\u003eSensors, 21\u003c/em\u003e(11), 3758.\u003c/li\u003e\n \u003cli\u003eDai, Z.-Z., Duan, J.-J., Liang, H.-Y., Zhu, Z.-Y., Feng, Y.-Z., and Wang, X. (2025). Spatial optimization of cropping patterns of staple crops to enhance supply\u0026ndash;demand balance in China. \u003cem\u003eJournal of Rural Studies, 120\u003c/em\u003e, 103869.\u003c/li\u003e\n \u003cli\u003eDinh, T. D., Th\u0026eacute;au, J., Pham, T. T. H., Varin, M., Marchal, J., and Genest, M.-A. (2025). Deep learning applied to urban agriculture: spatial-temporal changes of agricultural land in a rapidly urbanizing Southeast Asian city. \u003cem\u003eEuropean Journal of Remote Sensing, 58\u003c/em\u003e(1), 2572109.\u003c/li\u003e\n \u003cli\u003eFrimpong, S. A., Han, M., Zheng, W., Li, X., Akpaku, E., and Obeng, A. P. (2025). Machine and deep learning in agricultural engineering: A comprehensive survey and meta-analysis of techniques, applications, and challenges. \u003cem\u003eComputers, 14\u003c/em\u003e(10), 438.\u003c/li\u003e\n \u003cli\u003eGong, L., Liao, Y., Han, Z., Jiang, L., Liu, D., and Li, X. (2024). The Effects of Global Warming on Agroclimatic Regions in China: Past and Future. \u003cem\u003eAgronomy, 14\u003c/em\u003e(2), 293.\u003c/li\u003e\n \u003cli\u003eGuan, Q., Tang, J., Davis, K. F., Kong, M., Feng, L., Shi, K., and Schurgers, G. (2025). Improving future agricultural sustainability by optimizing crop distributions in China. \u003cem\u003ePNAS nexus, 4\u003c/em\u003e(1), pgae562.\u003c/li\u003e\n \u003cli\u003eGuo, J., Fan, W., Amayri, M., and Bouguila, N. (2025). Deep clustering analysis via variational autoencoder with Gamma mixture latent embeddings. 
\u003cem\u003eNeural Networks, 183\u003c/em\u003e, 106979.\u003c/li\u003e\n \u003cli\u003eHe, T., Li, M., and Jin, D. (2025). Deep learning-based time series prediction for precision field crop protection. \u003cem\u003eFrontiers in Plant Science, 16\u003c/em\u003e, 1575796.\u003c/li\u003e\n \u003cli\u003eHengYu, Z., Xi, G., XiaoMao, L., Lin, C., and JiaQi, B. (2024). Evaluation of Planting Suitability of Geographical Indication Agricultural Products Based on Ecological Niche Model: The Case of Purple-Skinned Garlic in Shanggao County (in Chinese). \u003cem\u003eJournal of Integrative Agriculture, 57\u003c/em\u003e(18), 3586-3600.\u003c/li\u003e\n \u003cli\u003eIkendi, S., Lyons, A., and Pathak, T. B. (2025). Advancing Decision Support for Climate Adaptation in Agriculture and Natural Resources. \u003cem\u003eFrontiers in Environmental Science, 13\u003c/em\u003e, 1605176.\u003c/li\u003e\n \u003cli\u003eJabed, M. A., and Murad, M. A. A. (2024). Crop yield prediction in agriculture: A comprehensive review of machine learning and deep learning approaches, with insights for future research and sustainability. \u003cem\u003eHeliyon, 10\u003c/em\u003e(24).\u003c/li\u003e\n \u003cli\u003eKe, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., . . . Liu, T.-Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. \u003cem\u003eAdvances in neural information processing systems, 30\u003c/em\u003e.\u003c/li\u003e\n \u003cli\u003eLi, S., Wu, Y., Yu, Q., and Chen, X. (2023). National Agricultural Science and Technology Parks in China: Distribution Characteristics, Innovation Efficiency, and Influencing Factors. \u003cem\u003eAgriculture, 13\u003c/em\u003e(7). doi:10.3390/agriculture13071459\u003c/li\u003e\n \u003cli\u003eLi, X., Huang, W., and Liu, J. (2025). The Impact of China\u0026rsquo;s National Modern Agricultural Industrial Parks on Fertilizer Use from the Perspective of Food Security. 
\u003cem\u003eSustainability, 17\u003c/em\u003e(24), 11227.\u003c/li\u003e\n \u003cli\u003eLi, X., Yuan, Q., and Han, Y. (2019). ANALYSIS ON THE INFLUENCE MECHANISM OF PRICE SUPPORT POLICIES ON GRAIN PLANTING AREA IN CHINA\u0026mdash;\u0026mdash;BASED ON WHEAT PROVINCIAL-LEVEL PANEL DATA (in Chinese). \u003cem\u003eChinese Journal of Agricultural Resources and Regional Planning, 40\u003c/em\u003e(01), 89-96.\u003c/li\u003e\n \u003cli\u003eLing, L., Chen, X., Wu, Y., Li, S., Wei, J., and Zhou, Q. (2023). National Modern Agricultural Industrial Parks: Development Characteristics, Regional Differences, and Experience Inspiration\u0026mdash;Case Study of 200 NMAIPs in China. \u003cem\u003eAgronomy, 13\u003c/em\u003e(3), 653.\u003c/li\u003e\n \u003cli\u003eMoharana, P. C., Yadav, B., Malav, L. C., Biswas, H., and Patil, N. G. (2025). Machine Learning-Based Crop Suitability Prediction: An Emerging Technique for Sustainable Agricultural Production in the Desert Region of India. \u003cem\u003eCommunications in Soil Science and Plant Analysis, 56\u003c/em\u003e(3), 376-395.\u003c/li\u003e\n \u003cli\u003eMugiyo, H., Chimonyo, V. G., Sibanda, M., Kunz, R., Masemola, C. R., Modi, A. T., and Mabhaudhi, T. (2021). Evaluation of land suitability methods with reference to neglected and underutilised crop species: A scoping review. \u003cem\u003eLand, 10\u003c/em\u003e(2), 125.\u003c/li\u003e\n \u003cli\u003eNalisnick, E., Hertel, L., and Smyth, P. (2016). \u003cem\u003eApproximate inference for deep latent gaussian mixtures.\u003c/em\u003e Paper presented at the NIPS Workshop on Bayesian Deep Learning.\u003c/li\u003e\n \u003cli\u003eNde, R. K., Fendji, J. L. E. K., Yenke, B. O., and Sch\u0026ouml;ning, J. (2024). Crop selection. \u003cem\u003eSmart Agricultural Technology, 9\u003c/em\u003e. doi:10.1016/j.atech.2024.100602\u003c/li\u003e\n \u003cli\u003eNirmaladevi, S., and Jagatheswari, S. (2025). 

