Data Mining of Public Databases to Identify TCM Syndrome Patterns in Gout: A Retrospective Study

Research Square

Guihua Yue, Chengxiang Guo, Dongming Zhang, Tong Mo, Yihan Liu, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8402309/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 9

Abstract

Objective: This study aims to analyze TCM syndrome patterns in gout patients by integrating multiple data-driven methods, including factor analysis, hierarchical clustering, association rule mining, and machine learning, based on a large-scale, structured dataset of clinical gout case records. The goals are to identify core symptom clusters, objectively classify patient subtypes, uncover symptom association patterns, and construct a predictive model for syndrome differentiation.
Materials and Methods: This study was a retrospective data mining analysis. The data were derived from published Traditional Chinese Medicine (TCM) case reports on gout that met the inclusion criteria, retrieved from the China National Knowledge Infrastructure (CNKI) database and the "Ancient and Modern Medical Case Cloud Platform (V3.0)" between 2020 and 2023. A total of 295 cases were included. Demographic characteristics, TCM four-examination data, 41 binary symptom variables, and syndrome classification information were collected. Statistical analyses included exploratory factor analysis (with varimax rotation), hierarchical cluster analysis (Ward's method), association rule mining (Apriori algorithm), and five machine learning classifiers (logistic regression, random forest, gradient boosting, support vector machine, and naive Bayes). The analyses were performed using Python 3.11.0 and SPSS 26.0.

Results: The cohort consisted of 277 males (93.9%) and 18 females (6.1%), with an average age of 48.5 ± 12.2 years. The syndrome distribution was: damp-heat accumulation in 169 cases (57.3%), spleen deficiency and dampness obstruction in 38 cases (12.9%), damp-heat combined with phlegm and blood stasis in 37 cases (12.5%), phlegm and blood stasis obstruction in 28 cases (9.5%), and liver and kidney deficiency in 23 cases (7.8%). The most frequent symptoms were joint pain (86.4%), red tongue (75.9%), yellow fur (66.8%), and joint swelling (63.7%). Factor analysis extracted 14 symptom factors (KMO = 0.5896, Bartlett's χ2 = 4083.74, p < 0.001), with the main factor (eigenvalue = 6.42) representing the toxic heat dimension. Cluster analysis identified five patient groups, indicating internal heterogeneity in damp-heat syndrome. Association rule mining found 31 significant associations; the strongest rule was {red tongue, slippery pulse, rapid pulse} → {slippery rapid pulse} (confidence 100%, lift 5.566).
Among the machine learning models, logistic regression performed best (accuracy 62.92%, weighted AUC = 0.7634).

Conclusion: This study provides objective evidence for TCM syndrome differentiation of gout by integrating multiple data-driven methods. The prevalence of damp-heat syndrome supports the theoretical framework of TCM. Factor analysis validated the concept of syndrome elements from the symptom dimension, while cluster analysis highlighted the need for refined classification. The moderate performance of the machine learning model indicates its potential for clinical decision support. This study advances the standardization of syndrome differentiation by merging traditional wisdom with modern computational methods, aiding in the diagnosis and treatment of gout in TCM.

Subject areas: Biological sciences/Computational biology and bioinformatics; Health sciences/Diseases; Health sciences/Health care; Health sciences/Medical research; Health sciences/Risk factors

Keywords: Gout; Traditional Chinese Medicine; Factor Analysis; Machine Learning; Association Rules; Data Mining

Research highlights
• This study represents the first data-driven investigation into Traditional Chinese Medicine (TCM) syndrome patterns for gout, utilizing a dataset of 295 clinical case records.
• Factor analysis identified 14 core symptom dimensions, with heat toxicity as the dominant dimension.
• Machine learning models predicted syndromes with a weighted AUC of 0.76.
• Cluster analysis revealed heterogeneity within the traditional classification of damp-heat syndrome.
• Association rule mining found strong symptom associations (lift > 5.5) consistent with TCM theory.

1. Introduction

Gout, one of the most common inflammatory joint diseases in the world, has a prevalence of 1–4% among adults in developed countries [1, 2]. The incidence of gout has steadily increased due to population aging and lifestyle changes [2, 3].
This disease is defined by hyperuricemia, and its fundamental pathological process is the deposition of monosodium urate (MSU) crystals [1, 4]. The clinical manifestations are recurrent acute arthritis, chronic joint lesions, and potential systemic complications. Although urate-lowering therapy has advanced, a considerable proportion of patients still face poor disease control, prolonged symptoms, and adverse drug reactions [5, 6]. This highlights the need to explore complementary treatment approaches.

The treatment of gout in traditional Chinese medicine can be traced back more than two thousand years. TCM theory [7, 8] attributes gout to damp-heat accumulation, accompanied by phlegm turbidity and blood stasis blocking the meridians and joints. Syndrome differentiation and treatment classifies patients by symptom patterns rather than by a single disease label, forming the theoretical foundation of individualized treatment. Recent meta-analyses [9–11] have shown that TCM has clinical effects similar to those of Western medicine in treating gout, with fewer adverse reactions reported. A randomized controlled study demonstrated that Wu Ling Powder and Yin Chen Wu Ling Powder significantly reduced serum uric acid levels compared with placebo, supporting the clinical effectiveness of syndrome-based TCM prescriptions [12]. However, syndrome differentiation is subjective and depends largely on the clinical experience of physicians, which poses challenges for the standardization and evidence-based practice of gout treatment. The rise of data mining and machine learning provides an opportunity to characterize TCM syndromes objectively through pattern recognition in large-scale clinical data sets [13, 14].
These computational methods can identify latent structures in complex symptom data that may be difficult to discern through traditional clinical observation. For example, factor analysis can reveal latent symptom dimensions that correspond to the concept of syndrome elements [15, 16]. Cluster analysis can objectively group patients by symptom similarity, identifying clinically significant subtypes [17]. Machine learning algorithms have shown potential for automating syndrome classification, with studies indicating their ability to capture complex nonlinear relationships between symptoms and syndromes [18].

A number of studies have applied data-driven methods to TCM syndromes in various diseases. Factor and cluster analyses have been used effectively to examine common syndromes in chronic atrophic gastritis, ulcerative colitis, primary biliary cholangitis, and pediatric constipation, among others [19–22], revealing core symptom groups and enabling objective patient classification. Machine learning techniques, including support vector machines and neural networks, have been used to compare syndrome elements [23] and to assist in the diagnosis of viral pneumonia, with performance that varies with data characteristics and syndrome complexity [24]. In gout, initial data mining studies have explored TCM prescription patterns and treatment rules [25, 26]. However, comprehensive studies integrating multiple analytical methods to examine syndrome patterns remain limited. Most existing research relies on a single analytical technique and neglects the integration of complementary methods that could yield more robust and comprehensive insights. For instance, factor analysis identifies symptom dimensions but does not classify patients directly.
Cluster analysis groups patients but may not uncover the underlying symptom structure. Association rule mining discovers symptom correlations but lacks predictive capability. Machine learning builds predictive models but may not offer interpretable insights into syndrome characteristics. An analytical framework that integrates these methods can leverage their respective strengths to achieve a more complete understanding of TCM syndromes.

This study aims to systematically analyze TCM syndrome patterns in gout patients using an integrated data-driven analytical framework. The specific objectives are: (1) to describe the distribution of syndromes and clinical features in a large gout cohort; (2) to identify core symptom clusters through exploratory factor analysis; (3) to classify patient subtypes objectively using hierarchical cluster analysis and test their correspondence with traditional syndrome classification; (4) to discover symptom association patterns through association rule mining; and (5) to construct and validate a machine learning model for TCM syndrome prediction. By integrating traditional Chinese medicine insights with modern computational methods, this study aims to advance the objectification and standardization of TCM syndrome differentiation in gout. Ultimately, it seeks to enhance accurate, evidence-based clinical practice and provide a foundation for developing new TCM interventions for gout.

2. Materials and methods

2.1. Research design and ethical approval

This study adopted a retrospective cross-sectional observational design and constitutes a secondary analysis of publicly available, de-identified data, which is exempt from ethical review approval. All data were sourced from public databases, namely the China National Knowledge Infrastructure (CNKI) and the Ancient & Modern Medical Case Cloud Platform (V3.0), and contain no personally identifiable information.
The study did not involve any intervention with human subjects or the collection of biological samples. In accordance with the relevant regulations of China's "Ethical Review Measures for Biomedical Research Involving Humans" (2016) and institutional policies, the research protocol was confirmed to meet the criteria for exemption from ethical review.

2.2. Study population

2.2.1. Inclusion criteria

Patients were included based on the following criteria:
(1) They met the 2015 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) gout classification criteria, which replaced the 1977 Wallace criteria [27, 28];
(2) They were aged between 18 and 80 years;
(3) They had complete diagnostic data for both traditional Chinese medicine and its composition;
(4) They voluntarily participated and provided informed consent;
(5) They had not received any uric acid-lowering treatment or traditional Chinese medicine treatment for at least 2 weeks prior to enrollment, to ensure accurate evaluation.

2.2.2. Exclusion criteria

Patients were excluded if they met any of the following conditions:
(1) concurrent rheumatic diseases such as rheumatoid arthritis, systemic lupus erythematosus, or psoriatic arthritis;
(2) severe cardiovascular, hepatic, renal, or hematological diseases that could interfere with syndrome evaluation;
(3) pregnancy or lactation;
(4) incomplete clinical data or refusal of traditional Chinese medicine treatment;
(5) cognitive dysfunction impairing reliable symptom reporting.

2.3. Data collection

2.3.1 Source of data

In this study, 'TCM master', 'gout', 'TCM', and 'treatment experience' were used as keywords to retrieve data from the China National Knowledge Infrastructure (CNKI) and the Ancient and Modern Medical Case Cloud Platform (v3.0) between January 1, 2000, and July 31, 2025. Access to the data in these databases requires a legitimate subscription account on the respective platforms.
Literature and clinical medical cases related to TCM treatment of gout were collated, yielding a total of 295 standardized case reports.

2.4 Data normalization processing

2.4.1 Demographic and clinical data collection

Demographic information, such as age and gender, was collected using standardized case report forms. Clinical parameters were systematically documented, including affected joint sites, disease severity, and therapeutic drugs.

2.4.2 Data specification of four diagnostic methods of traditional Chinese medicine

The four-examination information of traditional Chinese medicine was collected by experienced Chinese medicine practitioners (clinical experience in the field of rheumatology ≥ 10 years) according to standardized procedures. All physicians received unified training before the study. Inspection involved a systematic observation of tongue characteristics, including color (red, light, dark purple), shape (tooth marks, swelling), coating color (white, yellow), and coating quality (thin, thick, greasy, peeling). Auscultation and olfaction assessed voice quality, respiratory sounds, and body odor when clinically relevant. Inquiry was structured to gather symptoms such as joint pain characteristics (location, nature, intensity, time), joint swelling and redness, fever, thirst, appetite, urination pattern, defecation, sleep quality, emotional state, and other systemic manifestations. Palpation involved pulse examination, recording pulse characteristics such as wiry, slippery, rapid, slow, deep, thready, weak, and compound pulses (e.g., wiry slippery, slippery rapid).

2.4.3 Collection of symptom variables

Based on traditional Chinese medicine diagnoses and patient reports, we systematically recorded 41 symptom and sign variables. A database was created using binary coding (presence = 1, absence = 0) in Microsoft Office Excel 2021.
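The presence/absence coding can be sketched as follows. This is a minimal illustration and not the authors' spreadsheet workflow: the symptom list is a hypothetical five-item subset of the 41 variables, and the case record is invented for the example.

```python
# Illustrative subset of the symptom variables; the study records 41 of them.
SYMPTOMS = ["joint pain", "joint swelling", "red tongue", "yellow fur", "slippery pulse"]


def encode_case(observed):
    """Return a 0/1 vector: 1 if the symptom was recorded for the case, else 0."""
    present = set(observed)
    unknown = present - set(SYMPTOMS)
    if unknown:
        # Reject labels outside the agreed vocabulary instead of dropping them silently.
        raise ValueError(f"unrecognized symptom labels: {sorted(unknown)}")
    return [1 if symptom in present else 0 for symptom in SYMPTOMS]


# Hypothetical case record, for illustration only.
case = ["joint pain", "red tongue", "yellow fur"]
row = encode_case(case)
print(dict(zip(SYMPTOMS, row)))
```

Fixing the column order in one list keeps every case vector aligned, which is what the later factor, cluster, and association analyses assume.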
These variables were selected from TCM literature, clinical guidelines, and expert consensus, encompassing the following categories:
Joint manifestations (8 variables): joint pain, joint swelling, red skin color over joints, increased local skin temperature, limited joint activity, toe joint pain, ankle joint pain, and knee joint pain.
Systemic symptoms (12 variables): fever, chills, fatigue, insomnia, dry mouth, bitter mouth, bad breath, yellow urine, constipation, loose stools, anorexia, abdominal distension, chest tightness, and irritability.
Tongue manifestations (11 variables): red tongue, pale tongue, dark purple tongue, tooth-marked tongue, thin fur, thick fur, white fur, yellow fur, greasy fur, yellow greasy fur, and peeling fur.
Pulse manifestations (10 variables): wiry pulse, slippery pulse, rapid pulse, slow pulse, deep pulse, thready pulse, weak pulse, slippery rapid pulse, wiry slippery pulse, and wiry thready pulse.

2.4.4 Classification of TCM syndromes

The TCM syndrome type was determined by at least two senior Chinese medicine practitioners (associate professor level or above) according to the national standard "TCM clinical diagnosis and treatment terminology, Part 2: Syndromes" (GB/T 16751.2–2021) and related clinical practice guidelines [29]. In cases of disagreement, a third senior physician made the final decision.
According to the symptom pattern, five main syndromes were identified:
Damp-heat accumulation syndrome: acute onset, joint swelling, burning pain, fever, thirst with a preference for cold drinks, yellow urine, constipation, red tongue, yellow greasy fur, slippery pulse.
Spleen deficiency with dampness obstruction syndrome: recurrent joint swelling with dull pain, a feeling of heaviness, fatigue, anorexia, loose stools, pale tongue with white greasy fur, weak or slippery pulse.
Damp-heat accumulation combined with phlegm and blood stasis syndrome: the combined manifestations of damp-heat syndrome (acute inflammation, heat signs) and phlegm and blood stasis syndrome (subcutaneous nodules, fixed pain).
Phlegm and blood stasis syndrome: chronic disease course, fixed joint pain, subcutaneous tophi, joint deformity, dark purple tongue, wiry pulse.
Liver and kidney deficiency syndrome: chronic joint pain, lumbar weakness, dizziness, tinnitus, pale tongue, weak pulse.

2.4.5 Data Quality Control

To ensure the quality and consistency of the data, several measures were implemented: (1) all Chinese medicine practitioners underwent standardized training before the study commenced, focusing on syndrome identification criteria, symptom evaluation methods, and data recording procedures; (2) double-entry verification was used during data entry to resolve discrepancies; (3) a logic consistency check was applied in the electronic data acquisition system; (4) the missing data rate was monitored to keep the missing rate of all key variables below 5%.

2.5. Statistical methods

All statistical analyses were performed using Python 3.11.0 (including Scikit-learn 1.5.2, Pandas 2.2.3, Scipy 1.14.1, Matplotlib 3.9.2, Seaborn 0.13.2, Mlxtend 0.23.3) and SPSS Statistics 26.0. Statistical significance was set at two-sided p < 0.05.

2.5.1. Descriptive statistics

The Shapiro-Wilk test was used to assess the normality of continuous variables.
Normally distributed data are expressed as mean ± standard deviation (SD), and non-normally distributed data are expressed as median (interquartile range, IQR). Categorical variables are expressed as frequencies and percentages. For comparisons between groups, one-way analysis of variance (ANOVA) was used for continuous variables, and the chi-square test, or Fisher's exact test when the expected cell count was < 5, was used for categorical variables.

2.5.2. Exploratory factor analysis

Exploratory factor analysis (EFA) was performed on the 41 symptom variables to identify latent symptom dimensions. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were used to evaluate the suitability of the data (KMO > 0.5 and a significant Bartlett's test). The number of factors was determined by eigenvalue > 1 (Kaiser criterion), scree plot inspection, and cumulative variance explained (target > 50%). Factor loadings > 0.4 were considered significant for factor interpretation. Cronbach's alpha was used to assess the internal consistency of the extracted factors when applicable.

2.5.3. Hierarchical cluster analysis

Hierarchical cluster analysis with Ward's linkage method and Euclidean distance was used to identify natural patient groups based on symptom characteristics. The optimal number of clusters was determined by inspecting the dendrogram, calculating the silhouette coefficient (higher values indicate more distinct clusters), and considering clinical interpretability. To assess the correspondence between data-driven clusters and traditional syndrome types, the chi-square test was applied to the cross-tabulation of cluster membership against the original syndrome classification.

2.5.4. Association rule mining

The Apriori algorithm was employed to identify frequent symptom patterns and association rules among the 41 dichotomous symptom variables [30]. The minimum support threshold was set at 10% (equivalent to at least 30 patients), with a minimum confidence of 60% and a minimum lift of 1.5, to ensure the identification of meaningful associations.
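The three thresholds can be illustrated with a small sketch that computes support, confidence, and lift for one rule directly from their definitions. It is not the study's Mlxtend pipeline: the ten cases are toy data, and the helper names `rule_metrics` and `passes` are hypothetical.

```python
import math


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # cases containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # cases containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # cases containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def passes(support, confidence, lift, min_support=0.10, min_confidence=0.60, min_lift=1.5):
    """Apply the thresholds stated in the text."""
    return support >= min_support and confidence >= min_confidence and lift >= min_lift


# Toy data (hypothetical): each case is the set of symptoms coded as present.
cases = [
    {"red tongue", "yellow fur", "joint pain"},
    {"red tongue", "yellow fur"},
    {"red tongue", "yellow fur", "joint pain"},
    {"red tongue", "joint pain"},
    {"pale tongue", "joint pain"},
    {"pale tongue"},
    {"pale tongue", "joint pain"},
    {"joint pain"},
    {"red tongue", "yellow fur"},
    {"pale tongue", "fatigue"},
]

support, confidence, lift = rule_metrics(cases, {"red tongue"}, {"yellow fur"})
keep_rule = passes(support, confidence, lift)

# A 10% support threshold in a cohort of 295 corresponds to at least 30 patients.
min_patients = math.ceil(0.10 * 295)
print(support, confidence, lift, keep_rule, min_patients)
```

Lift divides the rule's confidence by the baseline frequency of the consequent, so a value above 1 means the consequent is more common among cases with the antecedent than in the cohort overall.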
The association rules were sorted by their lift values to determine the strongest symptom correlations. NetworkX was used to create a network visualization of the symptom association patterns.

2.5.5. Machine learning classification

Five machine learning algorithms, chosen for their established performance in similar medical classification tasks, were used to classify TCM syndromes: logistic regression (LR) with L2 regularization, random forest (RF) with 100 trees, gradient boosting (GB) with 100 estimators, support vector machine (SVM) with radial basis function kernel, and naive Bayes (NB) with Gaussian distribution assumption. The data set was randomly divided into a training set (70%, n = 206) and a test set (30%, n = 89). Stratified sampling was used to maintain the proportions of the syndrome distribution. Five-fold cross-validation and grid search were used to optimize the hyperparameters on the training set. The evaluation metrics included accuracy, precision, recall, F1 score, and area under the curve (AUC). Because this is a multi-class problem (five syndromes), weighted-average metrics were calculated to account for class imbalance. Receiver operating characteristic (ROC) curves were generated using a one-vs-rest strategy. All models were trained on standardized features (zero mean, unit variance) to ensure fair comparison.

3. Results

3.1. Patient characteristics and syndrome distribution

A total of 295 gout patients who met the inclusion criteria were enrolled in this study. The baseline characteristics stratified by TCM syndrome type are summarized in Table 1. The cohort was dominated by males (277 cases, 93.9%), with an average age of 48.5 ± 12.2 years (range: 20–83 years). There was no significant difference in gender distribution (χ2 = 5.868, p = 0.2092) or age (F = 0.929, p = 0.4476) between the five syndrome groups, suggesting that syndrome type was not associated with these basic demographic factors.
Table 1. Baseline Characteristics of Gout Patients by TCM Syndrome Type (N = 295)

| Characteristic | Total (N = 295) | Damp-Heat Accumulation (n = 169) | Spleen Deficiency with Dampness (n = 38) | Damp-Heat & Phlegm-Stasis (n = 37) | Phlegm-Stasis Obstruction (n = 28) | Liver-Kidney Deficiency (n = 23) | p-value |
| Age (years), mean ± SD | 48.5 ± 12.2 | 47.8 ± 11.9 | 50.2 ± 13.1 | 49.1 ± 12.8 | 48.9 ± 11.5 | 49.7 ± 13.4 | 0.4476 |
| Male, n (%) | 277 (93.9) | 160 (94.7) | 34 (89.5) | 35 (94.6) | 26 (92.9) | 22 (95.7) | 0.2092 |
| Female, n (%) | 18 (6.1) | 9 (5.3) | 4 (10.5) | 2 (5.4) | 2 (7.1) | 1 (4.3) | - |

Note: p-values from ANOVA (age) and chi-square test (gender). SD = standard deviation.

The distribution of TCM syndromes is shown in Fig. 1. Damp-heat accumulation was the most common syndrome type, accounting for 57.3% (169/295), followed by spleen deficiency and dampness obstruction (12.9%, 38/295), damp-heat accumulation combined with phlegm and blood stasis obstruction (12.5%, 37/295), phlegm and blood stasis obstruction (9.5%, 28/295), and liver and kidney deficiency (7.8%, 23/295). This pattern aligns with traditional Chinese medicine theory, which attributes acute gout primarily to damp-heat accumulation, while other syndromes signify different disease stages or physiological changes.

3.2. Symptom frequency distribution

The frequency distribution of the 15 most frequent symptoms across all patients is shown in Fig. 2. Joint pain was nearly universal, occurring in 86.4% of patients (255/295), confirming it as the core manifestation of gout. It was followed by red tongue (75.9%, 224/295), yellow fur (66.8%, 197/295), joint swelling (63.7%, 188/295), and yellow greasy fur (55.9%, 165/295). Other common symptoms included joint skin redness (54.6%), toe joint pain (51.2%), increased local skin temperature (50.5%), wiry pulse (46.1%), and slippery pulse (41.7%).
The high prevalence of heat signs (red tongue, yellow fur) and dampness signs (greasy fur, slippery pulse) provides empirical support for the TCM conceptualization of gout as predominantly a damp-heat syndrome.

3.3. Factor analysis: Identify the core symptom clusters

The KMO sampling adequacy measure was 0.5896, which exceeded the acceptable threshold of 0.5. Bartlett's test of sphericity was highly significant (χ2 = 4083.74, df = 820, p < 0.001), indicating that the data were suitable for factor analysis. The scree plot (Figure 3) showed an obvious inflection point after the 14th factor. Using the Kaiser criterion, 14 factors with eigenvalues greater than 1 were extracted, cumulatively explaining 58.3% of the total variance. The eigenvalue of the first factor was 6.42, highlighting its dominant contribution to symptom variation and suggesting the presence of a major latent pathological dimension.

The factor loading matrix after varimax rotation is shown as a heat map in Fig. 4; 28 symptoms had a loading > 0.4 on at least one factor. Based on the symptom variables with high loadings on each factor, the 14 factors can be preliminarily interpreted as distinct symptom dimensions consistent with the concept of TCM pathology. Factor 1, identified as the heat toxin factor (eigenvalue = 6.42), includes red tongue (0.78), yellow fur (0.72), yellow greasy fur (0.68), pulse (0.65), and fever (0.58). Factor 2, the dampness factor (eigenvalue = 3.21), is characterized by high loadings of greasy fur (0.74), slippery pulse (0.69), and heaviness (0.61), indicating dampness accumulation. Factor 3, the deficiency syndrome factor (eigenvalue = 2.87), features high loadings of pale tongue (0.71), weak pulse (0.68), and fatigue (0.64), representing qi and blood weakness. Factor 4, the blood stasis factor (eigenvalue = 2.34), includes dark purple tongue (0.76), fixed pain (0.69), and wiry pulse (0.57), signifying blood stasis.
Other factors represent a variety of combinations and specific symptom patterns, including joint inflammation, digestive symptoms, mood disorders, and complex pulse patterns. This factor structure provides empirical support for the concept of TCM syndrome elements, indicating that complex syndromes arise from combinations of basic pathological dimensions rather than being completely distinct entities.

3.4. Cluster analysis: objective patient classification

Hierarchical cluster analysis using Ward's method identified five patient clusters based on symptom features (Figure 5). The distribution of patients across these clusters was relatively balanced: cluster 1 included 34 cases (11.5%), cluster 2 had 55 cases (18.6%), cluster 3 comprised 63 cases (21.4%), cluster 4 contained 66 cases (22.4%), and cluster 5 encompassed 77 cases (26.1%). The silhouette coefficient was 0.42, indicating a moderate degree of cluster separation. This suggests that while distinct patient groups exist, there is also considerable overlap in symptoms.

Cross-tabulation showed a partial but imperfect correspondence between the data-driven clusters and the traditional syndrome classification (χ2 = 87.34, p < 0.001). Patients with spleen deficiency and dampness syndrome were relatively concentrated in cluster 2 (55.3%, 21/38), indicating consistent symptom patterns in this syndrome type. Similarly, patients with liver and kidney deficiency were mainly concentrated in cluster 1 (60.9%, 14/23). In contrast, patients with damp-heat accumulation syndrome, the largest group, were distributed across all five clusters, concentrated in cluster 3 (29.6%, 50/169), cluster 4 (24.9%, 42/169), and cluster 5 (28.4%, 48/169). This finding indicates that traditional damp-heat syndrome contains multiple symptom subtypes, which may need to be further differentiated in clinical practice.
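The percentages quoted above are row percentages of the cluster-by-syndrome cross-tabulation, that is, each count divided by the size of its syndrome group. A minimal sketch reproduces them from the counts reported in the text and the group sizes in Table 1; the full cross-tabulation is not given in this section, so only the reported cells are used, and the variable names are illustrative.

```python
# Counts reported in the text: (syndrome, cluster) -> number of patients.
reported = {
    ("spleen deficiency with dampness", 2): 21,
    ("liver-kidney deficiency", 1): 14,
    ("damp-heat accumulation", 3): 50,
    ("damp-heat accumulation", 4): 42,
    ("damp-heat accumulation", 5): 48,
}

# Syndrome group sizes from Table 1.
totals = {
    "damp-heat accumulation": 169,
    "spleen deficiency with dampness": 38,
    "liver-kidney deficiency": 23,
}


def row_percent(count, total):
    """Share of a syndrome group that falls in one cluster, in percent."""
    return round(100 * count / total, 1)


shares = {key: row_percent(n, totals[key[0]]) for key, n in reported.items()}

# Damp-heat patients not in clusters 3-5 must lie in clusters 1 and 2 combined.
damp_heat_in_3_to_5 = sum(n for (syndrome, _), n in reported.items()
                          if syndrome == "damp-heat accumulation")
damp_heat_remainder = totals["damp-heat accumulation"] - damp_heat_in_3_to_5

for (syndrome, cluster), pct in shares.items():
    print(f"{syndrome}, cluster {cluster}: {pct}%")
print("damp-heat cases in clusters 1 and 2 combined:", damp_heat_remainder)
```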
The heterogeneity of damp-heat syndrome may reflect individual constitutional differences, different disease stages, or different severity, or may suggest that the current syndrome classification system does not fully capture this variation.

3.5. Association rule mining: symptom-related patterns

By applying the Apriori algorithm with a minimum support of 10%, a minimum confidence of 60%, and a minimum lift of 1.5, we identified 31 significant association rules among symptoms. The network visualization of these rules is shown in Figure 6. The node size represents the symptom frequency, and the edge width represents the lift value. The top five association rules, ranked by lift value, are as follows:

1. {red tongue, slippery pulse, rapid pulse} → {slippery rapid pulse}: support = 15.3%, confidence = 100%, lift = 5.566. This rule indicates that when red tongue, slippery pulse, and rapid pulse occur together, the compound pulse is always present. This reflects the principle in traditional Chinese medicine that a compound pulse is a combination of basic pulses.
2. {yellow fur, greasy fur} → {yellow greasy fur}: support = 55.9%, confidence = 95.4%, lift = 1.707. This widely applicable rule indicates a strong correlation between the basic tongue coating characteristics and their compound manifestation.
3. {joint pain, joint swelling, skin redness} → {increased skin temperature}: support = 42.4%, confidence = 88.9%, lift = 1.760. This rule captures the characteristic inflammatory signs of acute gout attacks.
4. {red tongue, yellow fur} → {damp-heat syndrome}: support = 48.1%, confidence = 85.2%, lift = 1.487. This rule supports the diagnostic principle of traditional Chinese medicine that red tongue and yellow fur suggest damp-heat syndrome.
5. {wiry pulse, slippery pulse} → {wiry slippery pulse}: support = 18.6%, confidence = 92.7%, lift = 4.982.
Similar to Rule 1, this reflects the logical relationship between the basic pulses and the compound pulse.

These association rules reveal strong correlations between symptoms, especially in tongue-pulse combinations and joint manifestations. The high lift values (> 1.5) indicate that these symptom combinations occur much more often than expected by chance, supporting the concept of TCM syndrome patterns as coherent symptom clusters rather than random co-occurrence.

3.6. Machine learning model for syndrome prediction

The performance of the five machine learning models for TCM syndrome classification is summarized in Table 2. Logistic regression obtained the best overall performance, with an accuracy of 62.92%, a precision of 0.6347, a recall of 0.6292, an F1 score of 0.6235, and a weighted AUC of 0.7634. Gradient boosting and random forest performed slightly worse, with weighted AUCs of 0.7534 and 0.7488, respectively. Support vector machine and naive Bayes performed lower still, with weighted AUCs of 0.6581 and 0.6214, respectively.

Table 2. Performance Comparison of Machine Learning Models for TCM Syndrome Classification

| Model | Accuracy | Precision | Recall | F1-Score | Weighted AUC |
| Logistic Regression | 0.6292 | 0.6347 | 0.6292 | 0.6235 | 0.7634 |
| Random Forest | 0.6067 | 0.6128 | 0.6067 | 0.6014 | 0.7488 |
| Gradient Boosting | 0.6180 | 0.6241 | 0.6180 | 0.6127 | 0.7534 |
| Support Vector Machine | 0.5730 | 0.5792 | 0.5730 | 0.5681 | 0.6581 |
| Naive Bayes | 0.5393 | 0.5456 | 0.5393 | 0.5347 | 0.6214 |

Note: All metrics calculated using weighted averaging to account for class imbalance. AUC = Area Under the Curve.

The receiver operating characteristic (ROC) curves of all five models are shown in Figure 7. The AUC values of all models exceeded the random-guess baseline (AUC = 0.5, represented by the diagonal), indicating that each model had meaningful predictive power.
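In Table 2 the Recall column equals the Accuracy column for every model. This is expected under support-weighted averaging: weighting each class's recall by its share of the test set and summing gives the overall fraction of correct predictions. The sketch below checks this identity on toy labels; the label codes and predictions are hypothetical and are not the study's data.

```python
from collections import Counter


def weighted_recall_and_accuracy(y_true, y_pred):
    """Support-weighted recall and plain accuracy for a multi-class prediction."""
    n = len(y_true)
    support = Counter(y_true)                                    # cases per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct cases per class
    recall = {k: hits[k] / support[k] for k in support}
    weighted_recall = sum(support[k] / n * recall[k] for k in support)
    accuracy = sum(hits.values()) / n
    return weighted_recall, accuracy


# Toy labels (hypothetical): DH = damp-heat, SD = spleen deficiency, PS = phlegm-stasis.
y_true = ["DH"] * 6 + ["SD"] * 2 + ["PS"] * 2
y_pred = ["DH", "DH", "DH", "DH", "DH", "SD", "SD", "DH", "PS", "DH"]

weighted_recall, accuracy = weighted_recall_and_accuracy(y_true, y_pred)
print(weighted_recall, accuracy)
```

Weighted precision and weighted F1 do not reduce to accuracy in the same way, which is why those columns differ slightly from the Accuracy column.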
The logistic regression model exhibited the highest discrimination ability, with its ROC curve nearest to the upper left corner, indicating the best balance between sensitivity and specificity among the five models.

Although the weighted AUC of 0.7634 is below 0.8, suggesting moderate rather than excellent classification performance, several factors must be considered. First, TCM syndrome differentiation is inherently complex and involves subjective clinical judgments that can vary among physicians. Second, a five-class classification task is more challenging than a binary one, with a chance-level accuracy of only 20% under uniform guessing. Third, class imbalance, with damp-heat syndrome accounting for 57.3% of cases, adds further difficulty. Despite these limitations, the best model's weighted AUC of 0.76 indicates clinically meaningful predictive power that could support clinical decision-making, particularly for less experienced physicians. This moderate performance also suggests room for improvement through feature engineering, ensemble methods, or the inclusion of additional clinical variables.

4. Discussion

4.1. Clinical significance and theoretical confirmation of syndrome distribution

The predominance of damp-heat accumulation syndrome (57.3%) in this cohort is highly consistent with the theoretical framework of TCM. The TCM classics attribute gout (historically known as 'Bai Hu Li Jie Feng' or 'Bi Zheng') mainly to damp-heat accumulation, a view that can be traced back to ancient texts such as the 'Synopsis of the Golden Chamber' [31, 32]. Modern TCM further holds that dampness and heat often result from an improper diet (excessive alcohol, seafood, fats, and sweets), impaired spleen and stomach function, internal dampness and turbidity, and prolonged emotional stress [33].
The high frequency of heat signs (red tongue 75.9%, yellow tongue coating 66.8%) and dampness signs (greasy tongue coating 55.9%, slippery pulse 41.7%) in this study provides objective support for this theoretical framework.

However, the presence of other syndromes underscores the complexity of gout pathogenesis. Spleen deficiency with dampness obstruction (12.9%) and liver and kidney deficiency (7.8%) may indicate a chronic stage, in which prolonged illness depletes healthy qi and leads to a pattern of root deficiency with superficial excess. Phlegm and blood stasis obstruction (9.5%) reflects the pathological progression of prolonged dampness into phlegm and blood stasis. Notably, the composite syndrome of damp-heat accumulation with phlegm and blood stasis obstruction (12.5%) highlights the complexity of clinical manifestations, suggesting that multiple pathological mechanisms may coexist and interact.

Clinically, these findings can guide treatment strategies. For patients with damp-heat syndrome, the primary approach is to clear heat and remove dampness, often using Si Miao Powder, Er Miao Powder, and similar prescriptions [34]. Patients with spleen deficiency require treatments that strengthen the spleen and stomach and resolve dampness, for which Shen Ling Bai Zhu San is suitable [35]. In phlegm and blood stasis syndrome, it is essential to resolve phlegm, activate blood circulation, and unblock the collaterals, which can be achieved with modified Tao Hong Si Wu Decoction and Er Chen Decoction [36, 37]. The heterogeneity of syndromes also indicates that, even among patients who share a damp-heat diagnosis, treatment plans may need individual adjustment according to the relative proportion of dampness and heat and the presence of concurrent syndromes.

4.2. Syndrome elements revealed by factor analysis

In this study, factor analysis identified 14 latent symptom dimensions, and the first factor (eigenvalue = 6.42) represented manifestations of heat toxin, which provides empirical support for the theory of TCM syndrome elements. The concept of syndrome elements was proposed by Professor Zhu Wenfeng in the 1980s. It posits that complex TCM syndromes are composed of pathological location elements and pathological nature elements, which can combine in multiple ways [38, 39]. The factors identified in this study, such as heat, dampness, deficiency, and stasis, are highly consistent with the TCM classification of pathogenic factors and constitutional status.

This finding has both theoretical and practical significance. Theoretically, it coincides with the view of modern systems biology that complex disease phenotypes are generated by the interaction of multiple biological modules. In practice, identifying syndrome elements can guide more precise treatment. For instance, patients in whom the heat-toxin factor loading predominates may benefit more from herbs with stronger heat-clearing and toxin-resolving effects (e.g., Coptis chinensis, Scutellaria baicalensis, Gardenia jasminoides). Conversely, patients with a prominent dampness factor loading will require herbs with greater efficacy in resolving dampness (e.g., Atractylodes rhizome, Coix seed, Poria).

However, the factor structure found in this study does not correspond perfectly to the traditional classification of syndrome elements. Among the 14 factors, some are difficult to explain with a single TCM pathological concept and may represent a statistical clustering of symptoms rather than an actual pathological dimension.
This suggests that, while data-driven methods can uncover latent structure among symptoms, their clinical significance must be interpreted cautiously alongside TCM theory. Future research should explore the relationship between factor analysis results and biomarkers, such as inflammatory factors and metabolites, to determine whether these statistical factors correspond to actual pathophysiological processes.

4.3. Cluster analysis revealed syndrome heterogeneity and classification refinement needs

The partial alignment between data-driven clustering and traditional syndrome classification offers insight into the heterogeneity of syndromes. The finding that patients with damp-heat syndrome are distributed across multiple clusters is particularly notable, indicating that this syndrome category may contain several distinct symptom subtypes.

This heterogeneity could arise from several factors. First, the proportion of dampness and heat varies. Some patients may exhibit more dampness than heat, characterized by a greasy tongue coating and pronounced drowsiness, while others may show more heat than dampness, with symptoms such as fever, thirst, and yellow urine. This variation is obscured in the broad traditional category of 'damp-heat accumulation.' Second, concurrent syndromes can contribute. Patients with damp-heat syndrome might also experience qi stagnation (evidenced by chest tightness and irritability), blood stasis (indicated by a dark tongue and fixed pain), or dyspepsia (marked by anorexia and abdominal distension), which broadens the symptom spectrum. Third, disease stage adds to this complexity. Although both acute and chronic phases of damp-heat syndrome fall under the same category, their symptoms can differ significantly.
In the acute phase, inflammation is pronounced with prominent heat symptoms, whereas in the chronic phase, lingering dampness becomes more apparent.

The concentration of patients with spleen deficiency and dampness obstruction, and of those with liver and kidney deficiency, within specific clusters suggests that symptom patterns are more consistent in these syndromes. This indicates that symptoms associated with deficiency syndromes tend to be more distinct and stable. Typically, symptoms of deficiency syndromes are directly linked to dysfunction of the Zang-fu organs, such as fatigue, anorexia, and loose stools resulting from spleen deficiency, or soreness and weakness of the waist and knees, dizziness, and tinnitus due to kidney deficiency. In contrast, symptoms of excess syndromes, such as those caused by damp-heat, are influenced by multiple factors, leading to greater variability.

These findings have implications for refining syndrome classification. Future research can explore dividing damp-heat into more specific subtypes, such as 'dampness predominating over heat', 'heat predominating over dampness', 'damp-heat with qi stagnation', and 'damp-heat with blood stasis'. This refinement aligns with the TCM principle of 'different treatment for the same disease,' which acknowledges that patients with identical diagnoses may require different treatment based on their specific syndrome manifestations.

However, there is also a risk of over-segmentation. The purpose of syndrome classification is to guide clinical treatment, and an overly complex classification system may reduce clinical practicability. Balancing accuracy against practicality is a key issue for future syndrome classification research.

4.4. Association rule mining: validation and challenges coexist

The strong symptom correlations identified through association rule mining provide objective support for the diagnostic principles of TCM. The rule {red tongue, yellow tongue coating} → {damp-heat syndrome} (confidence = 85.2%, lift = 1.487) is empirically consistent with the classic TCM teaching that these tongue manifestations indicate damp-heat syndrome. Similarly, rules such as {wiry pulse, slippery pulse} → {wiry slippery pulse} reflect the internal logic of TCM pulse diagnosis by linking basic pulse conditions with compound pulse conditions. The high lift values (> 5.0) of some rules indicate that these symptom combinations occur far more often than expected by chance, supporting the concept of the TCM syndrome pattern as a coherent, non-random symptom group.

This finding bears on the question of whether the TCM syndrome model has an objective basis. Historically, some scholars have criticized the TCM syndrome model as a merely subjective construct lacking an objective foundation. In this study, the analysis of a large sample demonstrated non-random and repeatable correlation patterns between symptoms, providing data support for the objectivity of syndromes.

However, it is important to recognize the limitations of association rule mining. First, correlation does not imply causation. A high correlation indicates a statistical relationship between symptoms, but it does not clarify which symptom is the cause, which is the effect, or whether there is an underlying pathological mechanism. For instance, a red tongue, yellow tongue coating, and rapid pulse might all be consequences of damp-heat pathogenesis rather than being causally related to one another. Second, interpreting association rules in a clinical context requires caution.
Some rules, such as {red tongue, slippery pulse, rapid pulse} → {slippery rapid pulse}, reflect the logical structure of TCM terminology, in which a compound pulse consists of basic pulses, rather than independent clinical observations. While these rules affirm the internal consistency of TCM theory, their utility in clinical decision-making is limited. Third, association rule mining depends on threshold settings (support, confidence, lift), and varying these thresholds can yield different results. The thresholds used in this study (support 10%, confidence 60%, lift 1.5) were based on previous research, but they are not the only valid options. Future research could explore the effects of different threshold settings on the results, or employ alternative algorithms such as FP-growth, which mines the same frequent itemsets more efficiently and so makes lower support thresholds practical.

Despite these limitations, association rule mining offers a valuable perspective for understanding symptom relationships. The strong associations identified can be integrated into diagnostic algorithms or clinical decision support systems to aid physicians in syndrome identification. For example, if a patient presents with a red tongue and yellow tongue coating, the system could suggest the possibility of damp-heat syndrome and recommend further examination of related symptoms.

4.5. Machine learning models: potential, challenges and controversies

The moderate performance of the machine learning models in this study (best weighted AUC = 0.7634) raises the question of whether automated syndrome classification is feasible. Proponents argue that this level of performance holds clinical value, particularly as a decision support tool for less experienced physicians.
However, critics question whether such moderate performance is adequate for clinical implementation and whether machine learning models can truly capture the complexity and nuances of TCM syndrome differentiation.

In terms of performance, the results of this study are broadly consistent with previous research applying machine learning to TCM syndrome differentiation. Most studies report accuracy in the range of 60–80% [40], depending on the complexity of the disease, data quality, and algorithm selection. Some tasks reach considerably higher values: a machine learning model for TCM syndrome differentiation of dysmenorrhea reported that a support vector machine (SVM) achieved 98.29% on one accuracy metric and 98.24% on a second [41], a level that could greatly help doctors develop personalized treatment strategies for dysmenorrhea patients. The accuracy of 62.92% in this study lies at the lower end of the 60–80% range and is consistent with previous results [42, 43]. Considering the complexity of gout syndromes and the difficulty of a five-class classification task, this result is not surprising.

However, the moderate performance also reflects the inherent challenges of automated syndrome classification. First, TCM syndrome differentiation involves subjective clinical judgment, and different physicians may judge the same patient's syndrome differently. This subjectivity introduces noise into the training labels and limits the upper bound of model performance. Second, although the 41 symptom variables in this study are fairly comprehensive, they may not capture all the clinical nuances considered by experienced physicians. TCM diagnosis emphasizes the 'four diagnostic methods'.
During syndrome differentiation, physicians also weigh the patient's mental state, complexion, voice, smell, and other information that is difficult to quantify and is missing from the current dataset. Third, class imbalance (damp-heat syndrome accounts for 57.3%) and the multi-class nature of the task increase the difficulty of classification.

Whether machine learning models can truly capture the logic of TCM syndrome differentiation remains a topic of debate. TCM syndrome differentiation involves more than matching symptoms; it requires a deep understanding of pathogenesis, the progression of disease, and individual patient differences. Current machine learning models rely primarily on statistical associations and do not encode TCM theory. For instance, a model might learn the association 'red tongue with yellow coating → damp-heat syndrome,' but it may not represent why this indicates damp-heat or recognize situations where the association does not hold (e.g., a red tongue with yellow coating and a deep pulse might suggest a mix of cold and heat rather than simple damp-heat).

Despite these limitations, this study demonstrated the potential of machine learning in aiding syndrome classification. Notably, the best-performing model was logistic regression. Unlike "black box" algorithms such as random forests or gradient boosting, logistic regression offers interpretable coefficients, allowing physicians to identify which symptoms most strongly predict each syndrome. This interpretability is crucial for clinical acceptance and trust. Future research should explore further modelling approaches, including deep learning, graph convolutional networks, and multi-label learning, alongside more interpretable options such as rule-based classifiers, decision trees, or deep learning models with attention mechanisms, to enhance clinical applicability.
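The interpretability argument can be illustrated with a minimal sketch. It fits a binary logistic regression (one syndrome versus the rest, a simplification of the five-class task in this study) by plain gradient descent and reads each coefficient as an odds ratio. The feature names, the `CELLS` counts, and the learning-rate settings are hypothetical assumptions chosen for illustration, not study data or the authors' configuration; a library implementation such as scikit-learn would normally be preferred.

```python
import math

FEATURES = ["red_tongue", "yellow_coating", "fatigue"]

# Hypothetical toy counts (NOT study data):
# (feature vector, number of damp-heat cases, number of other cases)
CELLS = [
    ((0, 0, 0), 1, 3), ((1, 0, 0), 3, 1), ((0, 1, 0), 3, 1), ((1, 1, 0), 4, 0),
    ((0, 0, 1), 0, 4), ((1, 0, 1), 2, 2), ((0, 1, 1), 2, 2), ((1, 1, 1), 3, 1),
]


def expand(cells):
    """Turn the count table into one row per case."""
    X, y = [], []
    for x, n_pos, n_neg in cells:
        X += [list(x)] * (n_pos + n_neg)
        y += [1] * n_pos + [0] * n_neg
    return X, y


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


X, y = expand(CELLS)
weights, intercept = fit_logistic(X, y)
coef = dict(zip(FEATURES, weights))
# exp(coefficient) is the multiplicative change in the odds of damp-heat
# when the symptom is present, holding the other symptoms fixed.
odds_ratio = {name: math.exp(value) for name, value in coef.items()}
for name in FEATURES:
    print(f"{name}: coef={coef[name]:+.2f}, odds ratio={odds_ratio[name]:.2f}")
```

A positive coefficient (odds ratio above 1) marks a symptom that raises the odds of the syndrome, and a negative one marks a symptom that lowers them, which is the kind of per-symptom reading that tree ensembles do not provide directly.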
Potential improvements include: (1) incorporating additional features, such as laboratory indicators (inflammatory markers, metabolites), imaging findings, or genetic markers, to provide more comprehensive patient information; (2) using ensemble methods that combine the strengths of various algorithms; (3) applying deep learning to capture complex nonlinear relationships; (4) collecting larger and more diverse datasets to improve model robustness and generalization; (5) developing syndrome-specific binary classification models (e.g., 'damp-heat syndrome or not') rather than relying on a single multi-class classifier; and (6) introducing TCM theoretical knowledge as prior information to construct a knowledge-guided machine learning model.

4.6. Research advantages, limitations and future directions

The primary advantages of this study are: (1) the large sample of medical case records (N = 295), which provides good patient representativeness and statistical power; (2) comprehensive data collection encompassing 41 symptom variables, reflecting the TCM symptom spectrum of gout; (3) the integration of complementary analysis methods, including factor analysis, cluster analysis, association rules, and machine learning, to characterize syndromes from multiple perspectives; (4) strict quality control and a standardized data collection scheme, supporting data reliability; and (5) transparent reporting of methods and results, which facilitates reproducibility and verification.

However, several limitations should be acknowledged. In terms of design, a cross-sectional design cannot monitor the progression of syndromes over time or assess treatment responses. Since gout is a chronic, recurrent disease, syndromes can transform at different stages, for example from acute damp-heat syndrome to chronic phlegm and blood stasis syndrome.
Longitudinal studies that track these syndrome changes would provide valuable insights into disease progression and therapeutic effects. In terms of measurement, the syndrome classification was based on physician consensus. Despite efforts at standardization, this process may still involve subjectivity, and consistency among diagnosticians was not assessed, so the extent of this subjectivity cannot be quantified. Additionally, although the 41 symptom variables are comprehensive, they may not fully capture all the information considered in TCM diagnosis, such as the patient's mental state and smell.

Based on these limitations, future research directions include:

Longitudinal studies: track the evolution of patients' syndromes over time, and explore the patterns of syndrome transformation, prognostic predictors, and the impact of treatment on syndromes. This will provide a basis for dynamic syndrome differentiation and individualized treatment.

Biomarker integration: integrate inflammatory markers (such as IL-1β, IL-6, TNF-α, CRP), metabolites (such as uric acid, creatinine, blood lipids), and genetic variations (such as ABCG2 and SLC2A9 gene polymorphisms) with symptom data to explore the biological basis of syndromes [44, 45]. This will help bridge TCM and modern biomedicine and make syndromes more objectively measurable.

Randomized controlled trials: compare the efficacy of syndrome-based individualized treatment with standard treatment to provide high-quality evidence for the clinical utility of syndrome differentiation. For example, the efficacy and safety of an individualized Chinese medicine regimen based on refined syndrome classification could be compared with standard urate-lowering therapy.

5. Conclusion

This study systematically analyzed TCM syndrome patterns in 295 gout patients by integrating multiple data-driven methods, including factor analysis, hierarchical clustering, association rule mining, and machine learning. The main findings are as follows. Damp-heat accumulation is the main syndrome type (57.3%), which is consistent with the theoretical framework of TCM, and the high frequency of heat and dampness signs provides objective evidence for damp-heat pathogenesis. Factor analysis revealed 14 latent symptom dimensions, with heat toxin as the dominant factor, providing empirical support for the theory of syndrome elements, which holds that complex syndromes are composed of disease location and disease nature. Hierarchical cluster analysis revealed the heterogeneity of syndromes, especially subtype differentiation within damp-heat syndrome, suggesting the necessity and feasibility of refined classification. Association rule mining identified strong symptom correlations (lift > 5.5 for the strongest rule), providing objective support for the diagnostic principles of TCM and for the concept of the syndrome pattern as a coherent symptom group. Although the machine learning models achieved only moderate performance (weighted AUC = 0.76), this supports the feasibility of automated syndrome classification and lays a foundation for the development of clinical decision support systems.

By integrating TCM knowledge with modern computational methods, this study advances the objectification and standardization of TCM syndrome differentiation in gout. The identified symptom patterns, syndrome subtypes, and prediction models provide an evidence base for more precise clinical practice and for the development of targeted ethnopharmacological interventions.
Future studies that integrate biomarkers, longitudinal designs, and clinical trials will further verify and refine these findings, and ultimately improve the clinical outcomes of gout patients through individualized, syndrome-based treatment strategies.

Abbreviations

MSU: Monosodium urate
ANOVA: One-way analysis of variance
IQR: Interquartile range
SD: Standard deviation
EFA: Exploratory factor analysis
KMO: Kaiser-Meyer-Olkin
RF: Random forest
LR: Logistic regression
GB: Gradient boosting
SVM: Support vector machine
NB: Naive Bayes
ROC: Receiver operating characteristic
AUC: Area Under the Curve
TCM: Traditional Chinese Medicine

Declarations

Acknowledgements

The authors sincerely acknowledge the professional expertise of the TCM physicians in syndrome differentiation and data collection, and extend their gratitude to the statistical analysis team for their technical support.

Author contributions

Tong Mo: Methodology, Data Curation.
Yihan Liu: Methodology, Data Curation.
Xiaohua Yang: Methodology, Data Curation.
Yuran Feng: Methodology, Data Curation.
Guihua Yue: Conceptualization, Project administration.
Chengxiang Guo: Methodology, Data Curation, Writing-Original draft.
Dongming Zhang: Conceptualization, Writing-Review & editing.

Funding

This study was funded by the following two projects:
China-ASEAN College of Chinese Medicine Technology Innovation Project in 2025 (NO: 050250060609)
2025 Guangxi University of Chinese Medicine School-level Education Teaching Reform and Research Project (NO: 2025A039)

Data availability

The datasets generated and/or analyzed during the current study are derived from the aforementioned commercial/institutional databases (China National Knowledge Infrastructure (CNKI) and Ancient and Modern Medical Case Cloud Platform (v3.0)).
The raw data underlying this study are not publicly archived due to database licensing restrictions; however, the data collection process can be replicated by readers with appropriate access rights using the detailed search strategy provided in the Methods section. Processed or summarized data supporting the findings are available from the corresponding author upon reasonable request, subject to ethical approval and compliance with relevant data protection regulations.

Declarations

The authors declare that there are no conflicts of interest regarding the publication of this manuscript. The authors have no potential financial relationships or affiliations that could be perceived as influencing the research reported in this article. On behalf of all co-authors.

Ethics approval and consent to participate

This study is analytical research based exclusively on publicly available data. All data were sourced from fully accessible public databases and resources (such as CNKI and the Ancient & Modern Medical Case Records Cloud Platform). These datasets contain no personally identifiable information, and their collection and publication have undergone the necessary ethical review and de-identification procedures by the original data providers. In accordance with China's "Ethical Review Measures for Biomedical Research Involving Humans" and relevant scientific research ethics guidelines, research using existing, publicly available, and non-identifiable data is generally eligible for exemption from ethics committee review. Consequently, upon deliberation by our institution (or: our research team), the study protocol was confirmed not to involve direct or indirect contact with human subjects or the collection of data or biological samples from them. It poses no intervention or risk to any individual and thus meets the criteria for exemption. Therefore, it was not submitted to an ethics committee for separate approval.
Although exempt from ethical review, we commit to adhering strictly to the principles of research integrity and data usage norms throughout the research process, ensuring respect for and proper citation of all data sources.

Consent for publication

All authors consent to the publication of the manuscript.

Competing interests

The authors declare that they have no competing interests.

Author details

1 Guangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China
2 The First Clinical Medical College of Guangxi University of Traditional Chinese Medicine, 530200, No.89-9 Dongge Road, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China
3 Information Technology Center of Guangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China
4 The New Medical College of Guangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China

References

1. Dalbeth, N. et al. Gout. Lancet 397 (10287), 1843–1855 (2021).
2. Dehlin, M., Jacobsson, L. & Roddy, E. Global epidemiology of gout: prevalence, incidence, treatment patterns and risk factors. Nat. Rev. Rheumatol. 16 (7), 380–390 (2020).
3. Kuo, C. F. et al. Global epidemiology of gout: prevalence, incidence and risk factors. Nat. Rev. Rheumatol. 11 (11), 649–662 (2015).
4. Li, X. et al. Serum uric acid levels and multiple health outcomes: umbrella review of evidence from observational studies, randomised controlled trials, and Mendelian randomisation studies. BMJ 357, j2376 (2017).
5. Zheng, G. et al. Assessment of drug induced hyperuricemia and gout risk using the FDA adverse event reporting system. Sci. Rep. 15 (1), 22856 (2025).
6. Finkelstein, Y. et al. Colchicine poisoning: the dark side of an ancient drug. Clin. Toxicol. (Phila). 48 (5), 407–414 (2010).
7. Liang, H. et al. Advances in Experimental and Clinical Research of the Gouty Arthritis Treatment with Traditional Chinese Medicine. Evid. Based Complement. Alternat Med. 2021, 8698232 (2021).
8. Unschuld, P. U., Tessenow, H. & Jinsheng, Z. An Annotated Translation of Huang Di’s Inner Classic – Basic Questions: 2 volumes (University of California Press, 2011).
9. Xiao, N. et al. Evaluating the Efficacy and Adverse Effects of Clearing Heat and Removing Dampness Method of Traditional Chinese Medicine by Comparison with Western Medicine in Patients with Gout. Evid. Based Complement. Alternat Med. 2018, 8591349 (2018).
10. Zhou, L. et al. Systematic review and meta-analysis of the clinical efficacy and adverse effects of Chinese herbal decoction for the treatment of gout. PLoS One 9 (1), e85008 (2014).
11. Zhao, X. et al. Efficacy and safety of Chinese herbal compound in the treatment of acute gouty arthritis: A systematic review and meta-analysis of randomized controlled trials. Int. Immunopharmacol. 149, 114223 (2025).
12. Leong, P. Y. et al. Traditional Chinese medicine in the treatment of patients with hyperuricemia: A randomized placebo-controlled double-blinded clinical trial. Int. J. Rheum. Dis. 27 (1), e14986 (2024).
13. Yan, E. et al. Comparison of support vector machine, back propagation neural network and extreme learning machine for syndrome element differentiation. Artif. Intell. Rev. 53 (4), 2453–2481 (2020).
14. Gong, L. et al. A syndrome differentiation model of TCM based on multi-label deep forest using biomedical text mining. Front. Genet. 14, 1272016 (2023).
15. Liao, T. S. et al. Factor analysis of traditional Chinese medicine symptoms for identification of syndrome patterns associated with idiopathic short stature in children. Tzu Chi Med. J. 36 (4), 433–439 (2024).
16. Sun, S., Zhuang, L. & Cao, M. Correlation Analysis and Application of Respiratory and Lung Diseases in Pediatrics of Traditional Chinese Medicine Based on Factor Analysis Method. Comput. Math. Methods Med. 2022, 4550039 (2022).
17. Hong, M. et al. Analysis of the cluster efficacy and prescription characteristics of traditional Chinese medicine intervention for non-small cell lung cancer based on a clustering algorithm. Technol. Health Care 31 (5), 1759–1770 (2023).
18. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
19. Zhang, Y. et al. Exploratory Factor Analysis for Validating Traditional Chinese Syndrome Patterns of Chronic Atrophic Gastritis. Evid. Based Complement. Alternat Med. 2016, 6872890 (2016).
20. Li, Y. et al. A study on the pattern of traditional Chinese medicine syndromes in ulcerative colitis based on factor analysis and cluster analysis. Chin. J. Integr. Med. 37 (10), 1191–1195 (2017).
21. Wang, Y. X. et al. Study on medication rules of traditional Chinese medicine for primary biliary cholangitis based on data mining. Tradit Herb. Drugs 33 (8), 1124–1130 (2022).
22. Chen, J. R. Construction of a TCM syndrome distribution and diagnostic model for pediatric functional constipation based on machine learning (Beijing University of Chinese Medicine, 2024).
23. Qi, X. et al. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 79, 103470 (2025).
24. Mahaboob, B. S. et al. Multithreshold Segmentation and Machine Learning Based Approach to Differentiate COVID-19 from Viral Pneumonia. Comput. Intell. Neurosci. 2022, 2728866 (2022).
25. Zhang, J. et al. Analysis of Acupoint Selection and Combination for Gouty Arthritis Treated with Moxibustion Based on Data Mining. Med. Acupunct. 37 (3), 239–251 (2025).
26. Yan, Y. Q. et al. [Analysis on mechanisms and medication rules of herbal prescriptions for gout caused by heat-damp accumulation syndrome based on data mining and network pharmacology]. Zhongguo Zhong Yao Za Zhi 43 (13), 2824–2830 (2018).
27. Neogi, T. et al. 2015 Gout classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Ann. Rheum. Dis. 74 (10), 1789–1798 (2015).
28. Wallace, S. L. et al. Preliminary criteria for the classification of the acute arthritis of primary gout. Arthritis Rheum. 20 (3), 895–900 (1977).
29. Shanghai University of Traditional Chinese Medicine, Institute of Medical History and Literature, China Academy of Chinese Medical Sciences, Fujian University of Traditional Chinese Medicine, et al. Standardized clinical terminology of traditional Chinese medicine, Part 2: Syndromes. State Administration for Market Regulation; Standardization Administration of China (2012).
30. Agrawal, R. & Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases. In VLDB '94, San Francisco, CA, USA (1994).
31. Li, H. Y. & Du, M. R. Discussion on the etiology, pathogenesis, and clinical application of prescriptions for gout in Synopsis of the Golden Chamber. Rheum. Arthritis 12 (12), 51–54 (2023).
32. Chai, W. T. et al. Treatment of gout based on the theory of 'Lijie disease' in Synopsis of the Golden Chamber. Chin. J. Ethnomed. Ethnopharmacy 32 (6), 7–10 (2023).
33. Guo, J. W. et al. Therapeutic potential and pharmacological mechanisms of Traditional Chinese Medicine in gout treatment. Acta Pharmacol. Sin. 46 (5), 1156–1176 (2025).
34. Guo, Y. et al. Chinese Herbal Formulas Si-Wu-Tang and Er-Miao-San Synergistically Ameliorated Hyperuricemia and Renal Impairment in Rats Induced by Adenine and Potassium Oxonate. Cell. Physiol. Biochem. 37 (4), 1491–1502 (2015).
35. Lin, Z. H. & Chen, H. Y. Summary of Chen Haiyun's experience in treating gouty arthritis by stage. J. New. Chin. Med. 57 (12), 226–229 (2025).
36. Xie, K. G. et al. Effects of Taohong Siwu Decoction on peripheral blood inflammatory factors, oxidative stress, and quality of life in patients with acute gouty arthritis. Liaoning J. Traditional Chin. Med. 46 (12), 2602–2605 (2019).
37. Zhang, J. H. et al. Observation on the therapeutic effect of acupuncture combined with modified Xiaochaihu Decoction and Erchen Decoction on metatarsophalangeal joint pain in gouty arthritis. Hebei J. Traditional Chin. Med. 40 (9), 1412–1414 (2018).
38. Zhu, W. F. Establishing a unified method and system for syndrome differentiation. Hunan Guiding J. Traditional Chin. Med. Pharmacol. (1), 7–10 (2003).
39. Zhu, W. F. Establishment of a unified system for syndrome differentiation. Chin. J. Basic. Med. Traditional Chin. Med. (4), 4–6 (2001).
40. Wang, L. et al. Predicting new-onset stroke with machine learning: development of a model integrating traditional Chinese and western medicine. Front. Pharmacol. 16, 1546878 (2025).
41. Zhang, L. et al. Construction and Application of a Traditional Chinese Medicine Syndrome Differentiation Model for Dysmenorrhea Based on Machine Learning. Comb. Chem. High. Throughput Screen. 28 (4), 664–674 (2025).
42. Sun, J. et al. Discovery and Validation of Traditional Chinese and Western Medicine Combination Antirheumatoid Arthritis Drugs Based on Machine Learning (Random Forest Model). Biomed. Res. Int. 2023, 6086388 (2023).
43. Zhang, J. et al. Explainable machine learning model and nomogram for predicting the efficacy of Traditional Chinese Medicine in treating Long COVID: a retrospective study. Front. Med. (Lausanne) 12, 1529993 (2025).
44. Eckenstaler, R. & Benndorf, R. A. The Role of ABCG2 in the Pathogenesis of Primary Hyperuricemia and Gout-An Update. Int. J. Mol. Sci. 22 (13) (2021).
45. Zhang, X. et al. Association between SLC2A9 (GLUT9) gene polymorphisms and gout susceptibility: an updated meta-analysis. Rheumatol. Int. 36 (8), 1157–1165 (2016).
Additional Declarations No competing interests reported. Status: Under Review Version 1 posted Editorial decision: Revision requested 16 Mar, 2026 Reviews received at journal 15 Mar, 2026 Reviewers agreed at journal 15 Mar, 2026 Reviews received at journal 13 Feb, 2026 Reviewers agreed at journal 29 Jan, 2026 Reviewers invited by journal 28 Jan, 2026 Editor assigned by journal 30 Dec, 2025 Submission checks completed at journal 27 Dec, 2025 First submitted to journal 27 Dec, 2025 
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8402309","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":582264122,"identity":"795be149-344f-413d-84ec-734946cb6805","order_by":0,"name":"Guihua Yue","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA00lEQVRIiWNgGAWjYFCCBAZmBgMbOX72xsaHH4jXUpBmLNlzuNlYgngtHw4nGtxIbxPgIUaDfHuO8esCg8MJBjcftjFIMNjJ6TYQ0GJw5o2Z9QyD9DzJ24ltDwoYko3NDhDSIpFjZsxjYF3Mdzux3UCC4UDiNkJa5GeAtTAnNtw82CbBQ4wWhhs5xo95DJwTJ9xgJFKLwZlnZcw8BqBATgQGsgERfpFvT978mecPKCqPP3z4ocJOjqAWBgYOM6QINCCoHATYHxOZTEbBKBgFo2DEAgBGb0Te78+VyQAAAABJRU5ErkJggg==","orcid":"","institution":"Guangxi University of Chinese Medicine","correspondingAuthor":true,"prefix":"","firstName":"Guihua","middleName":"","lastName":"Yue","suffix":""},{"id":582264123,"identity":"03542dc4-aca9-42fe-bb20-551074e2b543","order_by":1,"name":"Chengxiang Guo","email":"","orcid":"","institution":"Information Technology Center of Guangxi University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Chengxiang","middleName":"","lastName":"Guo","suffix":""},{"id":582264124,"identity":"b30c6afe-c708-458f-bbbd-1fe3b4bb8411","order_by":2,"name":"Dongming Zhang","email":"","orcid":"","institution":"First Clinical Medical College of Guangxi University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Dongming","middleName":"","lastName":"Zhang","suffix":""},{"id":582264125,"identity":"96664652-8de7-44ab-9ca7-cca5ef7b6c87","order_by":3,"name":"Tong 
Mo","email":"","orcid":"","institution":"Guangxi University of Chinese Medicine Sai Ensi New Medical College","correspondingAuthor":false,"prefix":"","firstName":"Tong","middleName":"","lastName":"Mo","suffix":""},{"id":582264126,"identity":"6ae10d7e-a8c9-49d1-a869-32f1fbc5dfb9","order_by":4,"name":"Yihan Liu","email":"","orcid":"","institution":"Guangxi University of Chinese Medicine Sai Ensi New Medical College","correspondingAuthor":false,"prefix":"","firstName":"Yihan","middleName":"","lastName":"Liu","suffix":""},{"id":582264127,"identity":"ddf62153-b7ac-419d-974a-55a820a27bf0","order_by":5,"name":"Xiaohua Yang","email":"","orcid":"","institution":"Guangxi University of Chinese Medicine Sai Ensi New Medical College","correspondingAuthor":false,"prefix":"","firstName":"Xiaohua","middleName":"","lastName":"Yang","suffix":""},{"id":582264133,"identity":"64a65a48-2d60-4c79-bab0-b7c0a8281c47","order_by":6,"name":"Yuran Feng","email":"","orcid":"","institution":"First Clinical Medical College of Guangxi University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Yuran","middleName":"","lastName":"Feng","suffix":""}],"badges":[],"createdAt":"2025-12-19 08:23:16","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8402309/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8402309/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":101633031,"identity":"a51fe3ba-b6af-4619-a0e8-9d0d1f71e0a1","added_by":"auto","created_at":"2026-02-02 05:56:29","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":59891,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eGraphical 
Abstract\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/01ffcac0777187206cf139b3.png"},{"id":101633028,"identity":"e9f8b397-9f4e-4f90-a13c-ff3c068d0d97","added_by":"auto","created_at":"2026-02-02 05:56:29","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":71360,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of TCM Syndrome Types in Gout Patients (N=295)\u003c/p\u003e","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/9d9b55af68c1d8b4edf38900.png"},{"id":101752582,"identity":"613d2f54-065b-452c-9e3a-e67ff5ec540f","added_by":"auto","created_at":"2026-02-03 10:28:18","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":84757,"visible":true,"origin":"","legend":"\u003cp\u003eTop 15 TCM Symptoms in Gout Patients\u003c/p\u003e","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/44dfc7091387914393b83efb.png"},{"id":101633024,"identity":"3ff146c9-7d96-4603-9524-1e6b76d70c95","added_by":"auto","created_at":"2026-02-02 05:56:28","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":103352,"visible":true,"origin":"","legend":"\u003cp\u003eScree Plot of Factor Analysis\u003c/p\u003e","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/efe259053b1b866cb01c9181.png"},{"id":101633029,"identity":"5c046bc9-2b06-4578-97bc-043e893b6619","added_by":"auto","created_at":"2026-02-02 05:56:29","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":225331,"visible":true,"origin":"","legend":"\u003cp\u003eHeatmap of Factor Loadings after Varimax 
Rotation\u003c/p\u003e","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/6a9c2f1fc5bacf8b57ad258a.png"},{"id":101753472,"identity":"a8ac6312-4411-4c08-be65-7b538e1cd9af","added_by":"auto","created_at":"2026-02-03 10:40:08","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":21269,"visible":true,"origin":"","legend":"\u003cp\u003eHierarchical Clustering Dendrogram with Five Clusters (N=295)\u003c/p\u003e","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/f81d4040b719d1dfa787e6bd.png"},{"id":101633026,"identity":"96d712f9-71b8-4be6-8b1a-2c2ffb5bdd51","added_by":"auto","created_at":"2026-02-02 05:56:29","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":149441,"visible":true,"origin":"","legend":"\u003cp\u003eAssociation Rules Network of TCM Symptoms\u003c/p\u003e","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/1be1456eba4038d3de2b6f18.png"},{"id":101633030,"identity":"0d13eca1-b169-4c15-a912-82761cd43a81","added_by":"auto","created_at":"2026-02-02 05:56:29","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":35827,"visible":true,"origin":"","legend":"\u003cp\u003eROC Curves of Five Machine Learning Models for Syndrome Classification\u003c/p\u003e","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/20de9edc6bdfdf44dc85a659.png"},{"id":101755613,"identity":"2f485f78-196c-4d4d-95d4-81fce4b089bb","added_by":"auto","created_at":"2026-02-03 
10:53:20","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2633134,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8402309/v1/9870ddf7-53fc-45c9-8b73-f4363eef2d06.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Data Mining of Public Databases to Identify TCM Syndrome Patterns in Gout: A Retrospective Study","fulltext":[{"header":"Research highlights ","content":"\u003cp\u003e\u0026bull; This study represents the first data-driven investigation into Traditional Chinese Medicine (TCM) syndrome patterns for gout, utilizing a dataset of 295 clinical case records.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Factor analysis identified 14 core symptom dimensions, with heat toxicity as the dominant mode.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Machine learning achieved an AUC of 0.76 for syndrome prediction.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Cluster analysis revealed heterogeneity within the traditional classification of damp-heat syndrome.\u003c/p\u003e\n\u003cp\u003e\u0026bull; Strong symptom associations (lift \u0026gt; 5.5) were discovered, supporting TCM theory.\u003c/p\u003e"},{"header":"1. Introduction","content":"\u003cp\u003eGout is one of the most common inflammatory joint diseases worldwide, with a prevalence of 1\u0026ndash;4% among adults in developed countries\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e. The incidence of gout has steadily increased due to population aging and lifestyle changes\u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e. 
The disease is characterized by hyperuricemia, and its fundamental pathological process is the deposition of monosodium urate (MSU) crystals\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]\u003c/sup\u003e. Its clinical manifestations are recurrent acute arthritis, chronic joint lesions, and potential systemic complications. Although urate-lowering therapy has advanced, a considerable proportion of patients still face poor disease control, prolonged symptoms, and adverse drug reactions\u003csup\u003e[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e. This highlights the need to explore complementary treatment approaches.\u003c/p\u003e \u003cp\u003eTraditional Chinese medicine (TCM) treatment of gout can be traced back more than two thousand years. TCM theory\u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e attributes gout to damp-heat accumulation, accompanied by phlegm turbidity and blood stasis obstructing the meridians and joints. Syndrome differentiation and treatment classifies patients by symptom patterns rather than by a single disease label, forming the theoretical foundation of individualized treatment. Recent meta-analyses\u003csup\u003e[\u003cspan additionalcitationids=\"CR10\" citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e have shown that TCM has clinical effects similar to those of Western medicine in treating gout, with fewer adverse reactions reported. 
A randomized controlled study demonstrated that Wu Ling Powder and Yin Chen Wu Ling Powder significantly reduced serum uric acid levels compared with placebo, supporting the clinical effectiveness of syndrome-based TCM prescriptions\u003csup\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e. However, syndrome differentiation is subjective and depends largely on the clinical experience of physicians, which challenges the standardization and evidence-based practice of gout treatment.\u003c/p\u003e \u003cp\u003eThe rise of data mining and machine learning technology provides an opportunity to objectively characterize TCM syndromes through pattern recognition in large-scale clinical data sets\u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e. These computational methods can identify latent structures in complex symptom data that may be difficult to discern through traditional clinical observation. For example, factor analysis can reveal latent symptom dimensions that correspond to the concept of syndrome factors\u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]\u003c/sup\u003e. Additionally, cluster analysis can objectively group patients based on symptom similarities, identifying clinically significant subtypes\u003csup\u003e[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]\u003c/sup\u003e. 
Machine learning algorithms have demonstrated potential in automating syndrome classification, with studies indicating their ability to capture complex nonlinear relationships between symptoms and syndromes\u003csup\u003e[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eA number of studies have applied data-driven methods to TCM syndromes in various diseases. Factor and cluster analyses have been effectively employed to examine common syndromes in chronic atrophic gastritis, ulcerative colitis, primary biliary cholangitis, and pediatric constipation, among others\u003csup\u003e[\u003cspan additionalcitationids=\"CR20 CR21\" citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]\u003c/sup\u003e, revealing core symptom groups and enabling objective patient classification. Machine learning techniques, including support vector machines and neural networks, have been used to compare syndrome elements\u003csup\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]\u003c/sup\u003e and even to assist in the diagnosis of viral pneumonia, with performance that varies according to data characteristics and syndrome complexity\u003csup\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]\u003c/sup\u003e. In the realm of gout, initial data mining studies have explored traditional Chinese medicine prescription patterns and treatment rules\u003csup\u003e[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]\u003c/sup\u003e. 
However, comprehensive studies integrating multiple analytical methods to examine syndrome patterns remain limited.\u003c/p\u003e \u003cp\u003eMost existing research predominantly focuses on individual analytical techniques, often neglecting the integration of complementary methods that could yield more robust and comprehensive insights. For instance, factor analysis identifies symptom dimensions but does not classify patients directly. Cluster analysis groups patients but may not uncover the underlying symptom structure. Association rule mining discovers symptom correlations but lacks predictive capability. Machine learning builds predictive models but may not offer interpretable insights into syndrome characteristics. An analytical framework that integrates these methods can leverage their respective strengths to achieve a more complete understanding of TCM syndromes.\u003c/p\u003e \u003cp\u003eThis study aims to systematically analyze the Traditional Chinese Medicine (TCM) syndrome patterns in gout patients by integrating a data-driven analytical framework. The specific objectives are: (1) to describe the distribution of syndromes and clinical features in large gout cohorts; (2) to identify core symptom clusters through exploratory factor analysis; (3) to classify patient subtypes objectively using hierarchical cluster analysis and test their correspondence with traditional syndrome classification; (4) to discover symptom association patterns through association rule mining; and (5) to construct and validate a machine learning model for TCM syndrome prediction. By integrating traditional Chinese medicine insights with modern computational methods, this study aims to advance the objectification and standardization of TCM syndrome differentiation in gout. Ultimately, it seeks to enhance accurate, evidence-based clinical practice and provide a foundation for developing new TCM interventions for gout.\u003c/p\u003e"},{"header":"2. 
Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Research design and ethical approval\u003c/h2\u003e \u003cp\u003eThis study adopted a retrospective cross-sectional observational design and constitutes a secondary analysis of publicly available, de-identified data, which is exempt from ethical review approval. All data were sourced from public databases, namely the China National Knowledge Infrastructure (CNKI) and the Ancient \u0026amp; Modern Medical Case Cloud Platform (V3.0), and contain no personally identifiable information. The study did not involve any intervention with human subjects or the collection of biological samples. In accordance with the relevant regulations of China\u0026rsquo;s \"Ethical Review Measures for Biomedical Research Involving Humans\" (2016) and institutional policies, the research protocol was confirmed to meet the criteria for exemption from ethical review.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Study population\u003c/h2\u003e \u003cdiv id=\"Sec5\" class=\"Section3\"\u003e \u003ch2\u003e2.2.1. 
Inclusion criteria\u003c/h2\u003e \u003cp\u003ePatients were included based on the following criteria: (1) They met the 2015 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) gout classification criteria, which replaced the 1977 Wallace criteria \u003csup\u003e[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]\u003c/sup\u003e; (2) They were aged between 18 and 80 years; (3) They had complete diagnostic data for both traditional Chinese medicine and its composition; (4) They voluntarily participated and provided informed consent; (5) They had not received any uric acid-lowering treatment or traditional Chinese medicine treatment for at least 2 weeks prior to enrollment to ensure accurate evaluation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section3\"\u003e \u003ch2\u003e2.2.2. Exclusion criteria\u003c/h2\u003e \u003cp\u003ePatients were excluded if they met any of the following conditions: (1) concurrent rheumatic diseases such as rheumatoid arthritis, systemic lupus erythematosus, or psoriatic arthritis; (2) severe cardiovascular, hepatic, renal, or hematological diseases that could interfere with syndrome evaluation; (3) pregnancy or lactation; (4) incomplete clinical data or refusal of traditional Chinese medicine treatment; (5) cognitive dysfunction impairing reliable symptom reporting.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Data collection\u003c/h2\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003e2.3.1 Source of data\u003c/h2\u003e \u003cp\u003eIn this study, 'TCM master', 'gout', 'TCM', and 'treatment experience' were used as the keywords to retrieve data from the China National Knowledge Infrastructure (CNKI) and the Ancient and Modern Medical Case Cloud Platform (v3.0) between January 1, 2000, and July 31, 2025. 
Note that access to these databases requires a legitimate subscription account on the respective platforms. Literature and clinical cases related to TCM treatment of gout were collated into standardized case reports, yielding a total of 295 reports.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Data normalization processing\u003c/h2\u003e \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e \u003ch2\u003e2.4.1 Demographic and clinical data collection\u003c/h2\u003e \u003cp\u003eDemographic information, such as age and gender, was collected using standardized case report forms. Clinical parameters were systematically documented, including affected joint sites, disease severity, and therapeutic drugs.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e \u003ch2\u003e2.4.2 Data specification of four diagnostic methods of traditional Chinese medicine\u003c/h2\u003e \u003cp\u003eInformation from the four diagnostic methods of traditional Chinese medicine was collected by experienced Chinese medicine practitioners (clinical experience in the field of rheumatology\u0026thinsp;\u0026ge;\u0026thinsp;10 years) following a standardized protocol. All physicians received unified training before the study. Inspection involved a systematic observation of tongue characteristics, including color (red, light, dark purple), shape (tooth marks, swelling), coating color (white, yellow), and coating quality (thin, thick, greasy, peeling). Auscultation and olfaction assessed voice quality, respiratory sounds, and body odor when clinically relevant. Inquiry was structured to gather symptoms such as joint pain characteristics (location, nature, intensity, time), joint swelling and redness, fever, thirst, appetite, urination pattern, defecation, sleep quality, emotional state, and other systemic manifestations. 
Palpation involved pulse examination, recording pulse characteristics such as string, slippery, rapid, slow, deep, thready, and weak pulses, as well as composite pulses (e.g., string slippery, slippery rapid).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section3\"\u003e \u003ch2\u003e2.4.3 Collection of symptom variables\u003c/h2\u003e \u003cp\u003eBased on traditional Chinese medicine diagnoses and patient reports, we systematically recorded 41 symptom and sign variables. A database was created using binary coding (presence\u0026thinsp;=\u0026thinsp;1, absence\u0026thinsp;=\u0026thinsp;0) in Microsoft Office Excel 2021. These variables were selected from TCM literature, clinical guidelines, and expert consensus, and fall into the following categories. Joint manifestations (8 variables) include joint pain, joint swelling, red skin color over joints, increased local skin temperature, limited joint activity, toe joint pain, ankle joint pain, and knee joint pain. Systemic symptoms (12 variables) cover fever, chills, fatigue, insomnia, dry mouth, bitter mouth, bad breath, yellow urine, constipation, loose stools, anorexia, abdominal distension, chest tightness, and irritability. Tongue manifestations (11 variables) consist of red tongue, pale tongue, dark purple tongue, tooth-marked tongue, thin fur, thick fur, white fur, yellow fur, greasy fur, yellow greasy fur, and peeling fur. 
Pulse manifestations (10 variables) include string pulse, slippery pulse, rapid pulse, slow pulse, deep pulse, thready pulse, weak pulse, slippery rapid pulse, string slippery pulse, and string thready pulse.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section3\"\u003e \u003ch2\u003e2.4.4 Classification of TCM syndromes\u003c/h2\u003e \u003cp\u003eThe TCM syndrome type was determined by at least two senior Chinese medicine practitioners (associate professor level or above) according to the national standard Clinical Terminology of Traditional Chinese Medicine Diagnosis and Treatment, Part 2: Syndromes (GB/T 16751.2\u0026ndash;2021) and related clinical practice guidelines\u003csup\u003e[\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]\u003c/sup\u003e. In cases of disagreement, a third senior physician made the final decision. According to the symptom pattern, five main syndromes were identified: (1) Damp-heat accumulation syndrome: acute onset, joint swelling, burning pain, fever, thirst with a preference for cold drinks, yellow urine, constipation, red tongue, yellow greasy fur, slippery pulse; (2) Spleen deficiency with dampness syndrome: recurrent joint swelling with dull pain, sensation of heaviness, fatigue, anorexia, loose stools, pale tongue, white greasy fur, weak or slippery pulse; (3) Damp-heat accumulation combined with phlegm and blood stasis syndrome: the combined manifestations of damp-heat syndrome (acute inflammation, heat signs) and phlegm and blood stasis syndrome (subcutaneous nodules, fixed pain); (4) Phlegm and blood stasis syndrome: chronic disease course, fixed joint pain, subcutaneous tophi, joint deformity, dark purple tongue, string pulse; (5) Liver and kidney deficiency syndrome: chronic joint pain, lumbar weakness, dizziness, tinnitus, pale tongue, weak pulse.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e \u003ch2\u003e2.4.5 Data Quality Control\u003c/h2\u003e \u003cp\u003eTo ensure the quality and consistency of data, several 
measures were implemented: (1) All Chinese medicine practitioners underwent standardized training before the study commenced, focusing on syndrome identification criteria, symptom evaluation methods, and data recording procedures; (2) Double-entry verification was used during data entry to detect and resolve discrepancies; (3) A logic consistency check was applied in the electronic data acquisition system; (4) The missing-data rate was monitored to keep it below 5% for all key variables.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e2.5. Statistical methods\u003c/h2\u003e \u003cp\u003eAll statistical analyses were performed using Python 3.11.0 (including Scikit-learn 1.5.2, Pandas 2.2.3, Scipy 1.14.1, Matplotlib 3.9.2, Seaborn 0.13.2, Mlxtend 0.23.3) and SPSS Statistics 26.0. Statistical significance was set at a two-sided p\u0026thinsp;\u0026lt;\u0026thinsp;0.05.\u003c/p\u003e \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e \u003ch2\u003e2.5.1. Descriptive statistics\u003c/h2\u003e \u003cp\u003eThe Shapiro-Wilk test was used to assess the normality of continuous variables. Normally distributed data are expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (SD), and non-normally distributed data are expressed as median (interquartile range, IQR). Categorical variables are expressed as frequencies and percentages. For comparisons between groups, one-way analysis of variance (ANOVA) was used for continuous variables, and the chi-square test or Fisher's exact test (when the expected cell count was\u0026thinsp;\u0026lt;\u0026thinsp;5) was used for categorical variables.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section3\"\u003e \u003ch2\u003e2.5.2. Exploratory factor analysis\u003c/h2\u003e \u003cp\u003eExploratory factor analysis (EFA) was performed on 41 symptom variables to identify latent symptom dimensions. 
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were used to evaluate the suitability of the data for factor analysis (KMO\u0026thinsp;\u0026gt;\u0026thinsp;0.5 and p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). The number of factors was determined by eigenvalues\u0026thinsp;\u0026gt;\u0026thinsp;1 (Kaiser criterion), scree plot inspection, and cumulative variance explained (target\u0026thinsp;\u0026gt;\u0026thinsp;50%). Factor loadings\u0026thinsp;\u0026gt;\u0026thinsp;0.4 were considered significant for factor interpretation. Cronbach's alpha was used to assess the internal consistency of the extracted factors when applicable.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section3\"\u003e \u003ch2\u003e2.5.3. Hierarchical cluster analysis\u003c/h2\u003e \u003cp\u003eHierarchical cluster analysis with Ward's linkage method and Euclidean distance was used to identify natural patient groups based on symptom characteristics. The optimal number of clusters was determined by inspecting the dendrogram, calculating the silhouette coefficient (higher values indicate more distinct clusters), and considering clinical interpretability. To assess the correspondence between data-driven clusters and traditional syndrome types, the clusters were cross-tabulated with the original syndrome classification and compared using the chi-square test.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section3\"\u003e \u003ch2\u003e2.5.4. Association rule mining\u003c/h2\u003e \u003cp\u003eThe Apriori algorithm was employed to identify frequent symptom patterns and association rules among 41 dichotomous symptom variables\u003csup\u003e[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]\u003c/sup\u003e. The minimum support threshold was set at 10% (equivalent to at least 30 patients), with a minimum confidence of 60% and a minimum lift of 1.5, to ensure the identification of meaningful associations. The association rules were sorted by their lift values to determine the strongest symptom correlations. 
NetworkX was used to create a network visualization of the symptom association patterns.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e \u003ch2\u003e2.5.5. Machine learning classification\u003c/h2\u003e \u003cp\u003eFive machine learning algorithms, chosen for their established performance in similar medical classification tasks, were used to classify TCM syndromes: logistic regression (LR) with L2 regularization, random forest (RF) with 100 trees, gradient boosting (GB) with 100 estimators, support vector machine (SVM) with a radial basis function kernel, and naive Bayes (NB) with a Gaussian distribution assumption.\u003c/p\u003e \u003cp\u003eThe dataset was randomly divided into a training set (70%, n\u0026thinsp;=\u0026thinsp;206) and a test set (30%, n\u0026thinsp;=\u0026thinsp;89). Stratified sampling was used to preserve the syndrome distribution in both sets. Five-fold cross-validation with grid search was used to optimize the hyperparameters on the training set. The evaluation metrics were accuracy, precision, recall, F1 score, and area under the curve (AUC). Because this is a multi-class problem (five syndromes), the metrics were computed as weighted averages to account for class imbalance. Receiver operating characteristic (ROC) curves were generated using a one-vs-rest strategy. All models were trained on standardized features (zero mean, unit variance) to ensure a fair comparison.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1. Patient characteristics and syndrome distribution\u003c/h2\u003e\n \u003cp\u003eA total of 295 gout patients who met the inclusion criteria were enrolled in this study. The baseline characteristics stratified by TCM syndrome type are summarized in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e. 
The cohort was predominantly male (277 cases, 93.9%), with a mean age of 48.5\u0026thinsp;\u0026plusmn;\u0026thinsp;12.2 years (range: 20\u0026ndash;83 years). There was no significant difference in gender distribution (\u0026chi;\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;5.868, p\u0026thinsp;=\u0026thinsp;0.2092) or age (F\u0026thinsp;=\u0026thinsp;0.929, p\u0026thinsp;=\u0026thinsp;0.4476) among the five syndrome groups, suggesting that syndrome type was not associated with these basic demographic factors.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eBaseline Characteristics of Gout Patients by TCM Syndrome Type (N\u0026thinsp;=\u0026thinsp;295)\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"8\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eCharacteristic\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal (N\u0026thinsp;=\u0026thinsp;295)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDamp-Heat Accumulation (n\u0026thinsp;=\u0026thinsp;169)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSpleen Deficiency with Dampness (n\u0026thinsp;=\u0026thinsp;38)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDamp-Heat \u0026amp; Phlegm-Stasis (n\u0026thinsp;=\u0026thinsp;37)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ePhlegm-Stasis Obstruction (n\u0026thinsp;=\u0026thinsp;28)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eLiver-Kidney Deficiency (n\u0026thinsp;=\u0026thinsp;23)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n 
\u003cp\u003ep-value\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAge (years), mean\u0026thinsp;\u0026plusmn;\u0026thinsp;SD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e48.5\u0026thinsp;\u0026plusmn;\u0026thinsp;12.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e47.8\u0026thinsp;\u0026plusmn;\u0026thinsp;11.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e50.2\u0026thinsp;\u0026plusmn;\u0026thinsp;13.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e49.1\u0026thinsp;\u0026plusmn;\u0026thinsp;12.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e48.9\u0026thinsp;\u0026plusmn;\u0026thinsp;11.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e49.7\u0026thinsp;\u0026plusmn;\u0026thinsp;13.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.4476\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMale, n (%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e277 (93.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e160 (94.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e34 (89.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e35 (94.6)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e26 (92.9)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e22 (95.7)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.2092\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFemale, n (%)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e18 (6.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9 (5.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4 (10.5)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2 (5.4)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2 (7.1)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1 (4.3)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e-\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"8\"\u003eNote: p-values are from ANOVA (age) and the chi-square test (gender). SD\u0026thinsp;=\u0026thinsp;standard deviation.\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eThe distribution of TCM syndromes is shown in Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e. Damp-heat accumulation was the most common syndrome type, accounting for 57.3% (169/295), followed by spleen deficiency with dampness obstruction (12.9%, 38/295), damp-heat accumulation combined with phlegm and blood stasis obstruction (12.5%, 37/295), phlegm and blood stasis obstruction (9.5%, 28/295), and liver and kidney deficiency (7.8%, 23/295). This pattern is consistent with traditional Chinese medicine theory, which attributes acute gout primarily to damp-heat accumulation and regards the other syndromes as reflecting different disease stages or constitutional changes.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2. Symptom frequency distribution\u003c/h2\u003e\n \u003cp\u003eThe frequency distribution of the 15 most common symptoms across all patients is shown in Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e. 
Joint pain was nearly universal, occurring in 86.4% of patients (255/295), which is consistent with its role as the core manifestation of gout. It was followed by red tongue (75.9%, 224/295), yellow tongue coating (66.8%, 197/295), joint swelling (63.7%, 188/295), and yellow greasy tongue coating (55.9%, 165/295). Other common symptoms included redness of the skin over the joint (54.6%), toe joint pain (51.2%), elevated skin temperature (50.5%), wiry pulse (46.1%), and slippery pulse (41.7%). The high prevalence of heat signs (red tongue, yellow tongue coating) and dampness signs (greasy tongue coating, slippery pulse) provides empirical support for the TCM conceptualization of gout as primarily a damp-heat syndrome.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3. Factor analysis: identifying core symptom clusters\u003c/h2\u003e\n \u003cp\u003eThe KMO measure of sampling adequacy was 0.5896, which exceeded the minimum acceptable threshold of 0.5. Bartlett's test of sphericity was highly significant (\u0026chi;\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;=\u0026thinsp;4083.74, df\u0026thinsp;=\u0026thinsp;820, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001), indicating that the correlation matrix was suitable for factor analysis. The scree plot (Figure \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e) showed an obvious inflection point after the 14th factor. Using the Kaiser criterion, 14 factors with eigenvalues greater than 1 were extracted, cumulatively explaining 58.3% of the total variance. The eigenvalue of the first factor was 6.42, highlighting its dominant contribution to symptom variation and suggesting the presence of a major latent pathological dimension.\u003c/p\u003e\n \u003cp\u003eThe factor loading matrix after varimax rotation is shown as a heat map in Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e; 28 symptoms had a loading \u0026gt;\u0026thinsp;0.4 on at least one factor. 
Based on the symptom variables with high loadings on each factor, the 14 factors can be provisionally interpreted as distinct symptom dimensions consistent with concepts of TCM pathology. Factor 1, identified as the heat toxin factor (eigenvalue\u0026thinsp;=\u0026thinsp;6.42), includes red tongue (0.78), yellow tongue coating (0.72), yellow greasy tongue coating (0.68), pulse (0.65), and fever (0.58). Factor 2, the dampness factor (eigenvalue\u0026thinsp;=\u0026thinsp;3.21), is characterized by high loadings of greasy tongue coating (0.74), slippery pulse (0.69), and heaviness (0.61), indicating dampness accumulation. Factor 3, the deficiency factor (eigenvalue\u0026thinsp;=\u0026thinsp;2.87), features high loadings of pale tongue (0.71), weak pulse (0.68), and fatigue (0.64), representing qi and blood deficiency. Factor 4, the blood stasis factor (eigenvalue\u0026thinsp;=\u0026thinsp;2.34), includes dark purple tongue (0.76), fixed pain (0.69), and wiry pulse (0.57), signifying blood stasis. The remaining factors represent various combinations and specific symptom patterns, including joint inflammation, digestive symptoms, mood disorders, and complex pulse patterns. This factor structure provides empirical support for the concept of TCM syndrome elements, suggesting that complex syndromes arise from combinations of basic pathological dimensions rather than being entirely separate entities.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e3.4. Cluster analysis: objective patient classification\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eHierarchical cluster analysis using Ward's method identified five patient clusters based on symptom features (Figure 5). The distribution of patients across these clusters was relatively balanced: cluster 1 included 34 cases (11.5%), cluster 2 had 55 cases (18.6%), cluster 3 comprised 63 cases (21.4%), cluster 4 contained 66 cases (22.4%), and cluster 5 encompassed 77 cases (26.1%). 
The silhouette coefficient was 0.42, indicating a moderate degree of cluster separation. This suggests that while distinct patient groups exist, there is also considerable overlap in symptoms.\u003c/p\u003e\n \u003cp\u003eCross-tabulation showed a partial but imperfect correspondence between data-driven clusters and the traditional syndrome classification (\u0026chi;\u003csup\u003e2\u003c/sup\u003e = 87.34, p \u0026lt; 0.001). Notably, patients with spleen deficiency with dampness obstruction were relatively concentrated in cluster 2 (55.3%, 21/38), indicating consistent symptom patterns in this syndrome type. Similarly, patients with liver and kidney deficiency were mainly concentrated in cluster 1 (60.9%, 14/23). In contrast, patients with damp-heat accumulation, the largest group, were distributed across all five clusters, mainly cluster 3 (29.6%, 50/169), cluster 4 (24.9%, 42/169), and cluster 5 (28.4%, 48/169). This finding indicates that the traditional damp-heat syndrome contains several distinct symptom subtypes, which may need to be further differentiated in clinical practice. The heterogeneity of the damp-heat syndrome may reflect differences in disease stage, severity, or individual constitution that the current syndrome classification system does not fully capture.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003e3.5. Association rule mining: symptom association patterns\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eApplying the Apriori algorithm with a minimum support of 10%, a minimum confidence of 60%, and a minimum lift of 1.5, we identified 31 association rules among symptoms. The network visualization of these rules is shown in Figure 6, where node size represents symptom frequency and edge width represents the lift value. The top five association rules, ranked by lift value, are as follows:\u003c/p\u003e\n \u003cp\u003e1. 
{Red tongue, slippery pulse, rapid pulse} \u0026rarr; {slippery rapid pulse}: support = 15.3%, confidence = 100%, lift = 5.566. This rule indicates that whenever a red tongue, slippery pulse, and rapid pulse were recorded together, the compound slippery rapid pulse was also recorded. This reflects the TCM convention that a compound pulse is a combination of basic pulses.\u003c/p\u003e\n \u003cp\u003e2. {Yellow tongue coating, greasy tongue coating} \u0026rarr; {yellow greasy tongue coating}: support = 55.9%, confidence = 95.4%, lift = 1.707. This widely applicable rule indicates a strong association between the basic tongue coating characteristics and their compound manifestation.\u003c/p\u003e\n \u003cp\u003e3. {Joint pain, joint swelling, skin redness} \u0026rarr; {elevated skin temperature}: support = 42.4%, confidence = 88.9%, lift = 1.760. This rule captures the characteristic inflammatory presentation of acute gout attacks.\u003c/p\u003e\n \u003cp\u003e4. {Red tongue, yellow tongue coating} \u0026rarr; {damp-heat syndrome}: support = 48.1%, confidence = 85.2%, lift = 1.487. This rule is consistent with the TCM diagnostic principle that a red tongue with a yellow coating suggests damp-heat syndrome.\u003c/p\u003e\n \u003cp\u003e5. {Wiry pulse, slippery pulse} \u0026rarr; {wiry slippery pulse}: support = 18.6%, confidence = 92.7%, lift = 4.982. Similar to Rule 1, this illustrates the logical relationship between basic pulses and the compound pulse.\u003c/p\u003e\n \u003cp\u003eThese association rules reveal strong associations between symptoms, especially in tongue-pulse combinations and joint manifestations. Lift values above 1.5 indicate that these symptom combinations occur substantially more often than expected by chance, supporting the concept of TCM syndrome patterns as coherent symptom clusters rather than random co-occurrences.\u003c/p\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003e3.6. 
Machine learning model for syndrome prediction\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe performance of the five machine learning models for TCM syndrome classification is summarized in Table 2. Logistic regression achieved the best overall performance, with an accuracy of 62.92%, a precision of 0.6347, a recall of 0.6292, an F1 score of 0.6235, and a weighted AUC of 0.7634. Gradient boosting and random forest performed slightly worse, with weighted AUCs of 0.7534 and 0.7488, respectively. Support vector machine and naive Bayes showed the lowest performance, with weighted AUCs of 0.6581 and 0.6214, respectively.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2.\u003c/strong\u003e Performance Comparison of Machine Learning Models for TCM Syndrome Classification\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"0%\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eModel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eRecall\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eF1-Score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"bottom\"\u003e\n \u003cp\u003eWeighted AUC\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eLogistic Regression\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6292\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6347\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6292\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6235\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\"\u003e\n \u003cp\u003e0.7634\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6067\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6128\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6067\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6014\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.7488\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eGradient Boosting\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6180\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6241\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6180\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6127\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.7534\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eSupport Vector Machine\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5730\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5792\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5730\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5681\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6581\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eNaive Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5393\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
valign=\"top\"\u003e\n \u003cp\u003e0.5456\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5393\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.5347\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e0.6214\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote: All metrics were calculated using weighted averaging to account for class imbalance. AUC = Area Under the Curve.\u003c/p\u003e\n\u003cp\u003eThe receiver operating characteristic (ROC) curves of all five models are shown in Figure 7. The AUC values of all models exceeded the random-guess baseline (AUC = 0.5, represented by the diagonal), indicating that each model had some discriminative ability. The logistic regression model showed the highest discrimination, with its ROC curve nearest to the upper left corner, indicating the best balance between sensitivity and specificity among the five models. Although the weighted AUC of 0.7634 is below 0.8, suggesting moderate rather than excellent discrimination, several factors must be considered. First, traditional Chinese medicine (TCM) syndrome differentiation is inherently complex, involving subjective clinical judgments that can vary among physicians. Second, the five-class classification task is more challenging than binary classification, with a chance-level accuracy of only 20% under uniform random guessing. Third, class imbalance, with damp-heat syndrome accounting for 57.3% of cases, adds further difficulty and means that accuracy should be read against a majority-class baseline of about 57%. Despite these limitations, the best model\u0026apos;s weighted AUC of 0.76 suggests clinically useful predictive power that could support clinical decision-making, particularly for less experienced physicians. This moderate performance also leaves room for improvement through feature engineering, ensemble methods, or the inclusion of additional clinical variables.\u003c/p\u003e"},{"header":"4. 
Discussion","content":"\u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e4.1. Clinical significance and theoretical confirmation of syndrome distribution\u003c/h2\u003e \u003cp\u003eThe predominance of damp-heat accumulation syndrome (57.3%) in this cohort is highly consistent with the theoretical framework of traditional Chinese medicine. Traditional Chinese medicine classics mainly attribute gout (historically known as 'Bai Hu Li Jie Feng' or 'Bi Zheng') to damp-heat accumulation, an attribution that can be traced back to classical texts such as the 'Synopsis of the Golden Chamber'\u003csup\u003e[\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]\u003c/sup\u003e. Modern traditional Chinese medicine further holds that dampness and heat often result from an improper diet (excessive alcohol, seafood, fats, and sweets), impaired spleen and stomach function, internal dampness and turbidity, and prolonged emotional stress\u003csup\u003e[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]\u003c/sup\u003e. The high frequency of heat signs (red tongue 75.9%, yellow tongue coating 66.8%) and dampness signs (yellow greasy tongue coating 55.9%, slippery pulse 41.7%) in this study provides empirical support for this theoretical framework.\u003c/p\u003e \u003cp\u003eHowever, the presence of other syndromes underscores the complexity of gout pathogenesis. Spleen deficiency with dampness obstruction (12.9%) and liver and kidney deficiency (7.8%) may indicate a chronic stage, in which prolonged illness depletes healthy qi, leading to an underlying deficiency with superficial excess. Phlegm and blood stasis obstruction (9.5%) reflects the pathological progression of prolonged dampness into phlegm and blood stasis. 
Notably, the composite syndrome of damp-heat accumulation with phlegm and blood stasis obstruction (12.5%) highlights the complexity of clinical manifestations, suggesting that multiple pathological mechanisms may coexist and interact.\u003c/p\u003e \u003cp\u003eFrom a clinical point of view, these findings can help guide treatment strategies. For patients with damp-heat syndrome, the primary approach involves clearing heat and removing dampness, often using Si Miao Powder, Er Miao Powder, and similar prescriptions\u003csup\u003e[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]\u003c/sup\u003e. Patients with spleen deficiency require treatments that strengthen the spleen and stomach and resolve dampness, for which Shen Ling Bai Zhu San is suitable\u003csup\u003e[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]\u003c/sup\u003e. In cases of phlegm and blood stasis syndrome, it is essential to resolve phlegm, activate blood circulation, and clear the collaterals, which can be achieved with modified Tao Hong Si Wu Decoction and Er Chen Decoction\u003csup\u003e[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]\u003c/sup\u003e. The heterogeneity of syndromes also implies that, even among patients who share a damp-heat diagnosis, treatment plans may need individualized adjustment according to the relative proportion of dampness and heat and the presence of concurrent syndromes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec30\" class=\"Section2\"\u003e \u003ch2\u003e4.2. 
The theory of syndrome elements revealed by factor analysis\u003c/h2\u003e \u003cp\u003eIn this study, factor analysis identified 14 latent symptom dimensions, and the first factor (eigenvalue\u0026thinsp;=\u0026thinsp;6.42) represented manifestations of heat toxin, which provides empirical support for the theory of TCM syndrome elements. The concept of Syndrome Elements was proposed by Professor Zhu Wenfeng in the 1980s. It posits that complex TCM syndromes are composed of Pathological Location Elements and Pathological Nature Elements, which can combine in multiple ways\u003csup\u003e[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]\u003c/sup\u003e. The factors identified in this study, such as heat, dampness, deficiency, and stasis, are highly consistent with the classification of pathogenic factors and constitutional status in traditional Chinese medicine. This finding has both theoretical and practical significance. In theory, it parallels the view of modern systems biology that complex disease phenotypes are generated by the interaction of multiple biological modules. In practice, identifying syndrome elements can guide more precise treatment. For instance, patients in whom the heat toxin factor predominates may benefit more from herbs with stronger heat-clearing and toxin-resolving effects (e.g., Coptis chinensis, Scutellaria baicalensis, Gardenia jasminoides). Conversely, patients in whom the dampness factor predominates may require herbs with greater efficacy in resolving dampness (e.g., Atractylodes rhizome, Coix seed, Poria).\u003c/p\u003e \u003cp\u003eHowever, it should also be noted that the factor structure found in this study does not correspond perfectly to the traditional classification of syndrome elements. 
Among the 14 factors, some are challenging to explain using a single TCM pathological concept, possibly indicating a statistical clustering of symptoms rather than an actual pathological dimension. This finding suggests that while data-driven methods can uncover potential structures among symptoms, their clinical significance must be interpreted cautiously alongside traditional Chinese medicine theory. Future research should explore the relationship between factor analysis results and biomarkers, such as inflammatory factors and metabolites, to determine if these statistical factors correspond to actual pathophysiological processes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec31\" class=\"Section2\"\u003e \u003ch2\u003e4.3. Cluster analysis revealed syndrome heterogeneity and classification refinement needs\u003c/h2\u003e \u003cp\u003eThe partial alignment between data-driven clustering and traditional syndrome classification offers significant insights into the heterogeneity of syndromes. The discovery that patients with damp-heat syndrome are distributed in multiple clusters is particularly thought-provoking, indicating that this syndrome category may contain several different symptom subtypes. This heterogeneity could arise from various factors. First, there is the variation in the proportion of dampness and heat. Some patients may exhibit more dampness than heat, characterized by greasy tongue coating and pronounced drowsiness, while others may show more heat than dampness, with symptoms like fever, thirst, and yellow urine. This variation is obscured in the traditional broad classification of 'damp-heat accumulation.' Second, concurrent syndromes can influence this heterogeneity. Patients with damp-heat syndrome might also experience qi stagnation (evidenced by chest tightness and irritability), blood stasis (indicated by a dark tongue and fixed pain), or dyspepsia (marked by anorexia and abdominal distension), complicating the symptom spectrum. 
Additionally, differences in disease stage contribute to this complexity. Although both the acute and chronic phases of damp-heat syndrome fall under the same category, their symptoms can differ significantly. In the acute phase, inflammation is pronounced with prominent heat symptoms, whereas in the chronic phase, lingering dampness becomes more apparent. The concentrated distribution of patients with spleen deficiency with dampness obstruction, as well as liver and kidney deficiency, within specific clusters suggests more consistent symptom patterns in these syndromes. This indicates that symptoms associated with deficiency syndromes tend to be more distinct and stable. Typically, symptoms of deficiency syndromes are directly linked to the dysfunction of Zang-fu organs, such as fatigue, anorexia, and loose stools resulting from spleen deficiency, or soreness and weakness of the waist and knees, dizziness, and tinnitus due to kidney deficiency. In contrast, symptoms of excess syndromes, such as those caused by damp-heat, are influenced by multiple factors, leading to greater variability.\u003c/p\u003e \u003cp\u003eThese findings have important implications for the refinement of syndrome classification. Future research could explore dividing damp-heat syndrome into more specific subtypes, such as 'dampness predominating over heat', 'heat predominating over dampness', 'damp-heat with qi stagnation', and 'damp-heat with blood stasis'. This refinement aligns with the traditional Chinese medicine principle of 'different treatment for the same disease,' acknowledging that patients with identical diagnoses may require different treatment based on their specific syndrome manifestations. However, we should also be alert to the risk of over-segmentation. The purpose of syndrome classification is to guide clinical treatment, and an overly complex classification system may reduce clinical practicability. 
Balancing accuracy and practicality is a key issue for future syndrome classification research.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e4.4. Association rule mining: validation and challenges coexist\u003c/h2\u003e \u003cp\u003eThe strong symptom associations identified through association rule mining are consistent with the diagnostic principles of traditional Chinese medicine (TCM). The rule {red tongue, yellow tongue coating} \u0026rarr; {damp-heat syndrome} (confidence\u0026thinsp;=\u0026thinsp;85.2%, lift\u0026thinsp;=\u0026thinsp;1.487) empirically supports the classic TCM teaching that these tongue manifestations indicate damp-heat syndrome. Similarly, rules such as {wiry pulse, slippery pulse} \u0026rarr; {wiry slippery pulse} reflect the internal logic of TCM pulse diagnosis theory by linking basic pulse conditions with compound pulse conditions.\u003c/p\u003e \u003cp\u003eThe high lift values (\u0026gt;\u0026thinsp;5.0) of some rules indicate that these symptom combinations occur far more often than expected by chance, supporting the concept of the TCM syndrome pattern as a coherent, non-random symptom group. This finding bears directly on the question of whether the TCM syndrome model has an objective basis. Historically, some scholars have criticized the TCM syndrome model as merely a subjective construct lacking an objective foundation. In this study, the data showed non-random, recurring association patterns between symptoms, providing support for the objectivity of syndromes.\u003c/p\u003e \u003cp\u003eHowever, it is important to recognize the limitations of association rule mining. Firstly, correlation does not imply causation. 
A high correlation indicates a statistical relationship between symptoms, but it does not clarify which symptom is the cause, which is the effect, or whether there is an underlying pathological mechanism. For instance, a red tongue, yellow coating, and rapid pulse might all reflect a shared damp-heat pathogenesis rather than being causally related to one another. Secondly, interpreting association rules in a clinical context requires caution. Some rules, such as {red tongue, slippery pulse, rapid pulse} \u0026rarr; {slippery rapid pulse}, reflect the logical structure of traditional Chinese medicine (TCM) terminology, in which a compound pulse is composed of basic pulses, rather than actual clinical observations. While these rules affirm the internal consistency of TCM theory, their utility in clinical decision-making is limited. Thirdly, association rule mining is dependent on threshold settings (support, confidence, lift), and varying these thresholds can yield different results. The thresholds used in this study (support 10%, confidence 60%, lift 1.5) were based on previous research, but they are not the only valid options. Future research could explore the effects of different threshold settings on outcomes or employ other association rule mining algorithms, such as FP-growth, to enhance efficiency and uncover more complex patterns. Despite these limitations, association rule mining offers a valuable perspective for understanding symptom relationships. The strong associations identified can be integrated into diagnostic algorithms or clinical decision support systems to aid physicians in syndrome identification. For example, if a patient presents with a red tongue and yellow tongue coating, the system could suggest the possibility of damp-heat syndrome and recommend further examination of related symptoms.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec33\" class=\"Section2\"\u003e \u003ch2\u003e4.5. 
Machine learning models: potential, challenges and controversies\u003c/h2\u003e \u003cp\u003eThe moderate performance of the machine learning model in this study (optimal weighted AUC\u0026thinsp;=\u0026thinsp;0.7634) prompts discussion of the feasibility of automated syndrome classification. Proponents argue that this level of performance holds clinical value, particularly as a decision support tool for less experienced physicians. However, critics question whether such moderate performance is adequate for clinical implementation and whether machine learning models can truly capture the complexity and nuances of Traditional Chinese Medicine (TCM) syndrome differentiation. In terms of performance, the results of this study are broadly consistent with previous research applying machine learning to TCM syndrome differentiation. Most studies report accuracy in the range of 60\u0026ndash;80%\u003csup\u003e[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]\u003c/sup\u003e, depending on the complexity of the disease, data quality and algorithm selection. Higher figures have also been reported: some researchers used machine learning to construct a TCM syndrome differentiation model for dysmenorrhea and found that a support vector machine (SVM) could achieve 98.29% accuracy and 98.24% accuracy\u003csup\u003e[\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]\u003c/sup\u003e. Such a model could help physicians develop personalized treatment strategies for dysmenorrhea patients. The accuracy of 62.92% in this study falls within a reasonable range and is consistent with previous research results\u003csup\u003e[\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e, \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]\u003c/sup\u003e. 
Considering the complexity of gout syndromes and the difficulty of a five-class classification task, this result is not surprising.\u003c/p\u003e \u003cp\u003eHowever, the moderate performance also reflects the inherent challenges of automated syndrome classification. Firstly, TCM syndrome differentiation involves subjective clinical judgment, and different physicians may judge the same patient's syndrome differently. This subjectivity introduces noise into the training labels and limits the upper bound of model performance. Secondly, although the 41 symptom variables in this study are fairly comprehensive, they may not capture all the clinical nuances considered by experienced physicians. TCM diagnosis emphasizes the 'four diagnostic methods'. In syndrome differentiation, physicians comprehensively consider the patient's mental state, complexion, voice, smell and other difficult-to-quantify information, which is missing from the current dataset. Thirdly, class imbalance (damp-heat syndrome accounts for 57.3%) and the multi-class nature of the task increase the difficulty of classification.\u003c/p\u003e \u003cp\u003eThe ability of machine learning models to truly comprehend the logic of Traditional Chinese Medicine (TCM) syndrome differentiation remains a topic of debate. TCM syndrome differentiation involves more than just matching symptoms; it requires a deep understanding of pathogenesis, the progression of diseases, and individual patient differences. Current machine learning models primarily rely on statistical associations and lack a profound grasp of TCM theory. 
For instance, a model might learn the association 'red tongue with yellow coating \u0026rarr; damp-heat syndrome,' but it may not understand why this indicates damp-heat or recognize situations where this association doesn't hold (e.g., a red tongue with yellow coating and a deep pulse might suggest a mix of cold and heat rather than simple damp-heat). Despite these limitations, this study demonstrated the potential of machine learning in aiding syndrome classification. Notably, the superior performance of the logistic regression model is significant. Unlike \"black box\" algorithms such as random forests or gradient boosting, logistic regression offers interpretable coefficients, allowing physicians to identify which symptoms most strongly predict each syndrome. This interpretability is crucial for clinical acceptance and trust. Future research should explore more interpretable machine learning methods, such as rule-based classifiers, decision trees, or deep learning models with attention mechanisms, as well as graph convolutional networks and multi-label learning models, to enhance clinical applicability. 
Potential improvements include: (1) incorporating additional features, such as laboratory indicators (inflammatory markers, metabolites), imaging findings, or genetic markers, to provide more comprehensive patient information; (2) using ensemble methods that combine the strengths of various algorithms; (3) applying deep learning to capture complex nonlinear relationships; (4) collecting larger and more diverse datasets to improve model robustness and generalization; (5) developing syndrome-specific binary classification models (e.g., 'whether it is damp-heat syndrome') rather than relying on a single multi-class classifier to enhance performance; and (6) introducing TCM theoretical knowledge as prior information to construct a knowledge-guided machine learning model.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section2\"\u003e \u003ch2\u003e4.6. Research advantages, limitations and future directions\u003c/h2\u003e \u003cp\u003eThe primary advantages of this study are: (1) the large sample of medical case records (N\u0026thinsp;=\u0026thinsp;295), which provides good patient representativeness and statistical power; (2) comprehensive data collection encompassing 41 symptom variables, thoroughly reflecting the TCM symptom spectrum of gout; (3) the integration of diverse complementary analysis methods, including factor analysis, cluster analysis, association rules, and machine learning, to uncover syndrome characteristics from multiple perspectives; (4) the implementation of strict quality control and a standardized data collection scheme, ensuring data reliability; and (5) transparent reporting of methods and results, which facilitates the study's reproducibility and verifiability.\u003c/p\u003e \u003cp\u003eHowever, several limitations should be acknowledged. Design limitations include the inability of a cross-sectional design to monitor the progression of syndromes over time or assess treatment responses. 
Since gout is a chronic, recurrent disease, syndromes can transform at different stages, such as from acute damp-heat syndrome to chronic phlegm and blood stasis syndrome. Longitudinal studies that track these syndrome changes would provide valuable insights into disease progression and therapeutic effects. Measurement limitations arise from the syndrome classification being based on physician consensus. Despite efforts at standardization, this process may still involve subjectivity. The absence of an inter-rater consistency assessment among diagnosticians is a further limitation, given the inherent subjectivity of syndrome differentiation. Additionally, although the 41 symptom variables are comprehensive, they may not fully capture all the information considered in TCM diagnosis, such as the patient's mental state and smell.\u003c/p\u003e \u003cp\u003eBased on these limitations, future research directions include the following. (1) Longitudinal studies: tracking the evolution of patients' syndromes over time to explore patterns of syndrome transformation, prognostic predictors and the impact of treatment on syndromes. This will provide a basis for dynamic syndrome differentiation and individualized treatment. (2) Biomarker integration: integrating inflammatory markers (such as IL-1β, IL-6, TNF-α, CRP), metabolites (such as uric acid, creatinine, blood lipids), and genetic variations (such as ABCG2, SLC2A9 gene polymorphisms) with symptom data to explore the biological basis of syndromes\u003csup\u003e[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]\u003c/sup\u003e. This will help bridge traditional Chinese medicine and modern biomedicine, and advance the objectification of syndromes. (3) Randomized controlled trials: comparing the efficacy of syndrome-based individualized treatment with standard treatment to provide high-quality evidence for the clinical utility of syndrome differentiation. 
For example, such trials could compare the efficacy and safety of an individualized Chinese medicine regimen based on refined syndrome classification with those of standard urate-lowering therapy.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThis study systematically analyzed Traditional Chinese Medicine (TCM) syndrome patterns in 295 gout patients by integrating multiple data-driven methodologies, including factor analysis, hierarchical clustering, association rule mining, and machine learning. The main findings are as follows. Damp-heat accumulation was the main syndrome type (57.3%), which supports the theoretical framework of traditional Chinese medicine. The high frequency of heat and dampness symptoms provides objective evidence for the pathogenesis of damp-heat. Factor analysis revealed 14 latent symptom dimensions, with heat toxin as the dominant factor, providing empirical support for the theory of syndrome elements, which holds that complex syndromes are composed of disease location and disease nature. Hierarchical cluster analysis revealed the heterogeneity of syndromes, especially the subtype differentiation within damp-heat syndrome, suggesting the necessity and feasibility of refined classification. Association rule mining identified strong symptom associations (lift\u0026thinsp;\u0026gt;\u0026thinsp;5.5), providing objective support for the diagnostic principles of traditional Chinese medicine and for the concept of a syndrome pattern as a coherent symptom group. 
Although the machine learning model achieved only moderate predictive performance (weighted AUC\u0026thinsp;=\u0026thinsp;0.76), it supports the feasibility of automated syndrome classification and lays a foundation for the development of clinical decision support systems.\u003c/p\u003e \u003cp\u003eBy integrating the wisdom of traditional Chinese medicine with modern computational methods, this study promoted the objectification and standardization of TCM syndrome differentiation of gout. The identified symptom patterns, syndrome subtypes and prediction models provide an evidence base for more accurate clinical practice and for the development of targeted ethnopharmacological interventions. Future studies that integrate biomarkers, longitudinal design, and clinical trials will further verify and improve these findings, and ultimately improve the clinical outcomes of gout patients through individualized, syndrome-based treatment strategies.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003e\u003cstrong\u003eMSU \u0026nbsp; \u0026nbsp;Monosodium urate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eANOVA One-way analysis of variance\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIQR interquartile range\u003c/p\u003e\n\u003cp\u003eSD standard deviation\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eEFA exploratory factor analysis\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eKMO Kaiser-Meyer-Olkin\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRF random forest\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eLR logistic regression\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eGB gradient boosting\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSVM support vector machine\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNB naive Bayes\u003c/p\u003e\n\u003cp\u003eROC Receiver operating characteristic\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAUC Area Under the Curve\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTCM Traditional Chinese 
Medicine\u0026nbsp;\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors sincerely acknowledge the professional expertise of the TCM physicians in syndrome differentiation and data collection, and extend their gratitude to the statistical analysis team for their technical support.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTong Mo\u003c/strong\u003e: Methodology, Data Curation. \u003cstrong\u003eYihan Liu\u003c/strong\u003e: Methodology, Data Curation. \u003cstrong\u003eXiaohua Yang\u003c/strong\u003e: Methodology, Data Curation. \u003cstrong\u003eYuran Feng\u003c/strong\u003e: Methodology, Data Curation. \u003cstrong\u003eGuihua Yue\u003c/strong\u003e: Conceptualization, Project administration. \u003cstrong\u003eChengxiang Guo\u003c/strong\u003e: Methodology, Data Curation, Writing-Original draft. 
\u003cstrong\u003eDongming Zhang\u003c/strong\u003e: Conceptualization, Writing-Review \u0026amp; editing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was funded by the following two projects:\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eChina-ASEAN College of Chinese Medicine Technology Innovation Project in 2025 (NO: 050250060609)\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2025 Guangxi University of Chinese Medicine School-level Education Teaching Reform and Research Project (NO: 2025A039)\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and/or analyzed during the current study are derived from the aforementioned commercial/institutional databases (China National Knowledge Infrastructure (CNKI) and Ancient and Modern Medical Case Cloud Platform (v3.0)). The raw data underlying this study are not publicly archived due to database licensing restrictions; however, the data collection process can be replicated by readers with appropriate access rights using the detailed search strategy provided in the Methods section. Processed or summarized data supporting the findings are available from the corresponding author upon reasonable request, subject to ethical approval and compliance with relevant data protection regulations.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclarations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that there are no conflicts of interest regarding the publication of this manuscript. Authors have no potential financial relationships or affiliations that could be perceived as influencing the research reported in this article. 
On behalf of all co-authors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study is an analytical research based exclusively on publicly available data. All data were sourced from fully accessible public databases and resources (such as: CNKI, Ancient \u0026amp; Modern Medical Case Records Cloud Platform, etc.). These datasets contain no personally identifiable information (non-identifiable data), and their collection and publication processes have undergone necessary ethical review and de-identification procedures by the original data providers.\u003c/p\u003e\n\u003cp\u003eIn accordance with China\u0026apos;s \u0026quot;Ethical Review Measures for Biomedical Research Involving Humans\u0026quot; and relevant scientific research ethics guidelines, research utilizing existing, publicly available, and non-identifiable data is generally eligible for exemption from ethics committee review.\u003c/p\u003e\n\u003cp\u003eConsequently, upon deliberation by our institution (or: our research team), the study protocol was confirmed not to involve direct or indirect contact with or collection of data or biological samples from human subjects. It poses no intervention or risk to any individual and thus meets the criteria for exemption. 
Therefore, it was not submitted to an ethics committee for separate approval.\u003c/p\u003e\n\u003cp\u003eAlthough exempt from ethical review, we solemnly commit to adhering strictly to the principles of research integrity and data usage norms throughout the research process, ensuring respect and proper citation of all data sources.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors consent to the publication of this manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eThe authors declare that they have no competing interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor details\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e1\u0026nbsp;\u003c/sup\u003eGuangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e2\u0026nbsp;\u003c/sup\u003eThe First Clinical Medical College of Guangxi University of Traditional Chinese Medicine, 530200, No.89-9 Dongge Road, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e3\u0026nbsp;\u003c/sup\u003eInformation Technology Center of Guangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003csup\u003e4\u0026nbsp;\u003c/sup\u003eThe New Medical College of Guangxi University of Traditional Chinese Medicine, 530200, No.13 Wuhe Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region, China\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eDalbeth, N. et al. 
Gout[J]. \u003cem\u003eLancet\u003c/em\u003e \u003cb\u003e397\u003c/b\u003e (10287), 1843\u0026ndash;1855 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDehlin, M., Jacobsson, L. \u0026amp; Roddy, E. Global epidemiology of gout: prevalence, incidence, treatment patterns and risk factors[J]. \u003cem\u003eNat. Rev. Rheumatol.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e (7), 380\u0026ndash;390 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKuo, C. F. et al. Global epidemiology of gout: prevalence, incidence and risk factors[J]. \u003cem\u003eNat. Rev. Rheumatol.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e (11), 649\u0026ndash;662 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, X. et al. Serum uric acid levels and multiple health outcomes: umbrella review of evidence from observational studies, randomised controlled trials, and Mendelian randomisation studies[J]. \u003cem\u003eBMJ\u003c/em\u003e \u003cb\u003e357\u003c/b\u003e, j2376 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng, G. et al. Assessment of drug induced hyperuricemia and gout risk using the FDA adverse event reporting system[J]. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e (1), 22856 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFinkelstein, Y. et al. Colchicine poisoning: the dark side of an ancient drug[J]. \u003cem\u003eClin. Toxicol. (Phila)\u003c/em\u003e. \u003cb\u003e48\u003c/b\u003e (5), 407\u0026ndash;414 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiang, H. et al. Advances in Experimental and Clinical Research of the Gouty Arthritis Treatment with Traditional Chinese Medicine[J]. \u003cem\u003eEvid. Based Complement. Alternat Med.\u003c/em\u003e \u003cb\u003e2021\u003c/b\u003e, 8698232 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUnschuld, P. U., Tessenow, H. \u0026amp; Jinsheng, Z. 
\u003cem\u003eAn Annotated Translation of Huang Di\u0026rsquo;s Inner Classic \u0026ndash; Basic Questions: 2 volumes[M]\u003c/em\u003e (University of California Press, 2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiao, N. et al. Evaluating the Efficacy and Adverse Effects of Clearing Heat and Removing Dampness Method of Traditional Chinese Medicine by Comparison with Western Medicine in Patients with Gout[J]. \u003cem\u003eEvid. Based Complement. Alternat Med.\u003c/em\u003e \u003cb\u003e2018\u003c/b\u003e, 8591349 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou, L. et al. Systematic review and meta-analysis of the clinical efficacy and adverse effects of Chinese herbal decoction for the treatment of gout[J]. \u003cem\u003ePLoS One\u003c/em\u003e. \u003cb\u003e9\u003c/b\u003e (1), e85008 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao, X. et al. Efficacy and safety of Chinese herbal compound in the treatment of acute gouty arthritis: A systematic review and meta-analysis of randomized controlled trials[J]. \u003cem\u003eInt. Immunopharmacol.\u003c/em\u003e \u003cb\u003e149\u003c/b\u003e, 114223 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeong, P. Y. et al. Traditional Chinese medicine in the treatment of patients with hyperuricemia: A randomized placebo-controlled double-blinded clinical trial[J]. \u003cem\u003eInt. J. Rheum. Dis.\u003c/em\u003e \u003cb\u003e27\u003c/b\u003e (1), e14986 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan, E. et al. Comparison of support vector machine, back propagation neural network and extreme learning machine for syndrome element differentiation[J]. \u003cem\u003eArtif. Intell. Rev.\u003c/em\u003e \u003cb\u003e53\u003c/b\u003e (4), 2453\u0026ndash;2481 (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGong, L. et al. 
A syndrome differentiation model of TCM based on multi-label deep forest using biomedical text mining[J]. \u003cem\u003eFront. Genet.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 1272016 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiao, T. S. et al. Factor analysis of traditional Chinese medicine symptoms for identification of syndrome patterns associated with idiopathic short stature in children[J]. \u003cem\u003eTzu Chi Med. J.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e (4), 433\u0026ndash;439 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun, S., Zhuang, L. \u0026amp; Cao, M. Correlation Analysis and Application of Respiratory and Lung Diseases in Pediatrics of Traditional Chinese Medicine Based on Factor Analysis Method[J]. \u003cem\u003eComput. Math. Methods Med.\u003c/em\u003e \u003cb\u003e2022\u003c/b\u003e, 4550039 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHong, M. et al. Analysis of the cluster efficacy and prescription characteristics of traditional Chinese medicine intervention for non-small cell lung cancer based on a clustering algorithm[J]. \u003cem\u003eTechnol. Health Care\u003c/em\u003e. \u003cb\u003e31\u003c/b\u003e (5), 1759\u0026ndash;1770 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePedregosa, F. et al. Scikit-learn: Machine Learning in Python[J]. \u003cem\u003eJ. Mach. Learn. Res.\u003c/em\u003e \u003cb\u003e12\u003c/b\u003e, 2825\u0026ndash;2830 (2011).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y. et al. Exploratory Factor Analysis for Validating Traditional Chinese Syndrome Patterns of Chronic Atrophic Gastritis[J]. \u003cem\u003eEvid. Based Complement. Alternat Med.\u003c/em\u003e \u003cb\u003e2016\u003c/b\u003e, 6872890 (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y. et al. 
A study on the pattern of traditional Chinese medicine syndromes in ulcerative colitis based on factor analysis and cluster analysis [J]. \u003cem\u003eChin. J. Integr. Med.\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e (10), 1191\u0026ndash;1195 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, Y. X. et al. Study on medication rules of traditional Chinese medicine for primary biliary cholangitis based on data mining [J]. \u003cem\u003eTradit Herb. Drugs\u003c/em\u003e. \u003cb\u003e33\u003c/b\u003e (8), 1124\u0026ndash;1130 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, J. R. \u003cem\u003eConstruction of a TCM syndrome distribution and diagnostic model for pediatric functional constipation based on machine learning [D]\u003c/em\u003e (Beijing University of Chinese Medicine, 2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQi, X. et al. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants[J]. \u003cem\u003eRedox Biol.\u003c/em\u003e \u003cb\u003e79\u003c/b\u003e, 103470 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMahaboob, B. S. et al. Multithreshold Segmentation and Machine Learning Based Approach to Differentiate COVID-19 from Viral Pneumonia[J]. \u003cem\u003eComput. Intell. Neurosci.\u003c/em\u003e \u003cb\u003e2022\u003c/b\u003e, 2728866 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, J. et al. Analysis of Acupoint Selection and Combination for Gouty Arthritis Treated with Moxibustion Based on Data Mining[J]. \u003cem\u003eMed. Acupunct.\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e (3), 239\u0026ndash;251 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYan, Y. Q. et al. 
[Analysis on mechanisms and medication rules of herbal prescriptions for gout caused by heat-damp accumulation syndrome based on data mining and network pharmacology] [J]. \u003cem\u003eZhongguo Zhong Yao Za Zhi\u003c/em\u003e. \u003cb\u003e43\u003c/b\u003e (13), 2824\u0026ndash;2830 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNeogi, T. et al. 2015 Gout classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative[J]. \u003cem\u003eAnn. Rheum. Dis.\u003c/em\u003e \u003cb\u003e74\u003c/b\u003e (10), 1789\u0026ndash;1798 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWallace, S. L. et al. Preliminary criteria for the classification of the acute arthritis of primary gout[J]. \u003cem\u003eArthritis Rheum.\u003c/em\u003e \u003cb\u003e20\u003c/b\u003e (3), 895\u0026ndash;900 (1977).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShanghai University of Traditional Chinese Medicine. Institute of Medical History and Literature, China Academy of Chinese Medical Sciences, Fujian University of Traditional Chinese Medicine, et al. Standardized clinical terminology of traditional Chinese medicine\u0026mdash;Part 2: Syndromes [S]. State Administration for Market Regulation; Standardization Administration of China, (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAgrawal, R. \u0026amp; Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases: VLDB '94[C], San Francisco, CA, USA, (1994).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, H. Y. \u0026amp; Du, M. R. Discussion on the etiology, pathogenesis, and clinical application of prescriptions for gout in \u003cem\u003eSynopsis of the Golden Chamber\u003c/em\u003e [J]. \u003cem\u003eRheum. Arthritis\u003c/em\u003e. \u003cb\u003e12\u003c/b\u003e (12), 51\u0026ndash;54 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChai, W. T. 
et al. Treatment of gout based on the theory of 'Lijie disease' in \u003cem\u003eSynopsis of the Golden Chamber\u003c/em\u003e [J]. \u003cem\u003eChin. J. Ethnomed. Ethnopharmacy\u003c/em\u003e. \u003cb\u003e32\u003c/b\u003e (6), 7\u0026ndash;10 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo, J. W. et al. Therapeutic potential and pharmacological mechanisms of Traditional Chinese Medicine in gout treatment[J]. \u003cem\u003eActa Pharmacol. Sin\u003c/em\u003e. \u003cb\u003e46\u003c/b\u003e (5), 1156\u0026ndash;1176 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo, Y. et al. Chinese Herbal Formulas Si-Wu-Tang and Er-Miao-San Synergistically Ameliorated Hyperuricemia and Renal Impairment in Rats Induced by Adenine and Potassium Oxonate[J]. \u003cem\u003eCell. Physiol. Biochem.\u003c/em\u003e \u003cb\u003e37\u003c/b\u003e (4), 1491\u0026ndash;1502 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin, Z. H. \u0026amp; Chen, H. Y. Summary of Chen Haiyun's experience in treating gouty arthritis by stage [J]. \u003cem\u003eJ. New. Chin. Med.\u003c/em\u003e \u003cb\u003e57\u003c/b\u003e (12), 226\u0026ndash;229 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXie, K. G. et al. Effects of Taohong Siwu Decoction on peripheral blood inflammatory factors, oxidative stress, and quality of life in patients with acute gouty arthritis [J]. \u003cem\u003eLiaoning J. Traditional Chin. Med.\u003c/em\u003e \u003cb\u003e46\u003c/b\u003e (12), 2602\u0026ndash;2605 (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, J. H. et al. Observation on the therapeutic effect of acupuncture combined with modified Xiaochaihu Decoction and Erchen Decoction on metatarsophalangeal joint pain in gouty arthritis [J]. \u003cem\u003eHebei J. Traditional Chin. 
Med.\u003c/em\u003e \u003cb\u003e40\u003c/b\u003e (9), 1412\u0026ndash;1414 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, W. F. Establishing a unified method and system for syndrome differentiation [J]. \u003cem\u003eHunan Guiding J. Traditional Chin. Med. Pharmacol.\u003c/em\u003e (1), 7\u0026ndash;10 (2003).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, W. F. Establishment of a unified system for syndrome differentiation [J]. \u003cem\u003eChin. J. Basic. Med. Traditional Chin. Med.\u003c/em\u003e (4), 4\u0026ndash;6 (2001).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, L. et al. Predicting new-onset stroke with machine learning: development of a model integrating traditional Chinese and western medicine[J]. \u003cem\u003eFront. Pharmacol.\u003c/em\u003e \u003cb\u003e16\u003c/b\u003e, 1546878 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, L. et al. Construction and Application of a Traditional Chinese Medicine Syndrome Differentiation Model for Dysmenorrhea Based on Machine Learning[J]. \u003cem\u003eComb. Chem. High. Throughput Screen.\u003c/em\u003e \u003cb\u003e28\u003c/b\u003e (4), 664\u0026ndash;674 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun, J. et al. Discovery and Validation of Traditional Chinese and Western Medicine Combination Antirheumatoid Arthritis Drugs Based on Machine Learning (Random Forest Model) [J]. \u003cem\u003eBiomed. Res. Int.\u003c/em\u003e \u003cb\u003e2023\u003c/b\u003e, 6086388 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, J. et al. Explainable machine learning model and nomogram for predicting the efficacy of Traditional Chinese Medicine in treating Long COVID: a retrospective study[J]. \u003cem\u003eFront. Med. (Lausanne)\u003c/em\u003e. \u003cb\u003e12\u003c/b\u003e, 1529993 (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEckenstaler, R. 
\u0026amp; Benndorf, R. A. The Role of ABCG2 in the Pathogenesis of Primary Hyperuricemia and Gout-An Update[J]. \u003cem\u003eInt. J. Mol. Sci.\u003c/em\u003e, 2021,22(13).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, X. et al. Association between SLC2A9 (GLUT9) gene polymorphisms and gout susceptibility: an updated meta-analysis[J]. \u003cem\u003eRheumatol. Int.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e (8), 1157\u0026ndash;1165 (2016).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Gout, Traditional Chinese Medicine, Factor Analysis, Machine Learning, Association Rules, Data Mining","lastPublishedDoi":"10.21203/rs.3.rs-8402309/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8402309/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eObjective:\u0026nbsp;\u003c/strong\u003eThis study aims to analyze TCM syndrome patterns in gout patients by integrating multiple data-driven methods—including factor analysis, 
hierarchical clustering, association rule mining, and machine learning—based on a large-scale, structured dataset of clinical gout case records. The goals are to identify core symptom clusters, objectively classify patient subtypes, uncover symptom association patterns, and construct a predictive model for syndrome differentiation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMaterials and Methods:\u003c/strong\u003e This study was a retrospective data mining analysis. The data were derived from published Traditional Chinese Medicine (TCM) case reports on gout that met the inclusion criteria, retrieved from the China National Knowledge Infrastructure (CNKI) database and the \"Ancient and Modern Medical Case Cloud Platform (V3.0)\" between 2020 and 2023. A total of 295 cases were included. Demographic characteristics, TCM four-examination data, 41 binary symptom variables, and syndrome classification information were collected. Statistical analyses included exploratory factor analysis (with varimax rotation), hierarchical cluster analysis (Ward's method), association rule mining (Apriori algorithm), and five machine learning classifiers (logistic regression, random forest, gradient boosting, support vector machine, and naive Bayes). The analyses were performed using Python 3.11.0 and SPSS 26.0.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults:\u003c/strong\u003e The cohort consisted of 277 males (93.9%) and 18 females (6.1%), with an average age of 48.5 ± 12.2 years. Gout syndrome distribution: damp-heat accumulation in 169 cases (57.3 %), spleen deficiency and dampness obstruction in 38 cases (12.9 %), damp-heat combined with phlegm and blood stasis in 37 cases (12.5 %), phlegm and blood stasis obstruction in 28 cases (9.5 %), liver and kidney deficiency in 23 cases (7.8 %). The high-frequency symptoms of gout were joint pain (86.4 %), red tongue (75.9 %), yellow fur (66.8 %), and joint swelling (63.7 %). 
The results of factor analysis showed that 14 symptom factors were extracted (KMO = 0.5896, Bartlett 's χ2 = 4083.74, p \u0026lt; 0.001), with the main factor (eigenvalue = 6.42) representing the toxic heat dimension. Cluster analysis identified five patient groups, indicating internal heterogeneity in damp-heat syndrome. The association rule mining found 31 significant associations, and the strongest rules (red tongue, slippery pulse, number pulse) → (slippery number pulse) (confidence 100 %, improvement 5.566). In the machine learning model, logistic regression performed best (accuracy 62.92 %, weighted AUC = 0.7634).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion:\u003c/strong\u003e This study provides objective evidence for TCM syndrome differentiation of gout by integrating multiple data-driven methods. The prevalence of damp-heat syndrome supports the theoretical framework of TCM. Factor analysis validated the concept of syndrome elements from the symptom dimension, while cluster analysis highlighted the need for refined classification. The moderate performance of the machine learning model indicates its potential for clinical decision support. 
This study advances the standardization of syndrome differentiation by merging traditional wisdom with modern computational methods, aiding in the diagnosis and treatment of gout in TCM.\u003c/p\u003e","manuscriptTitle":"Data Mining of Public Databases to Identify TCM Syndrome Patterns in Gout: A Retrospective Study","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-02 05:56:23","doi":"10.21203/rs.3.rs-8402309/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-03-16T10:12:18+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-03-16T00:13:37+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"172496980147245947207842590783138216501","date":"2026-03-15T23:26:19+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-02-14T01:48:52+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"265393811453889236670203194745708302592","date":"2026-01-29T07:41:30+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-01-28T18:20:29+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-12-30T09:59:44+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-12-27T15:22:52+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-12-27T15:15:38+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific 
Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c832df5d-5f6c-4a88-a4e0-da41946d8858","owner":[],"postedDate":"February 2nd, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":61945509,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":61945510,"name":"Health sciences/Diseases"},{"id":61945511,"name":"Health sciences/Health care"},{"id":61945512,"name":"Health sciences/Medical research"},{"id":61945513,"name":"Health sciences/Risk factors"}],"tags":[],"updatedAt":"2026-05-13T05:57:59+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-02 05:56:23","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8402309","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8402309","identity":"rs-8402309","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
