Development and validation of a new nomogram for OA based on machine learning

doi:10.21203/rs.3.rs-4268728/v1


2024 · doi:10.21203/rs.3.rs-4268728/v1

preprint OA: closed CC-BY-4.0


Research Article

Qiongbing Zheng, Jiexin Chen, Youmian Lan, Meijing Li, Ling Lin

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4268728/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Introduction: Osteoarthritis (OA) is a chronic joint disease. The global number of OA patients currently exceeds 300 million, posing a significant economic burden on patients and society. There is currently no cure for OA, making early identification and appropriate management of individuals at risk crucial. Developing a novel OA prediction model to screen for high-risk individuals, enabling early diagnosis and intervention, is therefore important for improving patient prognosis.
Methods: Based on the National Health and Nutrition Examination Survey (NHANES) cycles of 2011-2012, 2013-2014, and 2015-2016, this retrospective cross-sectional study involved 11,366 participants. Least absolute shrinkage and selection operator (LASSO) regression, the XGBoost algorithm, and the random forest (RF) algorithm were used to identify significant indicators associated with OA, and an OA prediction nomogram was developed. The nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA) in the training and validation sets.

Results: We identified 5 predictors from 19 variables (age, gender, hypertension, BMI and coffee intake) and developed an OA nomogram. In both the training and validation cohorts, the OA nomogram exhibited good predictive performance (AUCs of 0.804 and 0.814, respectively), good consistency and stability in the calibration curves, and high net benefit in DCA.

Conclusion: This nomogram based on 5 variables predicted the risk of OA with a high degree of accuracy, suggesting that it is a convenient tool for clinicians to identify populations at high risk of OA.

Keywords: Osteoarthritis, Nomogram, NHANES, Machine learning, LASSO, XGBoost, Random forest

Introduction

Osteoarthritis (OA) is a chronic joint disease characterized by cartilage damage, subchondral bone destruction, and synovial inflammation[1]. Currently, there are over 300 million OA patients worldwide, and prevalence is expected to rise further due to rising obesity rates and population aging, posing a significant economic burden on patients and society[2, 3]. The pathogenesis of OA is still not fully understood and is considered to involve factors such as immunity, metabolism, hormones, and genetics[4–6].
At present, to improve quality of life, the clinical management of OA primarily focuses on symptom relief, including pain reduction, alleviation of stiffness, and maintenance of joint function[7]. While joint replacement surgery is a viable treatment option for patients with advanced-stage OA, its high cost and associated surgical risks put it out of reach for the majority of OA patients[8]. In summary, there is currently no cure for OA. However, Chu et al. demonstrated that the early stages of OA may be reversible, and recommended early diagnosis as a potential strategy to delay disease progression[8]. Therefore, developing a predictive model to identify populations at high risk of OA is essential for early diagnosis and intervention to improve patient outcomes.

In recent years, machine learning techniques have opened up new avenues for the development of clinical prediction models. Machine learning algorithms can analyze multiple variables and identify relevant risk factors, providing early disease diagnosis predictions, and have been widely applied to assist clinical diagnosis of various diseases[9, 10]. With the increasing incidence of OA, there is an urgent need for a disease risk prediction model to support early diagnosis of OA. However, previous prediction models have primarily focused on genetic aspects or relied solely on imaging and serological factors[11–14]. Other disease diagnostic models have used logistic regression for variable selection and model construction. Unfortunately, these models have suffered from small sample sizes, a lack of comprehensive model evaluation methods, and the absence of internal or external validation. To our knowledge, there is currently no disease prediction model for OA based on clinical data. The nomogram is a visual disease-specific prediction model that incorporates various clinical variables.
It aids in early disease detection and is easy for doctors to use[15, 16]. In this study, we aim to create a novel clinical prediction model through machine learning methods, incorporating patient demographics, laboratory examinations, anthropometric measurements, and health survey information. This model is designed to predict the risk of OA occurrence and identify populations at high risk of OA.

Methods

Data source

The data in this study were obtained from the National Health and Nutrition Examination Survey (NHANES) in the United States. This survey employs a complex, multi-stage, stratified probability sampling method to provide a comprehensive picture of the health and nutritional status of the non-institutionalized population of the United States. NHANES data are nationally representative, making them a valuable resource for large-scale epidemiological studies and for developing clinical prediction models. All NHANES survey protocols were approved by the Research Ethics Review Committee of the National Center for Health Statistics, and participants signed informed consent forms before participating in the survey. All NHANES data used in this study are publicly available at https://www.cdc.gov/nchs/nhanes.

Participant selection

We used data from three NHANES cycles: 2011–2012, 2013–2014, and 2015–2016. During these cycles, a total of 29,902 participants completed extensive demographic surveys, laboratory examinations, and health status questionnaires. To ensure the accuracy and reliability of our research, we applied the following exclusions. First, participants under 20 years of age were excluded (n = 12,854), as our study focused on osteoarthritis in adults, leaving 17,048 participants. Subsequently, individuals with missing osteoarthritis-related data were excluded (n = 1,256).
Furthermore, to guarantee data integrity, we excluded individuals lacking essential demographic information (n = 1,456), those with missing questionnaire and dietary data (n = 2,462), and those with missing laboratory data (n = 508). Ultimately, 11,366 participants were included in the analysis, as shown in Fig. 1.

Definition of osteoarthritis

Case definitions in epidemiological studies often rely on self-reported osteoarthritis (OA)[17]. March et al. demonstrated an 81% consistency rate between self-reported OA and clinically well-defined OA[18], suggesting that OA can be reliably self-reported. All participants were asked whether they had ever been diagnosed with arthritis: "Has a doctor or other health professional ever told you that you have arthritis?" Participants who answered "yes" were then asked, "What type of arthritis do you have?" Based on the answers to these questions, participants were categorized as having OA, other types of arthritis, or no arthritis[19].

Demographics, laboratory factors, anthropometrics, and lifestyles

In accordance with previous research, we considered the following as influencing factors: age, gender, race, educational level, poverty income ratio (PIR), marital status, body mass index (BMI), alcohol consumption, smoking status, recreational physical activity, self-reported health status, dietary intake factors, renal function, and the systemic immune-inflammatory index (SII)[20, 21]. Age (years) and PIR were used as continuous variables. Gender was classified as male or female. Race was classified as Mexican American people, Other Hispanic people, Non-Hispanic White people, Non-Hispanic Black people, and Other or multiracial people. Education was divided into five categories: Less Than 9th Grade, 9-11th Grade, High School Grad/GED, Some College or AA degree, and College Graduate or above.
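The participant-selection counts reported above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the authors' code; the step labels are paraphrased from the text, and the function name `remaining_after_each_step` is hypothetical.

```python
# Participant flow as reported in the text; labels are paraphrased.
START = 29_902
EXCLUSIONS = [
    ("age under 20 years", 12_854),
    ("missing osteoarthritis data", 1_256),
    ("missing demographic information", 1_456),
    ("missing questionnaire or dietary data", 2_462),
    ("missing laboratory data", 508),
]


def remaining_after_each_step(start, exclusions):
    """Return (label, remaining sample size) after each exclusion step."""
    remaining = start
    trail = []
    for label, n_excluded in exclusions:
        remaining -= n_excluded
        trail.append((label, remaining))
    return trail


flow = remaining_after_each_step(START, EXCLUSIONS)
for label, n in flow:
    print(f"after excluding {label}: {n:,}")
```

The running totals agree with the two intermediate and final sample sizes stated in the text (17,048 adults and 11,366 analyzed participants).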
Alcohol consumption was defined by the response to the question "Have you had at least 12 drinks of any type of alcoholic beverage in any one year?" and was divided into two groups (yes or no). Smoking status was classified as current smoking, former smoking, or never smoking according to the responses to the questions "Have you smoked at least 100 cigarettes in your entire life?" and "Do you currently smoke cigarettes?" Marital status was classified into six categories: married, widowed, divorced, separated, never married, and living with a partner. Based on BMI, individuals were divided into categories of underweight (< 18 kg/m²), normal weight (18–25 kg/m²), overweight (25–30 kg/m²), and obesity (≥ 30 kg/m²). Leisure physical activity was categorized into two groups: individuals reporting moderate or vigorous leisure physical activity in a typical week were classified as active, and those reporting no moderate or vigorous leisure physical activity were classified as inactive.

Hypertension was defined as a self-reported diagnosis by a doctor, the use of anti-hypertensive medications, or blood pressure ≥ 140/90 mmHg. Diabetes mellitus (DM) was defined as a self-reported diagnosis by a doctor, HbA1c level ≥ 6.5%, fasting plasma glucose (FPG) level ≥ 7.0 mmol/L, random blood glucose level ≥ 11.1 mmol/L, two-hour glucose tolerance test blood glucose level ≥ 11.1 mmol/L, or use of diabetes medications or insulin.

Dietary supplement information was obtained from questionnaires designed to collect detailed data on dietary supplement usage. During each NHANES cycle, participants provided detailed dietary intake information for two 24-hour periods, which was used to estimate intake of total energy, caffeine, and fiber. The first dietary recall was collected in person during the NHANES visit, while the second was collected via telephone 3 to 10 days later.
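The BMI categories and the hypertension definition given above can be expressed as simple coding rules. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the text does not say which category the boundary values 25 and 30 fall into, so half-open intervals are assumed, and "≥ 140/90 mmHg" is read as systolic ≥ 140 or diastolic ≥ 90. The function names are hypothetical.

```python
def bmi_category(bmi):
    """Map BMI (kg/m^2) to the four categories used in the text.

    Assumption: boundaries are half-open, i.e. [18, 25) is normal weight
    and [25, 30) is overweight.
    """
    if bmi < 18:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obesity"


def has_hypertension(self_reported, on_medication, systolic, diastolic):
    """Hypertension if any criterion in the text is met.

    Assumption: the blood-pressure criterion is systolic >= 140 mmHg
    OR diastolic >= 90 mmHg.
    """
    return bool(
        self_reported or on_medication or systolic >= 140 or diastolic >= 90
    )


print(bmi_category(27.3))
print(has_hypertension(False, False, 138, 92))
```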
Intake was estimated as the average of the two recall days (or from the first day alone if only one day of data was available)[22]. Data on urinary creatinine and albumin were obtained from the NHANES laboratory examinations. Blood biomarkers included vitamin D level, neutrophil count (NC), lymphocyte count (LC), and platelet count (PC). As previously described, the systemic immune-inflammation index (SII) was calculated as PC × (NC / LC). Given the right-skewed distribution of SII, we applied a log2 transformation[23, 24].

Statistical analysis

NHANES uses a complex, multistage sampling design, and nationally representative estimates require sample weights derived from that design[25]. However, in this study, we used raw unweighted NHANES data to build the machine learning models. Weighted data are usually used to estimate nationwide incidence or prevalence; we did not estimate national prevalence and only needed the relationship between OA and individual characteristics to train the model[11]. Data were analyzed using R software (version 4.3.0). Continuous variables are presented as mean ± standard deviation (SD) and were compared between groups using t-tests. Categorical variables are expressed as frequency and percentage and were compared using chi-square tests. All statistical tests were two-sided, and a P-value < 0.05 was considered statistically significant. To facilitate model development, we randomly divided all 11,366 participants into two groups in a 7:3 ratio (7,958 individuals for training and 3,408 for validation). The training cohort was used for model development, while the validation cohort served for internal validation. LASSO regression, the XGBoost algorithm, and the random forest (RF) algorithm were applied with 10-fold cross-validation for feature importance assessment.
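The two derived quantities defined earlier in this section, the averaged dietary intake and the log2-transformed SII, can be sketched in Python as follows. This is a minimal illustration rather than the authors' R code; the function names are hypothetical, and the example assumes all three cell counts are given in the same units (for instance 10³ cells/µL).

```python
import math


def mean_intake(day1, day2=None):
    """Average of two 24-hour recalls; falls back to day 1 if day 2 is missing."""
    if day2 is None:
        return day1
    return (day1 + day2) / 2


def log2_sii(platelets, neutrophils, lymphocytes):
    """log2 of SII, where SII = PC x (NC / LC) as defined in the text."""
    sii = platelets * (neutrophils / lymphocytes)
    return math.log2(sii)


# Illustrative values only (not taken from the study data).
print(mean_intake(100.0, 200.0))
print(log2_sii(256, 4, 2))
```

With these illustrative counts SII is 512, so the log2-transformed value is 9, which is on the same scale as the Log2 (SII) means reported in Table 1.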
Subsequently, we developed a clinical risk prediction nomogram by integrating the results of the three algorithms, taking the importance of the feature variables into account. For model evaluation, we plotted receiver operating characteristic (ROC) curves and calculated the AUC. To evaluate the clinical utility of the model, we further conducted decision curve analysis (DCA).

Results

Baseline characteristics of participants

This study included 11,366 participants, with an average age of 47.9 years. According to the diagnostic criteria described above, 1,434 individuals (12.6%) had OA. Among these OA patients, 508 (35.4%) were male and 926 (64.6%) were female, consistent with Table 1. Baseline characteristics of the OA and non-OA groups are shown in Table 1. The participants were randomly divided into training and validation groups in a 7:3 ratio, with 7,958 individuals in the training group and 3,408 in the validation group. In the training cohort, participants had an average age of 48.0 years, and 1,007 individuals (12.7%) had OA. In the validation cohort, the average age was 47.7 years, and 427 individuals (12.5%) had OA. No significant differences in baseline characteristics were observed between the two cohorts (Table S1).

Table 1. Baseline characteristics of study participants from NHANES 2011–2016.
| Variable | Total (N = 11366) | Normal (N = 9932) | OA (N = 1434) | p-value |
|---|---|---|---|---|
| Age (years) | 47.9 ± 17.4 | 45.8 ± 16.9 | 62.6 ± 13.2 | < 0.001 |
| Ratio of family income to poverty | 2.5 ± 1.6 | 2.5 ± 1.6 | 2.7 ± 1.6 | < 0.001 |
| Diabetes, n(%) | 1944 (17.1%) | 1547 (15.6%) | 397 (27.7%) | < 0.001 |
| Hypertension, n(%) | 4587 (40.4%) | 3631 (36.6%) | 956 (66.7%) | < 0.001 |
| Calories intake (kcal/d) | 2071.4 ± 848.2 | 2093.6 ± 863.0 | 1918.0 ± 719.1 | < 0.001 |
| Coffee intake (mg/d) | 137.9 ± 158.0 | 133.5 ± 153.5 | 168.0 ± 183.8 | < 0.001 |
| Fiber intake (g/d) | 17.3 ± 9.6 | 17.4 ± 9.8 | 16.6 ± 8.5 | 0.002 |
| Urinary protein (µg/ml) | 44.8 ± 293.0 | 43.5 ± 278.4 | 53.8 ± 379.2 | 0.32 |
| Urine creatinine (mg/dl) | 123.4 ± 81.4 | 125.4 ± 82.3 | 109.6 ± 73.8 | < 0.001 |
| Log2 (SII) | 8.8 ± 0.8 | 8.8 ± 0.8 | 8.9 ± 0.8 | < 0.001 |
| Gender, n(%) | | | | < 0.001 |
| Male | 5658 (49.8%) | 5150 (51.9%) | 508 (35.4%) | |
| Female | 5708 (50.2%) | 4782 (48.1%) | 926 (64.6%) | |
| Race, n(%) | | | | < 0.001 |
| Mexican American people | 1525 (13.4%) | 1409 (14.2%) | 116 (8.1%) | |
| Other Hispanic people | 1187 (10.4%) | 1076 (10.8%) | 111 (7.7%) | |
| Non-Hispanic White people | 4652 (40.9%) | 3781 (38.1%) | 871 (60.7%) | |
| Non-Hispanic Black people | 2385 (21%) | 2153 (21.7%) | 232 (16.2%) | |
| Other/multiracial people | 1617 (14.2%) | 1513 (15.2%) | 104 (7.3%) | |
| Education level, n(%) | | | | 0.11 |
| Less Than 9th Grade | 873 (7.7%) | 773 (7.8%) | 100 (7%) | |
| 9-11th Grade | 1368 (12%) | 1203 (12.1%) | 165 (11.5%) | |
| High School Grad/GED | 2460 (21.6%) | 2165 (21.8%) | 295 (20.6%) | |
| Some College or AA degree | 3582 (31.5%) | 3086 (31.1%) | 496 (34.6%) | |
| College Graduate or above | 3083 (27.1%) | 2705 (27.2%) | 378 (26.4%) | |
| Marital status, n(%) | | | | < 0.001 |
| Married | 5807 (51.1%) | 5031 (50.7%) | 776 (54.1%) | |
| Widowed | 721 (6.3%) | 499 (5%) | 222 (15.5%) | |
| Divorced | 1218 (10.7%) | 995 (10%) | 223 (15.6%) | |
| Separated | 345 (3%) | 310 (3.1%) | 35 (2.4%) | |
| Never married | 2307 (20.3%) | 2185 (22%) | 122 (8.5%) | |
| Living with partner | 968 (8.5%) | 912 (9.2%) | 56 (3.9%) | |
| Smoking, n(%) | | | | < 0.001 |
| Never smoker | 6524 (57.4%) | 5840 (58.8%) | 684 (47.7%) | |
| Former smoker | 2611 (23%) | 2111 (21.3%) | 500 (34.9%) | |
| Current smoker | 2231 (19.6%) | 1981 (19.9%) | 250 (17.4%) | |
| Drink alcohol, n(%) | 8278 (72.8%) | 7235 (72.8%) | 1043 (72.7%) | > 0.9 |
| Physically active, n(%) | 5890 (51.8%) | 5302 (53.4%) | 588 (41%) | < 0.001 |
| BMI, n(%) | | | | < 0.001 |
| Underweight | 193 (1.7%) | 175 (1.8%) | 18 (1.3%) | |
| Normal weight | 3219 (28.3%) | 2945 (29.7%) | 274 (19.1%) | |
| Overweight | 3673 (32.3%) | 3264 (32.9%) | 409 (28.5%) | |
| Obesity | 4281 (37.7%) | 3548 (35.7%) | 733 (51.1%) | |

Continuous data are expressed as weighted mean ± standard deviation (SD); categorical data are expressed as number (percentage). Abbreviations: SII: systemic immune inflammation index, BMI: body mass index.

Selection of main predictors of OA

To identify the key predictor variables of OA, we applied LASSO regression, the XGBoost algorithm, and the RF algorithm with 10-fold cross-validation and assessed feature importance. Using LASSO regression, we selected 7 significant predictors from the 19 feature variables in the training cohort: gender, hypertension, BMI, age, drink alcohol, education level and coffee intake (Fig. 2A-C). Figure 2D presents the importance ranking of 15 feature variables by the XGBoost algorithm: age, gender, race, hypertension, coffee intake, BMI, calories intake, Log2 (SII), fiber intake, PIR, marital status, urine creatinine, diabetes, urinary protein and smoking. Figure 2E-F displays the importance ranking of 15 feature variables by the RF algorithm: age, gender, race, urine creatinine, diabetes, calories intake, marital status, urinary protein, hypertension, education level, coffee intake, smoking, BMI, fiber intake and drink alcohol. Taking the intersection of the feature variables obtained from the three algorithms yielded 5 key feature variables: age, gender, hypertension, BMI and coffee intake (Fig. 2G).

Construction of a new predictive model of OA

Through the machine learning methods described above, 5 of the original 19 variables were selected as the optimal variables. Next, to create a new predictive model, we used multivariate logistic regression.
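The intersection step described in the variable-selection section above can be reproduced directly from the three variable lists reported in the text. The sketch below uses plain Python sets; the variable names are copied from the text, and the set names are hypothetical labels introduced for illustration.

```python
# Variables retained by each algorithm, as listed in the text.
LASSO = {
    "gender", "hypertension", "BMI", "age",
    "drink alcohol", "education level", "coffee intake",
}
XGBOOST = {
    "age", "gender", "race", "hypertension", "coffee intake", "BMI",
    "calories intake", "Log2 (SII)", "fiber intake", "PIR",
    "marital status", "urine creatinine", "diabetes",
    "urinary protein", "smoking",
}
RANDOM_FOREST = {
    "age", "gender", "race", "urine creatinine", "diabetes",
    "calories intake", "marital status", "urinary protein",
    "hypertension", "education level", "coffee intake", "smoking",
    "BMI", "fiber intake", "drink alcohol",
}

# Keep only the variables that all three methods agree on.
selected = LASSO & XGBOOST & RANDOM_FOREST
print(sorted(selected))
```

Running this yields the same five variables the text reports: age, BMI, coffee intake, gender and hypertension.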
The results of the logistic regression analysis for these 5 variables are reported in Table S2. These predictors, which are mutually independent, were combined to create a nomogram illustrating the quantified OA risk (Fig. 3).

Performance of the new OA nomogram: AUC and calibration curve

ROC and calibration curves were used to evaluate the performance of the new OA nomogram. Figures 4A-B display the ROC curve and calibration curve of the new OA nomogram for the training cohort; the ROC curve shows an AUC of 0.804, and the calibration curve exhibits good consistency. In the validation cohort, the ROC curve of the new OA nomogram has an AUC of 0.814 (Fig. 4C), and the calibration curve in Fig. 4D also demonstrates good consistency and stability. These results indicate that the nomogram exhibits excellent predictive value and discriminative capability.

Evaluation of clinical utility of the new nomogram of OA

We further conducted decision curve analysis (DCA) to evaluate the clinical utility of the new OA nomogram. As shown in Figs. 5A-B, the nomogram exhibits a high net benefit in both the training and validation cohorts. These results indicate that the new OA nomogram has a certain degree of clinical utility.

Discussion

Osteoarthritis is a severe public health issue, and delayed diagnosis hinders early protective treatment of joint health[8]. Therefore, there is an urgent need for a simple and user-friendly OA prediction model to identify high-risk populations. The nomogram, a visual prediction model, is helpful for disease screening and early diagnosis[15]. In this study, we applied machine learning methods to select feature variables using data from the NHANES database. By intersecting these feature variables, we identified 5 optimal predictive factors and used them to construct an effective risk prediction nomogram for OA.
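The two evaluation quantities used in the Results, the AUC and the net benefit plotted in decision curve analysis, can be illustrated with a small pure-Python sketch. This is not the authors' R code, and the toy labels and scores are invented for illustration. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case, and the net benefit uses the standard decision-curve definition, TP/n − FP/n × pt/(1 − pt), at threshold probability pt.

```python
def auc(y_true, scores):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical): 1 = OA, 0 = no OA, with predicted risks.
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]

print(auc(y, risk))
print(net_benefit(y, risk, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all and treat-none strategies.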
This model exhibits high predictive performance for OA risk (AUC of training set: 0.804, AUC of validation set: 0.814). The use of machine learning for developing clinical prediction models is growing every year[11, 26, 27]. To the best of our knowledge, we are the first to use machine learning methods on the NHANES database to construct a risk prediction nomogram for OA from clinical data. The model is based on a large and diverse population, which supports more stable results. We selected simple clinical and laboratory data that are easily obtainable, including demographic data, anthropometric measurements, laboratory examinations, and questionnaires, which makes the model more suitable for widespread application. Overall, our prediction model has two distinct advantages: it has excellent predictive capacity (high AUC), and it is user-friendly with wide adaptability (only 5 easily obtainable feature variables and cross-ethnic applicability).

In our model, we ultimately included five factors: age, gender, hypertension, BMI, and coffee intake. Advanced age and female sex are acknowledged risk factors for OA[28–30]. High BMI is also a risk factor, as being overweight or obese has a direct effect on OA[31–35], particularly in the knee, with a dose-response relationship between BMI and OA[34, 36]. Activation of metabolic factors leading to joint injury and mechanical overload of weight-bearing joints are considered possible mechanisms by which BMI increases the risk of OA[37]. A meta-analysis including 2 cohort studies and 6 cross-sectional studies revealed an association between hypertension and OA[38]. Hart et al. found a relationship between metabolic factors (such as hypertension and hypercholesterolemia) and OA[39].
Metabolic syndrome (MetS) and OA share similar inflammatory mechanisms, and obesity alters adipokine secretion, leading to a chronic low-grade inflammatory state in joint tissues[40]. As for coffee intake, studies have demonstrated that it is associated with an increased risk of OA[41–43]. Zhang et al. applied two-sample and two-step Mendelian randomization (MR) analyses to genome-wide association study (GWAS) summary statistics to estimate the relationship between coffee intake and OA, and found that coffee consumption increased the risk of OA, especially knee osteoarthritis (KOA), with BMI mediating this relationship[43]. Tan et al. observed that maternal mice exposed to low doses of caffeine exhibited impaired fetal joint integrity[44]. Caffeine antagonizes adenosine receptors and increases osteoclastogenesis, which may explain this phenomenon[45, 46].

Using this nomogram, clinical staff can rapidly identify individuals who may be at risk of OA. For those identified as being at higher risk during screening, further examinations are recommended for early diagnosis and intervention to improve prognosis. Furthermore, closely monitoring BMI and blood pressure and controlling dietary factors such as coffee intake may help reduce the risk of OA. It should be noted that although internal validation was performed in the validation cohort, the model still requires external validation to establish its generalizability.

Conclusions

Compared with other existing models, this study developed an effective clinical nomogram for identifying OA in a large population. Based on the risk assessment, clinicians can create individualized diagnostic and treatment plans. For individuals at high risk of OA, further examinations are recommended for early diagnosis, together with lifestyle or medical intervention to prevent the disease from progressing further.
Thus, the OA risk prediction nomogram developed in this study has significant clinical value.

Abbreviations

OA: Osteoarthritis
NHANES: National Health and Nutrition Examination Survey
PIR: Poverty income ratio
BMI: Body mass index
SII: Systemic immune-inflammatory index
SD: Standard deviation
DM: Diabetes mellitus
FPG: Fasting plasma glucose
NC: Neutrophil count
LC: Lymphocyte count
PC: Platelet count
RF: Random forest
ROC: Receiver operating characteristic
AUC: Area under receiver operating characteristic curve
DCA: Decision curve analysis
LASSO: Least absolute shrinkage and selection operator
MR: Mendelian randomization
GWAS: Genome-wide association studies
KOA: Knee osteoarthritis
MetS: Metabolic syndrome

Declarations

Acknowledgments

The authors thank NCHS for its research design and data sharing, as well as all the investigators and participants.

Authors' contributions

Conceptualization: QZ, JC and YL; methodology: QZ; formal analysis: QZ and JC; writing (original draft): QZ, JC and YL; writing (review and editing): JC, QZ and ML; supervision: LL. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the Basic and Applied Basic Research Foundation of Guangdong Province (No. 2021A1515010137), and the Special Fund Project for Science and Technology Innovation Strategy of Guangdong Province (No. 210715106900976).

Availability of data and materials

The dataset is publicly available at https://www.cdc.gov/nchs/nhanes (accessed on 12 December 2023).

Ethics approval and consent to participate

NHANES is conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics (NCHS). The NCHS Research Ethics Review Committee reviewed and approved the NHANES study protocol. All participants signed written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.
References Yue L, Berman J. What Is Osteoarthritis? JAMA. 2022;327(13):1300. Disease GBD, Injury I, Prevalence C. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789–858. Sowers MR, Karvonen-Gutierrez CA. The evolving role of obesity in knee osteoarthritis. Curr Opin Rheumatol. 2010;22(5):533–7. Liao Z, Han X, Wang Y, Shi J, Zhang Y, Zhao H, Zhang L, Jiang M, Liu M. Differential Metabolites in Osteoarthritis: A Systematic Review and Meta-Analysis. Nutrients 2023, 15(19). Mobasheri A, Rayman MP, Gualillo O, Sellam J, van der Kraan P, Fearon U. The role of metabolism in the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2017;13(5):302–11. Young DA, Barter MJ, Soul J. Osteoarthritis year in review: genetics, genomics, epigenetics. Osteoarthritis Cartilage. 2022;30(2):216–25. Felson DT. Clinical practice. Osteoarthritis of the knee. N Engl J Med. 2006;354(8):841–8. Chu CR, Williams AA, Coyle CH, Bowers ME. Early diagnosis to enable early treatment of pre-osteoarthritis. Arthritis Res Ther. 2012;14(3):212. Abdel Hady DA, Abd El-Hafeez T. Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction. Sci Rep. 2023;13(1):17940. Ferreira-Santos D, Amorim P, Silva Martins T, Monteiro-Soares M, Pereira Rodrigues P. Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review. J Med Internet Res. 2022;24(9):e39452. Tsai SF, Yang CT, Liu WJ, Lee CL. Development and validation of an insulin resistance model for a population without diabetes mellitus and its clinical implication: a prospective cohort study. EClinicalMedicine. 2023;58:101934. Li W, Feng J, Zhu D, Xiao Z, Liu J, Fang Y, Yao L, Qian B, Li S. 
Nomogram model based on radiomics signatures and age to assist in the diagnosis of knee osteoarthritis. Exp Gerontol. 2023;171:112031. Li S, Ma L, Cui R. Identification of Novel Diagnostic Biomarkers and Classification Patterns for Osteoarthritis by Analyzing a Specific Set of Genes Related to Inflammation. Inflammation; 2023. Chen X, Xu J, Zhang H, Yu L. A nomogram for predicting osteoarthritis based on serum biomarkers of bone turnover in middle age: A cross-sectional study of PTH and beta-CTx. Med (Baltim). 2023;102(20):e33833. Bonnett LJ, Snell KIE, Collins GS, Riley RD. Guide to presenting clinical prediction models for use in clinical settings. BMJ. 2019;365:l737. Wang Y, Li J, Xia Y, Gong R, Wang K, Yan Z, Wan X, Liu G, Wu D, Shi L, et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J Clin Oncol. 2013;31(9):1188–95. Xu Y, Wu Q. Trends and disparities in osteoarthritis prevalence among US adults, 2005–2018. Sci Rep. 2021;11(1):21845. March LM, Schwarz JM, Carfrae BH, Bagge E. Clinical validation of self-reported osteoarthritis. Osteoarthritis Cartilage. 1998;6(2):87–93. Mendy A, Park J, Vieira ER. Osteoarthritis and risk of mortality in the USA: a population-based cohort study. Int J Epidemiol. 2018;47(6):1821–9. Wang X, Xie L, Yang S. Association between weight-adjusted-waist index and the prevalence of rheumatoid arthritis and osteoarthritis: a population-based study. BMC Musculoskelet Disord. 2023;24(1):595. Alhassan E, Nguyen K, Hochberg MC, Mitchell BD. Causal Factors for Osteoarthritis: A Scoping Review of Mendelian Randomization Studies. Arthritis Care Res (Hoboken) 2023. Christensen K, Gleason CE, Mares JA. Dietary carotenoids and cognitive function among US adults, NHANES 2011–2014. Nutr Neurosci. 2020;23(7):554–62. Liu B, Wang J, Li YY, Li KP, Zhang Q. The association between systemic immune-inflammation index and rheumatoid arthritis: evidence from NHANES 1999–2018. Arthritis Res Ther. 2023;25(1):34. 
Qin Z, Li H, Wang L, Geng J, Yang Q, Su B, Liao R. Systemic Immune-Inflammation Index Is Associated With Increased Urinary Albumin Excretion: A Population-Based Study. Front Immunol. 2022;13:863640. Johnson CL, Dohrmann SM, Burt VL, Mohadjer LK. National health and nutrition examination survey: sample design, 2011–2014. Vital Health Stat 2 2014(162):1–33. Merianos AL, Mahabee-Gittens EM, Stone TM, Jandarov RA, Wang L, Bhandari D, Blount BC, Matt GE. Distinguishing Exposure to Secondhand and Thirdhand Tobacco Smoke among U.S. Children Using Machine Learning: NHANES 2013–2016. Environ Sci Technol. 2023;57(5):2042–53. Li W, Huang G, Tang N, Lu P, Jiang L, Lv J, Qin Y, Lin Y, Xu F, Lei D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. Chemosphere. 2023;337:139435. Johnson VL, Hunter DJ. The epidemiology of osteoarthritis. Best Pract Res Clin Rheumatol. 2014;28(1):5–15. Prieto-Alhambra D, Judge A, Javaid MK, Cooper C, Diez-Perez A, Arden NK. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Ann Rheum Dis. 2014;73(9):1659–64. Wang L, Lu H, Chen H, Jin S, Wang M, Shang S. Development of a model for predicting the 4-year risk of symptomatic knee osteoarthritis in China: a longitudinal cohort study. Arthritis Res Ther. 2021;23(1):65. Reijman M, Pols HA, Bergink AP, Hazes JM, Belo JN, Lievense AM, Bierma-Zeinstra SM. Body mass index associated with onset and progression of osteoarthritis of the knee but not of the hip: the Rotterdam Study. Ann Rheum Dis. 2007;66(2):158–62. Jiang L, Tian W, Wang Y, Rong J, Bao C, Liu Y, Zhao Y, Wang C. Body mass index and susceptibility to knee osteoarthritis: a systematic review and meta-analysis. Joint Bone Spine. 2012;79(3):291–7. Grotle M, Hagen KB, Natvig B, Dahl FA, Kvien TK. 
Obesity and osteoarthritis in knee, hip and/or hand: an epidemiological study in the general population with 10 years follow-up. BMC Musculoskelet Disord. 2008;9:132. Reyes C, Leyland KM, Peat G, Cooper C, Arden NK, Prieto-Alhambra D. Association Between Overweight and Obesity and Risk of Clinically Diagnosed Knee, Hip, and Hand Osteoarthritis: A Population-Based Cohort Study. Arthritis Rheumatol. 2016;68(8):1869–75. Ho J, Mak CCH, Sharma V, To K, Khan W. Mendelian Randomization Studies of Lifestyle-Related Risk Factors for Osteoarthritis: A PRISMA Review and Meta-Analysis. Int J Mol Sci 2022, 23(19). Raud B, Gay C, Guiguet-Auclair C, Bonnin A, Gerbaud L, Pereira B, Duclos M, Boirie Y, Coudeyre E. Level of obesity is directly associated with the clinical and functional consequences of knee osteoarthritis. Sci Rep. 2020;10(1):3601. King LK, March L, Anandacoomarasamy A. Obesity & osteoarthritis. Indian J Med Res. 2013;138(2):185–93. Zhang YM, Wang J, Liu XG. Association between hypertension and risk of knee osteoarthritis: A meta-analysis of observational studies. Med (Baltim). 2017;96(32):e7584. Hart DJ, Doyle DV, Spector TD. Association between metabolic factors and knee osteoarthritis in women: the Chingford Study. J Rheumatol. 1995;22(6):1118–23. Batushansky A, Zhu S, Komaravolu RK, South S, Mehta-D'souza P, Griffin TM. Fundamentals of OA. An initiative of Osteoarthritis and Cartilage. Obesity and metabolic factors in OA. Osteoarthritis Cartilage. 2022;30(4):501–15. Lee YH. Investigating the possible causal association of coffee consumption with osteoarthritis risk using a Mendelian randomization analysis. Clin Rheumatol. 2018;37(11):3133–9. Zhang Y, Fan J, Chen L, Xiong Y, Wu T, Shen S, Wang X, Meng X, Lu Y, Lei X. Causal Association of Coffee Consumption and Total, Knee, Hip and Self-Reported Osteoarthritis: A Mendelian Randomization Study. Front Endocrinol (Lausanne). 2021;12:768529. Zhang W, Lei X, Tu Y, Ma T, Wen T, Yang T, Xue L, Ji J, Xue H. 
Coffee and the risk of osteoarthritis: a two-sample, two-step multivariable Mendelian randomization study. Front Genet. 2024;15:1340044. Tan Y, Lu K, Li J, Ni Q, Zhao Z, Magdalou J, Chen L, Wang H. Prenatal caffeine exposure increases adult female offspring rat's susceptibility to osteoarthritis via low-functional programming of cartilage IGF-1 with histone acetylation. Toxicol Lett. 2018;295:229–36. Yi J, Yan B, Li M, Wang Y, Zheng W, Li Y, Zhao Z. Caffeine may enhance orthodontic tooth movement through increasing osteoclastogenesis induced by periodontal ligament cells under compression. Arch Oral Biol. 2016;64:51–60. Nieber K. The Impact of Coffee on Health. Planta Med. 2017;83(16):1256–63. Additional Declarations No competing interests reported. Supplementary Files TableS1.docx TableS2.docx TableS3Abbreviations.docx
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-4268728","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":292486507,"identity":"5656c548-7f1a-4a16-81c0-5f897bb6c9f4","order_by":0,"name":"Qiongbing Zheng","email":"","orcid":"","institution":"Department of Rheumatology and Immunology, The First Affiliated Hospital of Shantou University Medical College, Shantou 515041","correspondingAuthor":false,"prefix":"","firstName":"Qiongbing","middleName":"","lastName":"Zheng","suffix":""},{"id":292486508,"identity":"f5d4d6fa-87cb-4a74-9df4-b0ff1b56b582","order_by":1,"name":"Jiexin Chen","email":"","orcid":"","institution":"Department of Rheumatology and Immunology, The First Affiliated Hospital of Shantou University Medical College, Shantou 515041","correspondingAuthor":false,"prefix":"","firstName":"Jiexin","middleName":"","lastName":"Chen","suffix":""},{"id":292486509,"identity":"2bcf3308-24cc-41e2-bd71-3d92c3986c1f","order_by":2,"name":"Youmian Lan","email":"","orcid":"","institution":"Department of Cell Biology and Genetics, Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Chaoshan Area of Guangdong Higher Education Institutes, Shantou University Medical College","correspondingAuthor":false,"prefix":"","firstName":"Youmian","middleName":"","lastName":"Lan","suffix":""},{"id":292486510,"identity":"24619163-0e63-41fa-919c-1f40a2ed0233","order_by":3,"name":"Meijing Li","email":"","orcid":"","institution":"Department of Cell Biology and Genetics, Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Chaoshan Area of Guangdong Higher
Education Institutes, Shantou University Medical College","correspondingAuthor":false,"prefix":"","firstName":"Meijing","middleName":"","lastName":"Li","suffix":""},{"id":292486511,"identity":"f925d080-368b-49db-9996-975afd8c1147","order_by":4,"name":"Ling Lin","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAoklEQVRIiWNgGAWjYBACPnbGBgaGCgk5eaK1sDGDtJyxMDYEUgwHiNMCJBjbKhLByonUwtz24Oc8iQTGBt6Djz8Q6bB2w95tEnnsDHzJBkTawtgmzbhNopixgcdMggQtcyQSGw7wmP8gQUsDWIsZsd5nbJPsOSZhbNjMYyxxhhgt/OztzyR+1NTJybP3GH6oIEYLAjCTpnwUjIJRMApGAT4AAH25KJYEER50AAAAAElFTkSuQmCC","orcid":"","institution":"Department of Rheumatology and Immunology, The First Affiliated Hospital of Shantou University Medical College, Shantou 515041","correspondingAuthor":true,"prefix":"","firstName":"Ling","middleName":"","lastName":"Lin","suffix":""}],"badges":[],"createdAt":"2024-04-15 09:35:20","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4268728/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4268728/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":55252153,"identity":"7fd274a4-2c36-49e3-b7f6-e22e5fe785c4","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":580926,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFlow chart of sample selection.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"Figure1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/93da093426f22b8cbf93a016.jpg"},{"id":55252594,"identity":"bd52b745-066c-485d-8b08-23db8fdea57b","added_by":"auto","created_at":"2024-04-24 17:55:33","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":868024,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSelection of main predictors of OA. 
A\u003c/strong\u003e Selection of the tuning parameter lambda in the LASSO regression via 10-fold cross-validation based on minimum criteria. Misclassification error from the LASSO regression cross-validation procedure was plotted as a function of log lambda. The y-axis indicates the misclassification error. The x-axis indicates the log lambda. \u003cstrong\u003eB\u003c/strong\u003e The LASSO coefficient profiles of clinical features. The dotted vertical line was plotted at the value selected using 10-fold cross-validation in A. The resulting variables with non-zero coefficients are indicated in the plot. \u003cstrong\u003eC\u003c/strong\u003e Importance ranking of 7 variables by LASSO regression. \u003cstrong\u003eD\u003c/strong\u003e Importance ranking of 15 variables via 10-fold cross-validation and assessment of feature importance by XGBoost algorithm. \u003cstrong\u003eE\u003c/strong\u003e Selection of variables by RF algorithm. Mean decrease accuracy and mean decrease Gini from the RF algorithm cross-validation procedure were plotted. \u003cstrong\u003eF\u003c/strong\u003e Importance ranking of 15 variables via assessment of feature importance by RF algorithm. \u003cstrong\u003eG\u003c/strong\u003e Intersection of the variables selected by the three algorithms, yielding 5 main predictors of OA.\u003c/p\u003e","description":"","filename":"Figure2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/07bd3534a936b25c6ffad512.jpg"},{"id":55252156,"identity":"aec4c18e-b0f1-40fc-8b6b-1fd7f9a34d20","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":389622,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe nomogram represents the predicted probability of OA on a scale of 0 to 200.\u003c/strong\u003e For each predictor, draw a vertical line straight up to the point axis and note the corresponding points.
Sum the points across all predictors; the predicted probability of OA corresponding to the total score can be read at the bottom of the nomogram.\u003c/p\u003e","description":"","filename":"Figure3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/9b7a62da64c7b8cef62b224d.jpg"},{"id":55252159,"identity":"fc5ccb03-f879-4839-8730-5ec320d59e5c","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":682637,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe performance of the new nomogram for predicting OA. A\u003c/strong\u003e ROC curve in the training cohort. The x-axis is 1-Specificity; the y-axis is the Sensitivity. \u003cstrong\u003eB\u003c/strong\u003e Calibration curve in the training cohort. The x-axis is the nomogram-predicted probability of OA; the y-axis is the actual probability. \u003cstrong\u003eC\u003c/strong\u003e ROC curve in the validation cohort. \u003cstrong\u003eD\u003c/strong\u003e Calibration curve in the validation cohort.\u003c/p\u003e","description":"","filename":"Figure4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/dbff589f84208b70230e57c3.jpg"},{"id":55252595,"identity":"613d86ca-2fb1-431f-bc8f-85bd91911d39","added_by":"auto","created_at":"2024-04-24 17:55:33","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":356202,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe clinical utility of the nomogram was evaluated by DCA. A\u003c/strong\u003e Decision curve in the training cohort. \u003cstrong\u003eB\u003c/strong\u003e Decision curve in the validation cohort. The x-axis represents the threshold probability.
The y-axis represents net benefit.\u003c/p\u003e","description":"","filename":"Figure5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/20e602fc2b4b1ef4a6156289.jpg"},{"id":56112902,"identity":"1f591e80-f5ec-43ac-983b-343dd3243590","added_by":"auto","created_at":"2024-05-08 17:00:09","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1062543,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/ff77e6ad-b9da-48b1-94fb-544648c0a89a.pdf"},{"id":55252157,"identity":"98250bb9-e3af-41de-92f8-fc35bf1cd706","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"docx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":23136,"visible":true,"origin":"","legend":"","description":"","filename":"TableS1.docx","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/e5a37f08e156527ce39f2ecf.docx"},{"id":55252161,"identity":"47e2eaa5-7e61-450d-b7b2-19d2c7eebcf9","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"docx","order_by":9,"title":"","display":"","copyAsset":false,"role":"supplement","size":12184,"visible":true,"origin":"","legend":"","description":"","filename":"TableS2.docx","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/1a5f73ebaba44060a43a27a9.docx"},{"id":55252154,"identity":"8e7b6ef8-3467-41e2-8c05-0d61a6fb3f25","added_by":"auto","created_at":"2024-04-24 17:47:33","extension":"docx","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":12563,"visible":true,"origin":"","legend":"","description":"","filename":"TableS3Abbreviations.docx","url":"https://assets-eu.researchsquare.com/files/rs-4268728/v1/cc0d35d134a5718c94536ae3.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Development and validation of a new nomogram for OA based on machine
learning","fulltext":[{"header":"Introduction","content":"\u003cp\u003eOsteoarthritis (OA) is a chronic joint disease characterized by cartilage damage, subchondral bone destruction, and synovial inflammation[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Currently, there are over 300\u0026nbsp;million OA patients worldwide, and prevalence is expected to rise further with increasing obesity rates and population aging, posing a significant economic burden on patients and society[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. The pathogenesis of OA is still not fully understood and is considered to involve factors such as immunity, metabolism, hormones, and genetics[\u003cspan additionalcitationids=\"CR5\" citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e–\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. At present, the clinical management of OA focuses primarily on symptom relief to improve quality of life, including pain reduction, alleviation of stiffness, and maintenance of joint function[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. While joint replacement surgery is a viable treatment option for advanced-stage OA patients, the high cost and associated surgical risks make it unattainable for the majority of OA patients[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. In summary, there is currently no cure for OA. However, Chu et al. demonstrated that the early stages of OA may be reversible, and further recommended early diagnosis as a potential strategy to delay disease progression[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].
Therefore, developing a predictive model to identify high-risk populations for OA is essential for early diagnosis and intervention to improve patient outcomes.\u003c/p\u003e \u003cp\u003eIn recent years, machine learning techniques have opened up new avenues for the development of clinical prediction models. Machine learning algorithms can analyze many variables, identify relevant risk factors, and support early disease prediction, and they have been widely applied to assist clinical diagnosis of various diseases[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. With the increasing incidence of OA, there is an urgent need for a disease risk prediction model to support early diagnosis of OA. However, previous prediction models have primarily focused on genetic aspects or relied solely on imaging and serological factors[\u003cspan additionalcitationids=\"CR12 CR13\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e–\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Other disease diagnostic models have used logistic regression for variable selection and model construction. Unfortunately, these models have suffered from small sample sizes, a lack of comprehensive model evaluation methods, and the absence of internal or external validation. As of now, there is no disease prediction model for OA based on clinical data. The nomogram is a visual disease-specific prediction model that incorporates various clinical variables.
It aids in early disease detection and is easily accessible for doctors to utilize[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn this study, we aim to create a novel clinical prediction model through machine learning methods, incorporating patient demographics, laboratory examinations, anthropometric measurements, and health survey information. This model is designed to effectively predict the risk of OA occurrence and identify high-risk populations of OA.\u003c/p\u003e \n\n "},{"header":"Methods","content":"\u003cp\u003e \u003cb\u003eData source\u003c/b\u003e \u003c/p\u003e\u003cp\u003eThe data in this study was obtained from the National Health and Nutrition Examination Survey (NHANES) in the United States. This survey employs a sophisticated, multi-stage, and stratified probability sampling method to provide a comprehensive understanding of the health and nutritional status of the non-institutionalized population in the United States. The data from NHANES is nationally representative, making it an invaluable resource for conducting large-scale epidemiological studies and developing clinical prediction models. All NHANES survey protocols have received approval from the Research Ethics Review Committee of the National Center for Health Statistics, and participants have signed informed consent forms before participating in the survey. 
All the data from NHANES used in this study are publicly available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.cdc.gov/nchs/nhanes\u003c/span\u003e\u003cspan address=\"https://www.cdc.gov/nchs/nhanes\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e\u003ch3\u003eParticipant selection\u003c/h3\u003e\u003cp\u003eIn our study, we utilized data from three cycles of the NHANES survey, conducted in the years 2011–2012, 2013–2014, and 2015–2016. During these cycles, a total of 29,902 participants completed extensive demographic surveys, laboratory examinations, and health status questionnaires. To ensure the accuracy and reliability of our research, we conducted rigorous data screening and exclusions. First, participants under 20 years of age were excluded (n = 12,854) because our study focused on osteoarthritis in adults, leaving 17,048 participants. Subsequently, individuals with missing osteoarthritis-related data were excluded (n = 1,256). Furthermore, to ensure data integrity, we excluded individuals lacking essential demographic information (n = 1,456), those with missing questionnaire and dietary data (n = 2,462), and those with missing laboratory data (n = 508). Ultimately, a total of 11,366 participants were included as the subjects of analysis in our study, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003ch2\u003eDefinition of osteoarthritis\u003c/h2\u003e\u003cp\u003eThe case definition in epidemiological studies often relies on self-reported osteoarthritis (OA)[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. March et al. demonstrated an 81% consistency rate between self-reported OA and clinically well-defined OA[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], suggesting that OA can be reliably self-reported.
All participants were asked if they had ever been diagnosed with arthritis: \"Has a doctor or other health professional ever told you that you have arthritis?\" If participants answered \"yes,\" they were then asked, \"What type of arthritis do you have?\" Based on the answers to these questions, participants were categorized as having OA, other types of arthritis or no arthritis[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e\u003ch2\u003eDemographics, laboratory Factors, anthropometrics, and lifestyles\u003c/h2\u003e\u003cp\u003eIn accordance with previous research, we identified factors including age, gender, race, educational level, poverty income ratio (PIR), marital status, body mass index (BMI), alcohol consumption, smoking status, recreational physical activity, self-reported health status, dietary intake factors, renal function, and the systemic immune-inflammatory index (SII) as influencing factors[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eAge (years) and PIR were used as continuous variables. Gender was classified as male or female. Race was classified as Mexican American people, Other Hispanic people, Non-Hispanic White people, Non-Hispanic Black people, and Other or multiracial people. Education was divided into five categories: Less Than 9th Grade, 9-11th Grade, High School Grad/GED, Some College or AA degree, College Graduate or above. Alcohol consumption was defined by the response to the question: “Have you had at least 12 drinks of any type of alcoholic beverage in any one year?” and was divided into two groups (yes or no). 
Smoking status was classified as current smoking, former smoking, and never smoking according to the response to the questions: “Have you smoked at least 100 cigarettes in your entire life?” and “Do you currently smoke cigarettes?” Marital status was classified into six categories: married, widowed, divorced, separated, never married, and cohabiting with a partner. Based on BMI, individuals were divided into categories of underweight (\u0026lt; 18 kg/m\u003csup\u003e2\u003c/sup\u003e), normal weight (18–25 kg/m\u003csup\u003e2\u003c/sup\u003e), overweight (25–30 kg/m\u003csup\u003e2\u003c/sup\u003e), and obesity (≥ 30 kg/m\u003csup\u003e2\u003c/sup\u003e). Leisure physical activity levels were categorized into two groups: active and inactive. Individuals reporting moderate or vigorous leisure physical activity in a typical week were classified as active. Those who reported no moderate or vigorous leisure physical activity were classified as inactive. In our study, hypertension was defined as a self-reported diagnosis by a doctor, the use of anti-hypertensive medications, or blood pressure ≥ 140/90 mmHg. Diabetes mellitus (DM) status was classified as “diabetes” if any of the following applied: self-reported diagnosis by a doctor, HbA1c level ≥ 6.5%, fasting plasma glucose [FPG] level ≥ 7.0 mmol/L, random blood glucose level ≥ 11.1 mmol/L, two-hour glucose tolerance test blood glucose level ≥ 11.1 mmol/L, or use of diabetes medications or insulin. Dietary supplement information was obtained from questionnaires designed to collect detailed data on dietary supplement usage. During each NHANES cycle, participants provided detailed dietary intake information for two 24-hour periods, which was used to estimate intake of total energy, caffeine, and fiber. The first dietary recall was collected in person during the NHANES visit, while the second was collected via telephone 3 to 10 days later.
The intake was estimated as the average of the two recall periods (or the available data from the first day if only one day's data was available)[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. Data on urinary creatinine and albumin were obtained from laboratory examination within the NHANES project. Blood biomarkers included levels of vitamin D, neutrophil count (NC), lymphocyte count (LC), and platelet count (PC). As previously described, the systemic immune-inflammation index (SII) was calculated as PC × (NC / LC). Because the distribution of SII was right-skewed, we applied a log2 transformation to SII[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\u003ch2\u003eStatistical analysis\u003c/h2\u003e\u003cp\u003eNHANES uses a complex, multi-stage survey design, and nationally representative estimates require applying sample weights derived from that design[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. However, in this study, we used raw unweighted data from the NHANES database to construct the machine learning models. Weighted data are usually used to estimate nationwide incidence or prevalence; our aim was not to estimate national prevalence but to learn the relationship between OA and individual characteristics in order to train the model[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eData were statistically analyzed using R software (version 4.3.0). Continuous variables were expressed as mean ± standard deviation (SD), and t-tests were used to compare differences between groups. Categorical variables were expressed as frequency and percentage and compared using chi-square tests.
All statistical tests were two-sided, and a \u003cem\u003eP\u003c/em\u003e-value \u0026lt; 0.05 was considered statistically significant.\u003c/p\u003e\u003cp\u003eTo facilitate model development, we randomly divided all 11,366 participants into two groups in a 7:3 ratio (7,958 individuals for training and 3,408 for validation). The training cohort was used for model development, while the validation cohort was used for internal validation. LASSO regression, the XGBoost algorithm, and the random forest (RF) algorithm were applied with 10-fold cross-validation to assess feature importance. Subsequently, we developed a clinical risk prediction nomogram by integrating results from the three algorithms, considering the importance of feature variables.\u003c/p\u003e\u003cp\u003eFor model evaluation, we plotted receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC). To evaluate the clinical utility of the model, we further conducted decision curve analysis (DCA).\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eBaseline characteristics of participants\u003c/h2\u003e \u003cp\u003eThis study included 11,366 participants for analysis, with an average age of 47.9 years. Among these participants, according to the diagnostic criteria mentioned above, 1,434 individuals (12.6%) were diagnosed with OA. Among these OA patients, 508 individuals (35.4%) were male, and 926 individuals (64.6%) were female. Baseline characteristics of the two groups of participants are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. These participants were randomly divided into training and validation groups in a 7:3 ratio, with 7,958 individuals in the training group and 3,408 in the validation group. In the training cohort, participants had an average age of 48.0 years, with 1,007 individuals (12.7%) diagnosed with OA.
In the validation cohort, the observed average age of participants was 47.7 years, with 427 individuals (12.5%) diagnosed as OA. No significant differences were observed in baseline characteristics between the two cohorts (as shown in Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eBaseline Characteristics of Study Participants from NHANES 2011\u0026ndash;2016.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;11366)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNormal\u003c/p\u003e \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;9932)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOA\u003c/p\u003e \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;1434)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003ep\u003c/em\u003e-value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e 
\u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAge (years)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e47.9\u0026thinsp;\u0026plusmn;\u0026thinsp;17.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e45.8\u0026thinsp;\u0026plusmn;\u0026thinsp;16.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e62.6\u0026thinsp;\u0026plusmn;\u0026thinsp;13.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRatio of family income to poverty\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.5\u0026thinsp;\u0026plusmn;\u0026thinsp;1.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.5\u0026thinsp;\u0026plusmn;\u0026thinsp;1.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2.7\u0026thinsp;\u0026plusmn;\u0026thinsp;1.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDiabetes, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1944 (17.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1547 (15.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e397 (27.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHypertension, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4587 (40.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3631 (36.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e956 (66.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCalories intake (kcal/d)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2071.4\u0026thinsp;\u0026plusmn;\u0026thinsp;848.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2093.6\u0026thinsp;\u0026plusmn;\u0026thinsp;863.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1918.0\u0026thinsp;\u0026plusmn;\u0026thinsp;719.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCoffee intake (mg/d)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e137.9\u0026thinsp;\u0026plusmn;\u0026thinsp;158.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e133.5\u0026thinsp;\u0026plusmn;\u0026thinsp;153.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e168.0\u0026thinsp;\u0026plusmn;\u0026thinsp;183.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eFiber intake (g/d)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e17.3\u0026thinsp;\u0026plusmn;\u0026thinsp;9.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17.4\u0026thinsp;\u0026plusmn;\u0026thinsp;9.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e16.6\u0026thinsp;\u0026plusmn;\u0026thinsp;8.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eUrinary protein (\u0026micro;g/ml)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e44.8\u0026thinsp;\u0026plusmn;\u0026thinsp;293.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e43.5\u0026thinsp;\u0026plusmn;\u0026thinsp;278.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e53.8\u0026thinsp;\u0026plusmn;\u0026thinsp;379.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eUrine creatinine (mg/dl)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e123.4\u0026thinsp;\u0026plusmn;\u0026thinsp;81.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e125.4\u0026thinsp;\u0026plusmn;\u0026thinsp;82.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e109.6\u0026thinsp;\u0026plusmn;\u0026thinsp;73.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLog2 (SII)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8.8\u0026thinsp;\u0026plusmn;\u0026thinsp;0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8.8\u0026thinsp;\u0026plusmn;\u0026thinsp;0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e8.9\u0026thinsp;\u0026plusmn;\u0026thinsp;0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGender, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5658 (49.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5150 (51.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e508 (35.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eFemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5708 (50.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4782 (48.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e926 (64.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRace, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMexican American people\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1525 (13.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1409 (14.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e116 (8.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOther Hispanic people\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1187 (10.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1076 (10.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e111 (7.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNon-Hispanic White people\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4652 (40.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3781 (38.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e871 (60.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNon-Hispanic Black people\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2385 (21%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2153 (21.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e232 (16.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOther/multiracial people\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1617 (14.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1513 (15.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e104 (7.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEducation level, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.11\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLess Than 9th Grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e873 (7.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e773 (7.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100 (7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9-11th Grade\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1368 (12%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1203 (12.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e165 (11.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHigh School Grad/GED\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2460 (21.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2165 (21.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e295 (20.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSome College or AA degree\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3582 (31.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3086 (31.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e496 (34.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCollege Graduate or above\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3083 (27.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2705 (27.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e378 (26.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eMarital status, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMarried\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5807 (51.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5031 (50.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e776 (54.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWidowed\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e721 (6.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e499 (5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e222 (15.5%)\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDivorced\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1218 (10.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e995 (10%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e223 (15.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSeparated\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e345 (3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e310 (3.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35 (2.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNever married\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2307 (20.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2185 (22%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e122 (8.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLiving with partner\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e968 (8.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e912 (9.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e56 (3.9%)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eSmoking, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNever smoker\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6524 (57.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5840 (58.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e684 (47.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFormer smoker\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2611 (23%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2111 (21.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e500 (34.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCurrent smoker\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2231 (19.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1981 (19.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e250 
(17.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDrink alcohol, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8278 (72.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7235 (72.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1043 (72.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePhysically active, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5890 (51.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5302 (53.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e588 (41%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eBMI, n(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnderweight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e193 (1.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e175 (1.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e18 (1.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNormal weight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3219 (28.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2945 (29.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e274 (19.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOverweight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3673 (32.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3264 (32.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e409 (28.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eObesity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4281 (37.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3548 (35.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e733 (51.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eContinuous data are expressed as weighted 
mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (SD); categorical data are expressed as number (percentage).\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eAbbreviations: SII: systemic immune inflammation index, BMI: body mass index.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003ch2\u003eSelection of main predictors of OA\u003c/h2\u003e \u003cp\u003eTo identify the key predictor variables of OA, we applied LASSO regression, the XGBoost algorithm, and the RF algorithm with 10-fold cross-validation and assessed feature importance. Using LASSO regression, we selected 7 significant predictors from the 19 feature variables in the training cohort: gender, hypertension, BMI, age, drink alcohol, education level and coffee intake (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA-C). Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD presents the importance ranking of 15 feature variables by the XGBoost algorithm: age, gender, race, hypertension, coffee intake, BMI, calories intake, Log2 (SII), fiber intake, PIR, marital status, urine creatinine, diabetes, urinary protein and smoking. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE-F displays the importance ranking of 15 feature variables by the RF algorithm: age, gender, race, urine creatinine, diabetes, calories intake, marital status, urinary protein, hypertension, education level, coffee intake, smoking, BMI, fiber intake and drink alcohol. 
We took the intersection of the feature variables selected by the three algorithms, which yielded 5 key feature variables: age, gender, hypertension, BMI and coffee intake (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eG).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e\n\u003ch3\u003eConstruction of a new predictive model of OA\u003c/h3\u003e\n\u003cp\u003eThrough the machine learning methods described above, 5 of the original 19 variables were selected as the optimal variables. Next, to create a new predictive model, we performed multivariate logistic regression on these 5 variables (Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). These predictors, which are mutually independent, were combined to create a nomogram illustrating the quantified OA risk (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003ePerformance of the new nomogram of OA: AUC and calibration curve\u003c/h2\u003e \u003cp\u003eROC and calibration curves were used to evaluate the performance of the new OA nomogram. Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA displays the ROC curve and calibration curve of the new OA nomogram for the training cohort; the ROC curve shows an AUC of 0.804, and the calibration curve exhibits good consistency. In the validation cohort, the ROC curve of the new OA nomogram has an AUC of 0.814 (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC), and the calibration curve in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD also demonstrates good consistency and stability. 
These results indicate that the nomogram has good discriminative ability and calibration in both cohorts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of clinical utility of the new nomogram of OA\u003c/h2\u003e \u003cp\u003eWe further conducted decision curve analysis (DCA) to evaluate the clinical utility of the new nomogram of OA. As shown in Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA-B, the nomogram exhibits a high net benefit in both the training and validation cohorts. These results indicate that the new OA nomogram has a certain degree of clinical utility.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eOsteoarthritis is a severe public health issue, and delayed diagnosis hinders early treatment to protect joint health[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Therefore, there is an urgent need for a simple and user-friendly prediction model of OA to identify high-risk populations. The nomogram, a visual prediction model, is helpful for disease screening and early diagnosis[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. In this study, we applied machine learning methods to select feature variables using data from the NHANES database. By intersecting these feature variables, we identified 5 optimal predictive factors and used them to construct an effective risk prediction nomogram for OA. 
This model shows high predictive performance for OA risk (AUC of training set: 0.804, AUC of validation set: 0.814).\u003c/p\u003e \u003cp\u003eThe use of machine learning for developing clinical prediction models is growing year by year[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. To the best of our knowledge, we are the first to utilize machine learning methods based on the NHANES database to construct a risk prediction nomogram for OA using clinical data. The model is built on a large and diverse population, which supports more stable results. We selected simple, easily obtainable clinical and laboratory data drawn from demographic data, anthropometric measurements, laboratory examinations, and questionnaires, which favors widespread application of the model. In summary, our prediction model has two distinct advantages: it possesses excellent predictive capacity (high AUC) and it is user-friendly, with wide adaptability (only 5 easily obtainable feature variables and cross-ethnic applicability).\u003c/p\u003e \u003cp\u003eIn our model, we ultimately included five factors: age, gender, hypertension, BMI and coffee intake. It is well established that advanced age and female sex are risk factors for OA[\u003cspan additionalcitationids=\"CR29\" citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. 
High BMI is also a risk factor, as being overweight or obese has a direct effect on OA[\u003cspan additionalcitationids=\"CR32 CR33 CR34\" citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e], particularly in the knee, with a dose-response relationship between BMI and OA[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. The activation of metabolic factors that lead to joint injury and the mechanical overload of weight-bearing joints are considered possible mechanisms by which high BMI increases the risk of OA[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. A meta-analysis of 2 cohort studies and 6 cross-sectional studies revealed an association between hypertension and OA[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Hart et al. found a relationship between metabolic factors (such as hypertension and hypercholesterolemia) and OA[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. Metabolic syndrome (MetS) and OA share similar mechanisms of inflammation, and obesity alters adipokine secretion, leading to a chronic low-grade inflammatory state in joint tissues[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. As for coffee intake, studies have shown that it is associated with an increased risk of OA[\u003cspan additionalcitationids=\"CR42\" citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Zhang et al. 
utilized two-sample and two-step Mendelian randomization (MR) analyses of genome-wide association study (GWAS) summary statistics to estimate the relationship between coffee intake and OA, and they found that coffee consumption increased the risk of OA, especially knee osteoarthritis (KOA), with BMI mediating this relationship[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Tan et al. observed that maternal mice exposed to low doses of caffeine exhibited impaired fetal joint integrity[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. Caffeine antagonizes adenosine receptors and increases osteoclastogenesis, which may explain this phenomenon[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eUsing this nomogram, clinical staff can rapidly identify individuals who may be at risk of OA. For those identified as having a higher risk during screening, further examinations are recommended for early diagnosis and intervention to improve prognosis. Furthermore, closely monitoring BMI and blood pressure and controlling diet, such as coffee intake, may help reduce the risk of OA.\u003c/p\u003e \u003cp\u003eIt should be noted that although internal validation was performed in the validation cohort, the model still requires external validation to confirm its generalizability.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eCompared to other existing models, this study has developed an effective clinical nomogram for identifying OA in a large population. Based on the risk assessment, clinicians can create individualized diagnostic and treatment plans for the subjects. 
For individuals at high risk of OA, further examinations are recommended for early diagnosis and lifestyle or medical intervention to prevent the disease from progressing further. Thus, the risk prediction nomogram of OA developed in this study has significant clinical value.\u003c/p\u003e "},{"header":"Abbreviations","content":"\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eOA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eOsteoarthritis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eNHANES\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eNational Health and Nutrition Examination Survey\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003ePIR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003ePoverty income ratio\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eBMI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eBody mass index\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eSII\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eSystemic immune-inflammatory index\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eSD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eStandard deviation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eDM\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eDiabetes mellitus\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eFPG\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eFasting plasma glucose\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eNC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eNeutrophil count\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eLC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eLymphocyte count\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003ePC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003ePlatelet count\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eRF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eRandom forest\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eReceiver operating 
characteristic\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eAUC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eArea under receiver operating characteristic curve\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eDCA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eDecision curve analysis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eSD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eStandard deviation\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eLASSO\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eLeast absolute shrinkage and selection operator\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eMR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eMendelian Randomization\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eGWAS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eGenome-wide association studies\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eKOA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" 
valign=\"top\"\u003e\n \u003cp\u003eKnee osteoarthritis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"28.543307086614174%\" valign=\"top\"\u003e\n \u003cp\u003eMetS\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"71.45669291338582%\" valign=\"top\"\u003e\n \u003cp\u003eMetabolic\u0026nbsp;syndrome\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors thank NCHS for its research design and data sharing, as well as all the investigators and participants.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization: QZ, JC and YL; methodology: QZ; formal analysis: QZ and JC; writing\u0026mdash;original draft: QZ, JC and YL; writing\u0026mdash;review and editing: JC, QZ and ML; supervision: LL. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by grants from the Basic and Applied Basic Research Foundation of Guangdong Province (No. 2021A1515010137), and the Special Fund Project for Science and Technology Innovation Strategy of Guangdong Province (No. 
210715106900976).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dataset is publicly available at\u0026nbsp;\u003ca href=\"https://www.cdc.gov/nchs/nhanes\"\u003ehttps://www.cdc.gov/nchs/nhanes\u003c/a\u003e (accessed on 12 December 2023).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclarations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNHANES is conducted by the Centers for Disease Control and Prevention (CDC) and the National Center for Health Statistics (NCHS). The NCHS Research Ethics Review Committee reviewed and approved the NHANES study protocol. All participants signed written informed consent.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eYue L, Berman J. What Is Osteoarthritis? JAMA. 2022;327(13):1300.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGBD 2017 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990\u0026ndash;2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2018;392(10159):1789\u0026ndash;858.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSowers MR, Karvonen-Gutierrez CA. The evolving role of obesity in knee osteoarthritis. Curr Opin Rheumatol. 2010;22(5):533\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiao Z, Han X, Wang Y, Shi J, Zhang Y, Zhao H, Zhang L, Jiang M, Liu M. 
Differential Metabolites in Osteoarthritis: A Systematic Review and Meta-Analysis. Nutrients 2023, 15(19).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMobasheri A, Rayman MP, Gualillo O, Sellam J, van der Kraan P, Fearon U. The role of metabolism in the pathogenesis of osteoarthritis. Nat Rev Rheumatol. 2017;13(5):302\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYoung DA, Barter MJ, Soul J. Osteoarthritis year in review: genetics, genomics, epigenetics. Osteoarthritis Cartilage. 2022;30(2):216\u0026ndash;25.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFelson DT. Clinical practice. Osteoarthritis of the knee. N Engl J Med. 2006;354(8):841\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChu CR, Williams AA, Coyle CH, Bowers ME. Early diagnosis to enable early treatment of pre-osteoarthritis. Arthritis Res Ther. 2012;14(3):212.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbdel Hady DA, Abd El-Hafeez T. Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction. Sci Rep. 2023;13(1):17940.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFerreira-Santos D, Amorim P, Silva Martins T, Monteiro-Soares M, Pereira Rodrigues P. Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review. J Med Internet Res. 2022;24(9):e39452.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsai SF, Yang CT, Liu WJ, Lee CL. Development and validation of an insulin resistance model for a population without diabetes mellitus and its clinical implication: a prospective cohort study. EClinicalMedicine. 2023;58:101934.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W, Feng J, Zhu D, Xiao Z, Liu J, Fang Y, Yao L, Qian B, Li S. Nomogram model based on radiomics signatures and age to assist in the diagnosis of knee osteoarthritis. Exp Gerontol. 
2023;171:112031.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi S, Ma L, Cui R. Identification of Novel Diagnostic Biomarkers and Classification Patterns for Osteoarthritis by Analyzing a Specific Set of Genes Related to Inflammation. Inflammation; 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen X, Xu J, Zhang H, Yu L. A nomogram for predicting osteoarthritis based on serum biomarkers of bone turnover in middle age: A cross-sectional study of PTH and beta-CTx. Med (Baltim). 2023;102(20):e33833.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBonnett LJ, Snell KIE, Collins GS, Riley RD. Guide to presenting clinical prediction models for use in clinical settings. BMJ. 2019;365:l737.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Li J, Xia Y, Gong R, Wang K, Yan Z, Wan X, Liu G, Wu D, Shi L, et al. Prognostic nomogram for intrahepatic cholangiocarcinoma after partial hepatectomy. J Clin Oncol. 2013;31(9):1188\u0026ndash;95.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu Y, Wu Q. Trends and disparities in osteoarthritis prevalence among US adults, 2005\u0026ndash;2018. Sci Rep. 2021;11(1):21845.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarch LM, Schwarz JM, Carfrae BH, Bagge E. Clinical validation of self-reported osteoarthritis. Osteoarthritis Cartilage. 1998;6(2):87\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMendy A, Park J, Vieira ER. Osteoarthritis and risk of mortality in the USA: a population-based cohort study. Int J Epidemiol. 2018;47(6):1821\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang X, Xie L, Yang S. Association between weight-adjusted-waist index and the prevalence of rheumatoid arthritis and osteoarthritis: a population-based study. BMC Musculoskelet Disord. 
2023;24(1):595.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlhassan E, Nguyen K, Hochberg MC, Mitchell BD. Causal Factors for Osteoarthritis: A Scoping Review of Mendelian Randomization Studies. Arthritis Care Res (Hoboken) 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChristensen K, Gleason CE, Mares JA. Dietary carotenoids and cognitive function among US adults, NHANES 2011\u0026ndash;2014. Nutr Neurosci. 2020;23(7):554\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu B, Wang J, Li YY, Li KP, Zhang Q. The association between systemic immune-inflammation index and rheumatoid arthritis: evidence from NHANES 1999\u0026ndash;2018. Arthritis Res Ther. 2023;25(1):34.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin Z, Li H, Wang L, Geng J, Yang Q, Su B, Liao R. Systemic Immune-Inflammation Index Is Associated With Increased Urinary Albumin Excretion: A Population-Based Study. Front Immunol. 2022;13:863640.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson CL, Dohrmann SM, Burt VL, Mohadjer LK. National health and nutrition examination survey: sample design, 2011\u0026ndash;2014. Vital Health Stat 2 2014(162):1\u0026ndash;33.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMerianos AL, Mahabee-Gittens EM, Stone TM, Jandarov RA, Wang L, Bhandari D, Blount BC, Matt GE. Distinguishing Exposure to Secondhand and Thirdhand Tobacco Smoke among U.S. Children Using Machine Learning: NHANES 2013\u0026ndash;2016. Environ Sci Technol. 2023;57(5):2042\u0026ndash;53.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi W, Huang G, Tang N, Lu P, Jiang L, Lv J, Qin Y, Lin Y, Xu F, Lei D. Effects of heavy metal exposure on hypertension: A machine learning modeling approach. Chemosphere. 2023;337:139435.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJohnson VL, Hunter DJ. The epidemiology of osteoarthritis. Best Pract Res Clin Rheumatol. 
2014;28(1):5\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrieto-Alhambra D, Judge A, Javaid MK, Cooper C, Diez-Perez A, Arden NK. Incidence and risk factors for clinically diagnosed knee, hip and hand osteoarthritis: influences of age, gender and osteoarthritis affecting other joints. Ann Rheum Dis. 2014;73(9):1659\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang L, Lu H, Chen H, Jin S, Wang M, Shang S. Development of a model for predicting the 4-year risk of symptomatic knee osteoarthritis in China: a longitudinal cohort study. Arthritis Res Ther. 2021;23(1):65.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReijman M, Pols HA, Bergink AP, Hazes JM, Belo JN, Lievense AM, Bierma-Zeinstra SM. Body mass index associated with onset and progression of osteoarthritis of the knee but not of the hip: the Rotterdam Study. Ann Rheum Dis. 2007;66(2):158\u0026ndash;62.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang L, Tian W, Wang Y, Rong J, Bao C, Liu Y, Zhao Y, Wang C. Body mass index and susceptibility to knee osteoarthritis: a systematic review and meta-analysis. Joint Bone Spine. 2012;79(3):291\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrotle M, Hagen KB, Natvig B, Dahl FA, Kvien TK. Obesity and osteoarthritis in knee, hip and/or hand: an epidemiological study in the general population with 10 years follow-up. BMC Musculoskelet Disord. 2008;9:132.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eReyes C, Leyland KM, Peat G, Cooper C, Arden NK, Prieto-Alhambra D. Association Between Overweight and Obesity and Risk of Clinically Diagnosed Knee, Hip, and Hand Osteoarthritis: A Population-Based Cohort Study. Arthritis Rheumatol. 2016;68(8):1869\u0026ndash;75.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHo J, Mak CCH, Sharma V, To K, Khan W. 
Mendelian Randomization Studies of Lifestyle-Related Risk Factors for Osteoarthritis: A PRISMA Review and Meta-Analysis. Int J Mol Sci 2022, 23(19).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaud B, Gay C, Guiguet-Auclair C, Bonnin A, Gerbaud L, Pereira B, Duclos M, Boirie Y, Coudeyre E. Level of obesity is directly associated with the clinical and functional consequences of knee osteoarthritis. Sci Rep. 2020;10(1):3601.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKing LK, March L, Anandacoomarasamy A. Obesity \u0026amp; osteoarthritis. Indian J Med Res. 2013;138(2):185\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang YM, Wang J, Liu XG. Association between hypertension and risk of knee osteoarthritis: A meta-analysis of observational studies. Med (Baltim). 2017;96(32):e7584.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHart DJ, Doyle DV, Spector TD. Association between metabolic factors and knee osteoarthritis in women: the Chingford Study. J Rheumatol. 1995;22(6):1118\u0026ndash;23.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBatushansky A, Zhu S, Komaravolu RK, South S, Mehta-D'souza P, Griffin TM. Fundamentals of OA. An initiative of Osteoarthritis and Cartilage. Obesity and metabolic factors in OA. Osteoarthritis Cartilage. 2022;30(4):501\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee YH. Investigating the possible causal association of coffee consumption with osteoarthritis risk using a Mendelian randomization analysis. Clin Rheumatol. 2018;37(11):3133\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang Y, Fan J, Chen L, Xiong Y, Wu T, Shen S, Wang X, Meng X, Lu Y, Lei X. Causal Association of Coffee Consumption and Total, Knee, Hip and Self-Reported Osteoarthritis: A Mendelian Randomization Study. Front Endocrinol (Lausanne). 
2021;12:768529.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang W, Lei X, Tu Y, Ma T, Wen T, Yang T, Xue L, Ji J, Xue H. Coffee and the risk of osteoarthritis: a two-sample, two-step multivariable Mendelian randomization study. Front Genet. 2024;15:1340044.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan Y, Lu K, Li J, Ni Q, Zhao Z, Magdalou J, Chen L, Wang H. Prenatal caffeine exprosure increases adult female offspring rat's susceptibility to osteoarthritis via low-functional programming of cartilage IGF-1 with histone acetylation. Toxicol Lett. 2018;295:229\u0026ndash;36.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYi J, Yan B, Li M, Wang Y, Zheng W, Li Y, Zhao Z. Caffeine may enhance orthodontic tooth movement through increasing osteoclastogenesis induced by periodontal ligament cells under compression. Arch Oral Biol. 2016;64:51\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNieber K. The Impact of Coffee on Health. Planta Med. 
2017;83(16):1256\u0026ndash;63.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Osteoarthritis, Nomogram, NHANES, Machine learning, LASSO, XGBoost, Random forest","lastPublishedDoi":"10.21203/rs.3.rs-4268728/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4268728/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eIntroduction: \u003c/strong\u003eOsteoarthritis (OA) is a chronic joint disease with the global number of OA patients exceeds 300 million currently, posing a significant economic burden on patients and society. Currently, there is no cure for OA, making early identification and appropriate management of individuals at risk crucial. 
Thus, the development of a novel OA prediction model to screen for high-risk individuals, enabling early diagnosis and intervention, holds great importance in improving patient prognosis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods: \u003c/strong\u003eBased on the National Health and Nutrition Examination Survey (NHANES) for the periods of 2011-2012, 2013-2014, and 2015-2016, the study was a retrospective cross-sectional study involving 11,366 participants. Least absolute shrinkage and selection operator (LASSO) regression, the XGBoost algorithm, and the random forest (RF) algorithm were used to identify significant indicators associated with OA, and an OA prediction nomogram was developed. The nomogram was evaluated using the area under the receiver operating characteristic curve (AUC), the calibration curve, and the decision curve analysis (DCA) curve of the training and validation sets.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults:\u003c/strong\u003e In this study, we identified 5 predictors from 19 variables (age, gender, hypertension, BMI and coffee intake) and developed an OA nomogram. 
In both the training and validation cohorts, the OA nomogram exhibited good predictive performance (with AUCs of 0.804 and 0.814, respectively), good consistency and stability in calibration curve and high net benefit in DCA.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion:\u003c/strong\u003e This nomogram based on 5 variables predicted the risk of OA with a high degree of accuracy, suggesting that it is a convenient tool for clinicians to identify high-risk populations of OA.\u003c/p\u003e","manuscriptTitle":"Development and validation of a new nomogram for OA based on machine learning","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-04-24 17:47:28","doi":"10.21203/rs.3.rs-4268728/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a1e1bf00-2228-4de7-9feb-9051fdaf6828","owner":[],"postedDate":"April 24th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-05-08T16:21:30+00:00","versionOfRecord":[],"versionCreatedAt":"2024-04-24 17:47:28","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4268728","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4268728","identity":"rs-4268728","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}



License: CC-BY-4.0