Explainable machine learning-based identification of Chinese college students at risk of problematic social media use: a multicenter cross-sectional online survey

doi:10.21203/rs.3.rs-9136366/v1

2026 · doi:10.21203/rs.3.rs-9136366/v1

preprint OA: closed

Research Article | Research Square

Zhiwen Huang, Qiufeng Liu

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9136366/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1.

Abstract

Background

Problematic social media use (PSMU) is increasingly recognized as a health-relevant digital behavior among college students and has been linked to psychological distress and poorer well-being. However, few studies have developed interpretable machine learning models to identify Chinese college students at elevated risk of PSMU using multidimensional psychosocial and digital-behavior data.
Methods

We conducted a multicenter cross-sectional online survey in China between September and October 2025 using Wenjuanxing. After logic checks and deletion of invalid questionnaires, 2,413 valid responses remained. The outcome was screening-positive potential PSMU, defined as a Bergen Social Media Addiction Scale total score ≥ 19. Eighteen prespecified predictors covering demographics, socioeconomic position, pre-college environment, health behaviors, social-media-use patterns, online health behaviors, and mental distress were modeled. Data were randomly split into training (1,689; 70%) and test (724; 30%) sets using stratified sampling by outcome. LASSO was used for feature selection, six supervised algorithms were compared, and SHAP was used to interpret the best-performing model.

Results

Overall, 1,192 participants (49.4%) screened positive for potential PSMU. Of 18 predictors entered into LASSO, 16 retained non-zero coefficients. XGBoost achieved the best test-set discrimination (AUC 0.851), followed closely by random forest (0.847) and LightGBM (0.843). XGBoost also showed the highest F1-score (0.778). SHAP analyses indicated that social media use frequency, DASS_any, platform count, university city tier, drinking, dating/stranger-social app use, and monthly expenditure contributed most to model output.

Conclusions

Explainable machine learning, especially XGBoost, can effectively identify Chinese college students at higher cross-sectional risk of potential PSMU. Social media exposure patterns and mental distress contributed most strongly to prediction. These findings may support targeted screening and digital well-being interventions in university settings, although external validation is still required.

Keywords: problematic social media use, Bergen Social Media Addiction Scale, machine learning, XGBoost, SHAP, Chinese college students

Background

Social media is deeply embedded in the everyday lives of college students.
For most users, social networking platforms serve adaptive functions such as maintaining relationships, entertainment, and information seeking. However, a subgroup of users displays excessive, compulsive, and functionally impairing patterns of engagement that are commonly described as problematic social media use (PSMU) or addictive social media use [1, 2]. Research over the past decade has linked PSMU to depressive and anxious symptoms, stress, loneliness, disturbed sleep, and reduced well-being. Recent reviews suggest that these relationships may be bidirectional and are often stronger for problematic or emotionally dysregulated use than for simple time spent online [3, 4]. Among Chinese college students, PSMU has been associated with depressive outcomes, stress-related latent risk profiles, reciprocal dynamics with loneliness, and broader psychological distress [5–7, 15].

Despite this expanding literature, most studies remain explanatory rather than predictive. Conventional regression models are valuable for estimating adjusted associations, but they may not adequately capture non-linear effects, threshold relationships, and interaction structures among social-media exposure, living environment, socioeconomic position, and mental distress. For screening-oriented tasks, machine-learning methods may therefore offer practical advantages, provided that they are transparently reported and carefully interpreted [11–14].

This study used a multicenter online sample of Chinese college students to build and compare six supervised machine-learning models for identifying students at elevated cross-sectional risk of potential PSMU. We further used SHapley Additive exPlanations (SHAP) to interpret the best-performing model.
Our aims were to: (1) develop and internally validate an interpretable machine-learning model for PSMU screening; (2) compare the performance of linear, probabilistic, instance-based, and tree-based methods; and (3) identify the most influential predictors from harmonized demographic, behavioral, and mental-health indicators.

Methods

Study design, participants, and data quality control

This study used a multicenter cross-sectional online survey design. The survey was conducted between September and October 2025 through the Wenjuanxing platform. Reporting of the web-based survey was informed, where applicable, by the CHERRIES and STROBE recommendations, and model reporting was structured in line with core TRIPOD principles [11–13].

The parent questionnaire was assembled specifically for this multicenter survey and had not previously been published as a standalone instrument. It combined previously published scales with study-developed or adapted items. Specifically, the Chinese Bergen Social Media Addiction Scale (BSMAS) and the Chinese Depression Anxiety Stress Scale-21 (DASS-21) were administered in their established published forms [1, 2, 8], whereas demographic, socioeconomic, pre-college environment, social-media-use pattern, and online health behavior items were drafted or adapted by the research team for the broader student-health survey after review of related questionnaires [9, 10]. An English-language version of the questionnaire sections relevant to the present analysis is provided as Supplementary File 1.

The survey link was disseminated by student recruiters through university-related WeChat and QQ groups. Eligible participants were current college or university students in China who were able to complete the questionnaire online and provide electronic informed consent. The study protocol was approved by the Ethics Committee of Jiangsu College of Nursing (approval No. JSCN-ME-2025070719).
Questionnaires were screened for internal consistency before analysis. Responses with logic conflicts, obvious contradictions, or implausible completion patterns were deleted during data cleaning. Missing data had already been resolved in the cleaned export used for modeling, and no missing values remained in the final analytic dataset. The final dataset contained 2,413 valid participants, including 1,192 screening-positive outcome events. Institution-name responses indicated broad institutional coverage spanning universities and vocational colleges, with recruitment concentrated in Zhejiang, Jiangsu, and Shanghai; accordingly, the survey should be interpreted as multicenter but regionally clustered rather than nationally representative.

For model development, the cleaned dataset was randomly divided into a training set (70%, n = 1,689) and an independent test set (30%, n = 724) using stratified random sampling by outcome. Because the analysis was prediction-oriented and the class distribution was nearly balanced across the split, the training and test sets are presented descriptively rather than as substantive comparison groups.

Outcome and predictor measurement

The full parent questionnaire included modules on demographics, socioeconomic position, social media use, health-related online behavior, mental health, sexual behavior, condom attitudes, STI awareness, eHealth literacy, sexual health literacy, and HPV-related knowledge. The present prediction analysis did not use the entire parent survey. Instead, it focused on a prespecified subset of harmonized variables available in the cleaned analytic dataset and directly relevant to the PSMU modeling task. Additional modules from the broader survey were not analyzed in this manuscript.

The outcome was screening-positive potential PSMU, operationalized as a Bergen Social Media Addiction Scale (BSMAS) total score of 19 or higher.
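The outcome coding and the stratified 70/30 split described above can be illustrated with a minimal standard-library sketch. It is not the authors' code: the source does not say which function performed the split, the names `screen_positive` and `stratified_split` and the seed are hypothetical, and the labels are placeholders that only reproduce the reported class counts.

```python
import random
from collections import Counter

BSMAS_CUTOFF = 19  # screening threshold stated in the text


def screen_positive(bsmas_total: int) -> int:
    """Return 1 when the BSMAS total (range 6-30) meets the screening cut-off."""
    return int(bsmas_total >= BSMAS_CUTOFF)


def stratified_split(labels, train_frac=0.7, seed=42):
    """Split row indices into train/test sets while preserving the outcome ratio."""
    rng = random.Random(seed)  # hypothetical seed, chosen only for reproducibility
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        n_train = round(train_frac * len(indices))  # per-class 70% share
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Placeholder labels with the reported class counts
# (1,192 screening-positive, 1,221 screening-negative); not the real data.
labels = [1] * 1192 + [0] * 1221
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)

print(len(train_idx), len(test_idx))      # 1689 724
print(train_counts[1], test_counts[1])    # 834 358
```

Splitting within each outcome class is what keeps the screening-positive share at about 49.4% in both partitions, which matches the counts reported in Table 1.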
The BSMAS is a six-item instrument that captures salience, mood modification, tolerance, withdrawal, conflict, and relapse, with a total score ranging from 6 to 30 [1, 2]. In this study, the threshold of 19 was treated as a sensitivity-oriented screening cut-off for potential PSMU rather than a clinical diagnosis.

Eighteen candidate predictors were taken forward into the modeling pipeline: sex, grade level, university city tier, only-child status, highest parental education, monthly expenditure, family structure, residence before college, sexual orientation, smoking, drinking, social media use frequency, platform count, short-video platform use, dating app use, sharing health information online, seeking health information online, and a binary DASS-21-derived distress indicator (DASS_any).

Mental distress was derived from the Depression Anxiety Stress Scale-21 (DASS-21), a widely used instrument with validated Chinese psychometric properties [8]. DASS_any was coded as 1 when at least one DASS-21 subscale reached moderate or greater severity and 0 otherwise.

University city tier was derived from the Chinese city-tier classification based on the location of the participant's institution. The original underlying classification distinguishes tier 1, new tier 1, tier 2, tier 3, tier 4, and tier 5 cities; for the cleaned analytic dataset, these were collapsed into an ordinal three-level variable (1 = tier 3/4/5 city, 2 = tier 2 city, 3 = tier 1/new tier 1 city). For descriptive display in Table 1, the categories are labeled as tier 3 or below, tier 2, and tier 1/new tier 1. Because these variables were used for risk classification rather than causal inference, they are interpreted as candidate predictive features rather than causal exposures.

Statistical analysis

Continuous or count variables are presented as mean (standard deviation), and categorical variables as number (percentage).
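The DASS_any and city-tier coding rules described under Outcome and predictor measurement can be sketched as follows. This is an illustration under stated assumptions rather than the authors' preprocessing: the source does not list the severity thresholds, so the conventional DASS cut-offs for "moderate" on the doubled DASS-21 scale are assumed, and the function and dictionary names are hypothetical.

```python
# ASSUMPTION: conventional DASS "moderate" thresholds on the doubled DASS-21 scale
# (depression >= 14, anxiety >= 10, stress >= 19); the source does not state them.
MODERATE_CUTOFFS = {"depression": 14, "anxiety": 10, "stress": 19}

# Collapse of the six-level city-tier classification, as described in the text.
CITY_TIER_MAP = {
    "tier 1": 3, "new tier 1": 3,
    "tier 2": 2,
    "tier 3": 1, "tier 4": 1, "tier 5": 1,
}


def dass_any(raw_subscale_sums: dict) -> int:
    """Return 1 if any subscale is moderate or worse, else 0.

    Expects raw 7-item subscale sums (0-21); each is doubled before comparison.
    """
    return int(any(
        2 * raw_subscale_sums[name] >= cutoff
        for name, cutoff in MODERATE_CUTOFFS.items()
    ))


def city_tier_level(tier_label: str) -> int:
    """Map a city-tier label to the ordinal three-level variable (1, 2, or 3)."""
    return CITY_TIER_MAP[tier_label.strip().lower()]


# Hypothetical respondents, for illustration only.
print(dass_any({"depression": 3, "anxiety": 5, "stress": 4}))  # 1 (anxiety 10 >= 10)
print(dass_any({"depression": 6, "anxiety": 4, "stress": 9}))  # 0 (all below cut-offs)
print(city_tier_level("New Tier 1"))                           # 3
```

If the authors used different severity thresholds, only `MODERATE_CUTOFFS` would need to change; the any-subscale logic follows the definition given in the text.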
No hypothesis tests are reported for training-versus-test comparisons because the split was algorithmic and stratified by outcome rather than substantively meaningful.

Least absolute shrinkage and selection operator (LASSO) logistic regression with internal cross-validation was used to reduce dimensionality and identify features with non-zero penalized coefficients. Six supervised algorithms were then developed and compared: logistic regression, random forest, XGBoost, LightGBM, Gaussian naive Bayes, and k-nearest neighbors. Hyperparameters were tuned within the training set, and final performance was evaluated only on the independent test set. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1-score.

The best-performing model was interpreted using SHapley Additive exPlanations (SHAP), which quantify the contribution of each feature to model output at both global and individual-instance levels [14]. SHAP values were interpreted as model-explanation quantities, not as causal effect estimates. All analyses were performed in Python using scikit-learn, XGBoost, LightGBM, and SHAP.

During manuscript preparation, a generative AI tool (ChatGPT, OpenAI) was used to support English-language polishing and manuscript organization. The authors retained full responsibility for the study design, analytic decisions, data interpretation, and final approval of the manuscript.

Results

Sample characteristics

A total of 2,413 valid questionnaires were included in analysis, with 1,689 observations in the training set and 724 in the test set. The outcome distribution was essentially identical across splits: 49.4% in the full sample, 49.4% in the training set, and 49.4% in the test set. The descriptive distribution of the analytic variables is shown in Table 1.
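To make concrete what the SHAP values mentioned in the Statistical analysis quantify, the sketch below computes exact Shapley values for a single instance by brute-force enumeration of feature coalitions. It does not use the SHAP library or the fitted XGBoost model: `toy_risk_score`, its coefficients, the instance, and the baseline are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature coalitions."""
    n = len(x)
    features = range(n)

    def value(coalition):
        # Features in the coalition take the instance's values; the rest stay at baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(features):
    """Hypothetical score with one interaction term; NOT the fitted XGBoost model."""
    use_frequency, distress, platform_count = features
    return (0.2 * use_frequency + 0.3 * distress + 0.05 * platform_count
            + 0.1 * use_frequency * distress)


x = [3, 1, 5]         # hypothetical student: frequent use, distress present, 5 platforms
baseline = [1, 0, 4]  # hypothetical reference profile
phi = shapley_values(toy_risk_score, x, baseline)
print([round(v, 3) for v in phi])  # approximately [0.5, 0.5, 0.05]
```

The three contributions sum to the difference between the score for the instance and the score for the baseline (1.45 − 0.40 = 1.05), and the interaction term is shared between use frequency and distress. This additivity is why SHAP values describe how a model arrives at its output rather than a causal effect.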
The sample was predominantly male (56.9%), undergraduate (freshman/sophomore 43.7%; junior/senior/fifth-year 43.8%), and enrolled at institutions located in tier 1/new tier 1 cities (68.0%). Most participants reported a complete family structure (86.0%), urban residence before college (57.8%), short-video platform use (90.0%), and online health information seeking in the prior 12 months (69.9%). More than half met the DASS_any criterion (57.9%).

Table 1 Descriptive characteristics of the analytic sample.

| Variable | Category | Overall (N = 2413) | Training set (n = 1689) | Test set (n = 724) |
|---|---|---|---|---|
| Sex | Female | 1040 (43.1%) | 736 (43.6%) | 304 (42.0%) |
| | Male | 1373 (56.9%) | 953 (56.4%) | 420 (58.0%) |
| Grade level | Freshman/Sophomore | 1054 (43.7%) | 724 (42.9%) | 330 (45.6%) |
| | Junior/Senior/Fifth-year | 1058 (43.8%) | 748 (44.3%) | 310 (42.8%) |
| | Master/Doctoral | 301 (12.5%) | 217 (12.8%) | 84 (11.6%) |
| University city tier | Tier 3 or below | 250 (10.4%) | 182 (10.8%) | 68 (9.4%) |
| | Tier 2 | 523 (21.7%) | 383 (22.7%) | 140 (19.3%) |
| | Tier 1/New Tier 1 | 1640 (68.0%) | 1124 (66.5%) | 516 (71.3%) |
| Only-child status | No | 1249 (51.8%) | 882 (52.2%) | 367 (50.7%) |
| | Yes | 1164 (48.2%) | 807 (47.8%) | 357 (49.3%) |
| Highest parental education | Junior high or below | 525 (21.8%) | 383 (22.7%) | 142 (19.6%) |
| | High school/technical secondary | 1137 (47.1%) | 786 (46.5%) | 351 (48.5%) |
| | College or above | 751 (31.1%) | 520 (30.8%) | 231 (31.9%) |
| Monthly expenditure | CNY 4000 | 73 (3.0%) | 56 (3.3%) | 17 (2.3%) |
| Family structure | Non-intact family | 337 (14.0%) | 229 (13.6%) | 108 (14.9%) |
| | Complete family | 2076 (86.0%) | 1460 (86.4%) | 616 (85.1%) |
| Residence before college | Rural/suburban county | 1019 (42.2%) | 732 (43.3%) | 287 (39.6%) |
| | Urban | 1394 (57.8%) | 957 (56.7%) | 437 (60.4%) |
| Sexual orientation | Other | 194 (8.0%) | 137 (8.1%) | 57 (7.9%) |
| | Heterosexual | 2219 (92.0%) | 1552 (91.9%) | 667 (92.1%) |
| Smoking | No | 1460 (60.5%) | 1021 (60.4%) | 439 (60.6%) |
| | Yes | 953 (39.5%) | 668 (39.6%) | 285 (39.4%) |
| Drinking | No | 1979 (82.0%) | 1379 (81.6%) | 600 (82.9%) |
| | Yes | 434 (18.0%) | 310 (18.4%) | 124 (17.1%) |
| Social media use frequency | Rarely | 108 (4.5%) | 79 (4.7%) | 29 (4.0%) |
| | Sometimes | 339 (14.0%) | 242 (14.3%) | 97 (13.4%) |
| | Often | 889 (36.8%) | 615 (36.4%) | 274 (37.8%) |
| | Always | 1077 (44.6%) | 753 (44.6%) | 324 (44.8%) |
| Platform count | Mean (SD) | 4.54 (1.62) | 4.54 (1.61) | 4.53 (1.65) |
| Short-video platform use | No | 241 (10.0%) | 175 (10.4%) | 66 (9.1%) |
| | Yes | 2172 (90.0%) | 1514 (89.6%) | 658 (90.9%) |
| Dating app use | No | 1829 (75.8%) | 1280 (75.8%) | 549 (75.8%) |
| | Yes | 584 (24.2%) | 409 (24.2%) | 175 (24.2%) |
| Shared health information online | No | 1587 (65.8%) | 1102 (65.2%) | 485 (67.0%) |
| | Yes | 826 (34.2%) | 587 (34.8%) | 239 (33.0%) |
| Sought health information online | No | 726 (30.1%) | 481 (28.5%) | 245 (33.8%) |
| | Yes | 1687 (69.9%) | 1208 (71.5%) | 479 (66.2%) |
| DASS_any | No | 1017 (42.1%) | 713 (42.2%) | 304 (42.0%) |
| | Yes | 1396 (57.9%) | 976 (57.8%) | 420 (58.0%) |
| Potential PSMU (BSMAS ≥ 19) | No | 1221 (50.6%) | 855 (50.6%) | 366 (50.6%) |
| | Yes | 1192 (49.4%) | 834 (49.4%) | 358 (49.4%) |

Note: No p values are shown for training-versus-test comparisons because the split was algorithmic and stratified by outcome. DASS_any indicates that at least one DASS-21 subscale was in the moderate-or-higher range. University city tier was based on the Chinese city-tier classification. The underlying classification distinguishes tier 1, new tier 1, tier 2, tier 3, tier 4, and tier 5 cities; for the cleaned analytic dataset, this was collapsed to an ordinal three-level variable (1 = tier 3/4/5 city, 2 = tier 2 city, 3 = tier 1/new tier 1 city). For descriptive display in this table, categories are labeled as tier 3 or below, tier 2, and tier 1/new tier 1. PSMU = problematic social media use.

LASSO feature selection

The LASSO stage included 18 candidate predictors. Sixteen retained non-zero penalized coefficients, whereas sexual orientation and short-video platform use were shrunk to zero in the final penalized solution. The variables with the largest positive coefficients were social media use frequency (0.815), DASS_any (0.763), drinking (0.682), dating app use (0.558), and smoking (0.406). Residence before college (-0.513), sharing health information online (-0.427), and sex (-0.131) showed negative penalized coefficients.
Figure 1 presents the cross-validation curve and coefficient path, and Table 2 lists the entered features and coefficient directions.

Table 2 Predictors entered into the LASSO stage and final penalized coefficients.

| Feature | Coefficient | Direction | Selected |
|---|---|---|---|
| Social media use frequency | 0.815 | Positive | Yes |
| Platform count | 0.087 | Positive | Yes |
| Monthly expenditure | 0.274 | Positive | Yes |
| University city tier | 0.216 | Positive | Yes |
| Grade level | 0.073 | Positive | Yes |
| Highest parental education | 0.028 | Positive | Yes |
| Dating app use | 0.558 | Positive | Yes |
| DASS | 0.763 | Positive | Yes |
| Drinking | 0.682 | Positive | Yes |
| Smoking | 0.406 | Positive | Yes |
| Only-child status | 0.350 | Positive | Yes |
| Residence before college | -0.513 | Negative | Yes |
| Family structure | 0.374 | Positive | Yes |
| Sexual orientation | 0.000 | Zero | No |
| Short-video platform use | 0.000 | Zero | No |
| Sex | -0.131 | Negative | Yes |
| Shared health information online | -0.427 | Negative | Yes |
| Sought health information online | 0.001 | Positive | Yes |

Note: Coefficients are penalized model coefficients and should be interpreted as contributions within a predictive model, not as causal effect sizes.

Model performance

Test-set performance for all candidate algorithms is shown in Table 3 and Fig. 2. XGBoost achieved the best overall performance, with an AUC of 0.851, accuracy of 0.769, precision of 0.742, recall of 0.818, and F1-score of 0.778. Random forest and LightGBM performed similarly and also outperformed logistic regression, Gaussian naive Bayes, and k-nearest neighbors on discrimination and F1-score. Taken together, the results suggest that tree-based ensemble methods better captured the structure of these data than either the linear baseline model or simpler probabilistic and instance-based methods. At the same time, performance differences among the top three models were modest, which argues against overinterpreting a single ‘winning’ algorithm and instead supports the broader conclusion that non-linear ensemble learners were preferable in this prediction task.
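As a worked check on how the reported metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall values reported in Table 3. The dictionary name `TABLE_3` and the helper `f1_from` are illustrative, and small gaps are expected because the inputs are already rounded to three decimals.

```python
# (precision, recall, reported F1) copied from Table 3 of the source.
TABLE_3 = {
    "Logistic regression": (0.690, 0.704, 0.697),
    "Random forest": (0.733, 0.818, 0.773),
    "XGBoost": (0.742, 0.818, 0.778),
    "LightGBM": (0.736, 0.816, 0.774),
    "Gaussian naive Bayes": (0.751, 0.673, 0.710),
    "k-nearest neighbors": (0.670, 0.777, 0.719),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (precision, recall, reported) in TABLE_3.items():
    recomputed = f1_from(precision, recall)
    print(f"{model:22s} recomputed={recomputed:.4f} reported={reported:.3f}")

# Largest absolute gap between recomputed and reported F1 across the six models.
max_gap = max(abs(f1_from(p, r) - f) for p, r, f in TABLE_3.values())
print(f"largest gap: {max_gap:.4f}")
```

For XGBoost this gives 2 × 0.742 × 0.818 / (0.742 + 0.818) ≈ 0.778, matching the reported value; all six recomputed scores agree with Table 3 to within 0.001.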
Table 3 Predictive performance of the candidate models on the independent test set.

| Model | AUC | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| Logistic regression | 0.791 | 0.698 | 0.690 | 0.704 | 0.697 |
| Random forest | 0.847 | 0.762 | 0.733 | 0.818 | 0.773 |
| XGBoost | 0.851 | 0.769 | 0.742 | 0.818 | 0.778 |
| LightGBM | 0.843 | 0.764 | 0.736 | 0.816 | 0.774 |
| Gaussian naive Bayes | 0.802 | 0.728 | 0.751 | 0.673 | 0.710 |
| k-nearest neighbors | 0.785 | 0.700 | 0.670 | 0.777 | 0.719 |

Note: AUC = area under the receiver operating characteristic curve.

Explainable machine-learning results

SHAP analyses of the XGBoost model identified social media use frequency as the dominant driver of model output, followed by DASS_any, platform count, university city tier, drinking, dating/stranger-social app use, and monthly expenditure (Fig. 3; Supplementary Figures S1–S3). The beeswarm pattern indicates that higher social media use frequency and positive DASS_any status generally shifted predictions toward the PSMU-positive class.

Several additional patterns are noteworthy. Greater platform count, drinking, smoking, and dating-app use tended to increase the model output for many participants, whereas urban residence before college and sharing health information online tended to shift predictions downward in substantial subsets of students. The comparatively small spread of SHAP values for sex and sought health information online suggests that these variables contributed limited incremental information after the dominant social-media and psychosocial features were accounted for.

Discussion

Principal findings

This study developed and internally validated an explainable machine-learning approach to identify Chinese college students at elevated cross-sectional risk of potential PSMU. Three findings are particularly notable. First, tree-based ensemble learners (XGBoost, random forest, and LightGBM) consistently outperformed logistic regression, Gaussian naive Bayes, and k-nearest neighbors.
Second, the most influential predictors were not purely demographic; rather, they combined behavioral exposure to social media (frequency and platform count), mental distress, and selected contextual or lifestyle variables. Third, several predictors that are often treated as background covariates, such as university city tier, monthly expenditure, and dating-app use, contributed non-trivially to the final model.

Interpretation in relation to previous work

Our findings are consistent with prior studies linking PSMU to depression, anxiety, stress, loneliness, and broader psychological distress [3–7, 15]. In Chinese college samples, stronger PSMU has been associated with depressive outcomes [5], latent profiles characterized by depression and stress [6], and reciprocal reinforcement with loneliness over time [7]. The prominence of DASS_any in the present model reinforces the view that affective distress is closely intertwined with problematic social-media engagement. At the same time, the strong role of social media use frequency and platform count indicates that behavioral exposure still matters when considered alongside psychosocial variables.

The advantage of XGBoost and related ensemble models is also plausible from a substantive perspective. PSMU is unlikely to arise from a single linear pathway. Instead, it is probably shaped by threshold effects (for example, moving from ‘often’ to ‘always’ using social media), behavioral clustering (such as concurrent dating-app use, smoking, and drinking), and contextual interactions (such as the interplay between city tier and spending power). Tree-based ensembles are better suited than standard logistic regression to capture such non-linearity and interaction structure without requiring investigators to pre-specify numerous cross-terms.

The SHAP results additionally offer behaviorally meaningful insights.
Social media use frequency dominated the global ranking, which is unsurprising given that compulsive frequency is conceptually adjacent to PSMU. DASS_any emerged as the next most important contributor, underscoring that students with substantial emotional distress are more likely to occupy the high-risk region of the model. Platform count may reflect breadth of digital immersion, while dating app use may capture a more reward-driven and potentially more compulsive pattern of online engagement. The positive contributions of drinking and smoking may indicate broader behavioral risk clustering rather than any direct effect of substance use itself. Conversely, the negative SHAP tendency for sharing health information online suggests that some forms of digital engagement may be more purposeful or informational and therefore less tightly linked to compulsive social-media involvement.

Strengths and limitations

This study has several strengths. It used a relatively large analytic sample with balanced outcome classes, minimizing severe class-imbalance problems. It compared multiple algorithm families rather than reporting a single model. It also combined LASSO feature selection with SHAP-based interpretation, improving both parsimony and transparency. Finally, participants were recruited from a broad range of institutions, providing wider contextual diversity than a single-campus study.

The study also has important limitations. First, recruitment was convenience-based and strongly concentrated in the Yangtze River Delta, so the sample is not nationally representative. Second, the design was cross-sectional, which means the model identifies correlates of concurrent screening-positive potential PSMU rather than predictors of future onset. Third, the outcome threshold of BSMAS ≥ 19 is best interpreted as a screening threshold for potential PSMU.
A stricter clinical cut-off of 24 has been proposed in interview-based research with Chinese adolescents [16], so the observed prevalence should not be interpreted as the prevalence of a clinically confirmed disorder. Fourth, all measures were self-reported and the study did not include platform-verified behavioral logs. Fifth, this manuscript reports internal validation only. Calibration and transportability should be assessed in future external validation work before any real-world deployment. Finally, because the analysis relied on a cleaned modeling export, some questionnaire modules and web-survey paradata were not included in the predictive modeling stage, and some CHERRIES items could not be reconstructed retrospectively.

Implications and future research

From an applied perspective, the present model is most defensible as a screening-support tool rather than a diagnostic instrument. University mental-health or student-affairs services could use a short questionnaire containing the retained predictors to flag students for more detailed assessment, digital well-being counseling, or psychoeducation. The explainable structure of the model is important here because it allows practitioners to identify why a student is assigned a higher model-based risk, rather than relying on a purely opaque score. In practice, however, any implementation should be preceded by prospective evaluation, calibration assessment, and external validation in geographically broader samples.

Conclusions

Among Chinese college students recruited through a multicenter online survey, XGBoost showed the best internal test-set performance for identifying students with screening-positive potential problematic social media use. The most informative features combined digital-exposure measures with mental distress and selected contextual factors.
These findings support the potential value of explainable machine learning for university screening and prevention, while also underscoring the need for external validation and cautious interpretation of screening-defined outcomes.

Abbreviations

AUC: Area under the receiver operating characteristic curve
BSMAS: Bergen Social Media Addiction Scale
CHERRIES: Checklist for Reporting Results of Internet E-Surveys
DASS-21: Depression Anxiety Stress Scale-21
LightGBM: Light Gradient Boosting Machine
LASSO: Least absolute shrinkage and selection operator
PSMU: Problematic social media use
SHAP: SHapley Additive exPlanations
TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis

Declarations

Ethics approval and consent to participate

The study protocol was approved by the Ethics Committee of Jiangsu College of Nursing (approval No. JSCN-ME-2025070719). All procedures involving human participants were conducted in accordance with the ethical principles of the Declaration of Helsinki (World Medical Association; revised in 2024). Electronic informed consent was obtained from all participants before survey initiation.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Authors' information

HZW is with the Department of Social Medicine and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia. QFL is with Jiangsu College of Nursing, Huai'an, Jiangsu, China.

Funding

Not applicable.

Author Contribution

HZW wrote the manuscript and performed the formal data analysis and machine-learning modeling. QFL was responsible for the remaining study work, including study oversight, recruitment organization, ethics, interpretation of findings, critical revision of the manuscript, and correspondence. Both authors read and approved the final manuscript.

Acknowledgement

The authors thank the participating students and the student recruiters who assisted with questionnaire dissemination through university WeChat and QQ groups. The authors are also grateful to Taizhou University, Zhejiang University, Ningbo University, Wuxi University, Shanghai University of Electric Power, East China University of Science and Technology, Nanjing University, Shanghai Ocean University, and Wenzhou University for recruitment support and assistance with study dissemination.

Data Availability

The deidentified dataset and analytic code are available from the corresponding author on reasonable request, subject to institutional and ethical restrictions.

References

1. Andreassen CS, Pallesen S, Griffiths MD. The relationship between addictive use of social media, narcissism, and self-esteem: findings from a large national survey. Addict Behav. 2017;64:287–93. doi:10.1016/j.addbeh.2016.03.006.
2. Bányai F, Zsila Á, Király O, Maraz A, Elekes Z, Griffiths MD, et al. Problematic social media use: results from a large-scale nationally representative adolescent sample. PLoS ONE. 2017;12(1):e0169839. doi:10.1371/journal.pone.0169839.
3. Lopes LS, Valentini JP, Monteiro TH, Costacurta MCF, Soares LON, Telfar-Barnard L, et al. Problematic social media use and its relationship with depression or anxiety: a systematic review. Cyberpsychol Behav Soc Netw. 2022;25(11):691–702. doi:10.1089/cyber.2021.0300.
4. Shannon H, Bush K, Villeneuve PJ, Hellemans KGC, Guimond S. Problematic social media use in adolescents and young adults: systematic review and meta-analysis. JMIR Ment Health. 2022;9(4):e33450. doi:10.2196/33450.
5. Chen Y, Liu X, Chiu DT, Li Y, Mi B, Zhang Y, et al. Problematic social media use and depressive outcomes among college students in China: observational and experimental findings. Int J Environ Res Public Health. 2022;19(9):4937. doi:10.3390/ijerph19094937.
6. Cui J, Wang Y, Liu D, Yang H. Depression and stress are associated with latent profiles of problematic social media use among college students. Front Psychiatry. 2023;14:1306152. doi:10.3389/fpsyt.2023.1306152.
7. Wu P, Feng R, Zhang J. The relationship between loneliness and problematic social media usage in Chinese university students: a longitudinal study. BMC Psychol. 2024;12:13. doi:10.1186/s40359-023-01498-4.
8. Wang K, Shi HS, Geng FL, Zou LQ, Tan SP, Wang Y, et al. Cross-cultural validation of the Depression Anxiety Stress Scale-21 in China. Psychol Assess. 2016;28(5):e88–100. doi:10.1037/pas0000207.
9. Li S, Cui G, Kaminga AC, Cheng S, Xu H. Associations between health literacy, eHealth literacy, and COVID-19-related health behaviors among Chinese college students: cross-sectional online study. J Med Internet Res. 2021;23(5):e25600. doi:10.2196/25600.
10. Jo S, Pituch KA, Howe N. The relationships between social media and human papillomavirus awareness and knowledge: cross-sectional study. JMIR Public Health Surveill. 2022;8(9):e37274. doi:10.2196/37274.
11. Eysenbach G. Improving the quality of web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res. 2004;6(3):e34. doi:10.2196/jmir.6.3.e34.
12. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806–8. doi:10.1136/bmj.39335.541782.AD.
13. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. J Clin Epidemiol. 2015;68(2):134–43. doi:10.1016/j.jclinepi.2014.11.010.
14. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30. Red Hook (NY): Curran Associates; 2017. pp. 4765–74.
15. Hu N, Xiao X, Soh KL. Association between problematic social media use and psychological distress among college students: a cross-sectional study in China exploring the mediating role of eating disorders. BMJ Open. 2025;15(5):e092863. doi:10.1136/bmjopen-2024-092863.
16. Luo T, Qin L, Cheng L, Wang S, Zhu Z, Xu J, et al. Determination the cut-off point for the Bergen social media addiction scale (BSMAS): diagnostic contribution of the six criteria of the components model of addiction for social media disorder. J Behav Addict. 2021;10(2):281–90. doi:10.1556/2006.2021.00025.

Additional Declarations

No competing interests reported.

Supplementary Files

BMCPublicHealthSupplementaryMaterialsRevised.docx
SupplementaryFile1EnglishQuestionnairePSMU.docx

Status: Under Review

Reviews received at journal: 23 Apr, 2026
Reviewers agreed at journal: 23 Apr, 2026
Reviewers invited by journal: 22 Apr, 2026
Editor assigned by journal: 20 Apr, 2026
Editor invited by journal: 01 Apr, 2026
Submission checks completed at journal: 31 Mar, 2026
First submitted to journal: 31 Mar, 2026
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-9136366","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":632047339,"identity":"af9b83f7-81d3-4549-a348-0660c7642392","order_by":0,"name":"Zhiwen Huang","email":"","orcid":"","institution":"University of Malaya","correspondingAuthor":false,"prefix":"","firstName":"Zhiwen","middleName":"","lastName":"Huang","suffix":""},{"id":632047341,"identity":"120c060d-e737-4c81-bc9f-8d9a7709df3f","order_by":1,"name":"Qiufeng Liu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAArUlEQVRIiWNgGAWjYBACPgbGBwwP2Gx4+PkbiNTCxsBswJDAliYjOeMAaVoO2xg0JBCrhT2Z+UNC2XkeA4YDjB8+5hCjhecxm0TCuds85swNzJIztxGjRSL/GENi220ey4YDbMy8xGkBOiyx7RyPwYEE4rUwSCS2HSBFC8QvyTySMw42E+cXflCIfSizs+fnbz744SMxWhgYEmAMxgai1CNrGQWjYBSMglGAAwAA3cwxUBsFg7sAAAAASUVORK5CYII=","orcid":"","institution":"Jiangsu College of Nursing","correspondingAuthor":true,"prefix":"","firstName":"Qiufeng","middleName":"","lastName":"Liu","suffix":""}],"badges":[],"createdAt":"2026-03-16 10:12:00","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9136366/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9136366/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108409381,"identity":"da8aedc7-dd1e-47ee-a8fd-24b82505e841","added_by":"auto","created_at":"2026-05-04 10:00:18","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":124169,"visible":true,"origin":"","legend":"\u003cp\u003eLASSO
cross-validation curve and coefficient path. Panel A shows the cross-validated loss across log10(λ) values. The dashed red vertical line indicates the selected penalty parameter. Panel B shows coefficient trajectories for the predictors as penalization increases; the top axis indicates the number of retained features.\u003c/p\u003e","description":"","filename":"Figure1LASSO.png","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/e6f1a8aecd425e8adea5280b.png"},{"id":108409379,"identity":"b723576e-cf43-41b2-b696-edf0f33c6b18","added_by":"auto","created_at":"2026-05-04 10:00:17","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":336475,"visible":true,"origin":"","legend":"\u003cp\u003eReceiver operating characteristic curves for all candidate models evaluated on the independent test set. Curves are shown for logistic regression, random forest, XGBoost, LightGBM, Gaussian naive Bayes, and k-nearest neighbors.\u003c/p\u003e","description":"","filename":"Figure2ROCallmodels.png","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/0b2011f8208484fada693241.png"},{"id":108493292,"identity":"40f607e4-f5a1-4b99-9dfc-1bfc664ee63b","added_by":"auto","created_at":"2026-05-05 09:59:51","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":484178,"visible":true,"origin":"","legend":"\u003cp\u003eSHAP beeswarm plot for the best-performing XGBoost model. Each point corresponds to one participant. 
The x-axis represents the SHAP value (impact on model output), and color represents the relative feature value from low to high.\u003c/p\u003e","description":"","filename":"Figure3SHAPbeeswarm.png","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/b0b314b2977f2961c9a22279.png"},{"id":108804390,"identity":"208a19d5-9ddc-4455-8276-6cc52bfa0c04","added_by":"auto","created_at":"2026-05-08 15:20:05","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1051704,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/2d9ff2ec-5653-47de-b8cc-e1398f2f4100.pdf"},{"id":108409378,"identity":"1534ecf7-f800-42ed-ac53-9bf36a4e945a","added_by":"auto","created_at":"2026-05-04 10:00:17","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":1411590,"visible":true,"origin":"","legend":"","description":"","filename":"BMCPublicHealthSupplementaryMaterialsRevised.docx","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/80b9199bf8e28830d3bda855.docx"},{"id":108409382,"identity":"a7d70cdc-d3a7-4ce0-b3f3-d63f204e71d7","added_by":"auto","created_at":"2026-05-04 10:00:18","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":41801,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryFile1EnglishQuestionnairePSMU.docx","url":"https://assets-eu.researchsquare.com/files/rs-9136366/v1/dc63152ecce319106757876c.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Explainable machine learning-based identification of Chinese college students at risk of problematic social media use: a multicenter cross-sectional online survey","fulltext":[{"header":"Background","content":"\u003cp\u003eSocial media is deeply embedded in the everyday lives of college students. 
For most users, social networking platforms serve adaptive functions such as maintaining relationships, entertainment, and information seeking. However, a subgroup of users displays excessive, compulsive, and functionally impairing patterns of engagement that are commonly described as problematic social media use (PSMU) or addictive social media use [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eResearch over the past decade has linked PSMU to depressive and anxious symptoms, stress, loneliness, disturbed sleep, and reduced well-being. Recent reviews suggest that these relationships may be bidirectional and are often stronger for problematic or emotionally dysregulated use than for simple time spent online [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Among Chinese college students, PSMU has been associated with depressive outcomes, stress-related latent risk profiles, reciprocal dynamics with loneliness, and broader psychological distress [\u003cspan additionalcitationids=\"CR6\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDespite this expanding literature, most studies remain explanatory rather than predictive. Conventional regression models are valuable for estimating adjusted associations, but they may not adequately capture non-linear effects, threshold relationships, and interaction structures among social-media exposure, living environment, socioeconomic position, and mental distress. 
For screening-oriented tasks, machine-learning methods may therefore offer practical advantages, provided that they are transparently reported and carefully interpreted [\u003cspan additionalcitationids=\"CR12 CR13\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis study used a multicenter online sample of Chinese college students to build and compare six supervised machine-learning models for identifying students at elevated cross-sectional risk of potential PSMU. We further used SHapley Additive exPlanations (SHAP) to interpret the best-performing model. Our aims were to: (1) develop and internally validate an interpretable machine-learning model for PSMU screening; (2) compare the performance of linear, probabilistic, instance-based, and tree-based methods; and (3) identify the most influential predictors from harmonized demographic, behavioral, and mental-health indicators.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy design, participants, and data quality control\u003c/h2\u003e \u003cp\u003eThis study used a multicenter cross-sectional online survey design. The survey was conducted between September and October 2025 through the Wenjuanxing platform. Reporting of the web-based survey was informed, where applicable, by the CHERRIES and STROBE recommendations, and model reporting was structured in line with core TRIPOD principles [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe parent questionnaire was assembled specifically for this multicenter survey and had not previously been published as a standalone instrument. 
It combined previously published scales with study-developed or adapted items. Specifically, the Chinese Bergen Social Media Addiction Scale (BSMAS) and the Chinese Depression Anxiety Stress Scale-21 (DASS-21) were administered in their established published forms [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], whereas demographic, socioeconomic, pre-college environment, social-media-use pattern, and online health behavior items were drafted or adapted by the research team for the broader student-health survey after review of related questionnaires [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. An English-language version of the questionnaire sections relevant to the present analysis is provided as Supplementary File 1.\u003c/p\u003e \u003cp\u003eThe survey link was disseminated by student recruiters through university-related WeChat and QQ groups. Eligible participants were current college or university students in China who were able to complete the questionnaire online and provide electronic informed consent. The study protocol was approved by the Ethics Committee of Jiangsu College of Nursing (approval No. JSCN-ME-2025070719).\u003c/p\u003e \u003cp\u003eQuestionnaires were screened for internal consistency before analysis. Responses with logic conflicts, obvious contradictions, or implausible completion patterns were deleted during data cleaning. Missing data had already been resolved in the cleaned export used for modeling, and no missing values remained in the final analytic dataset. The final dataset contained 2,413 valid participants, including 1,192 screening-positive outcome events. 
Institution-name responses indicated broad institutional coverage spanning universities and vocational colleges, with recruitment concentrated in Zhejiang, Jiangsu, and Shanghai; accordingly, the survey should be interpreted as multicenter but regionally clustered rather than nationally representative.\u003c/p\u003e \u003cp\u003eFor model development, the cleaned dataset was randomly divided into a training set (70%, n\u0026thinsp;=\u0026thinsp;1,689) and an independent test set (30%, n\u0026thinsp;=\u0026thinsp;724) using stratified random sampling by outcome. Because the analysis was prediction-oriented and the class distribution was nearly balanced across the split, the training and test sets are presented descriptively rather than as substantive comparison groups.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eOutcome and predictor measurement\u003c/h3\u003e\n\u003cp\u003eThe full parent questionnaire included modules on demographics, socioeconomic position, social media use, health-related online behavior, mental health, sexual behavior, condom attitudes, STI awareness, eHealth literacy, sexual health literacy, and HPV-related knowledge. The present prediction analysis did not use the entire parent survey. Instead, it focused on a prespecified subset of harmonized variables available in the cleaned analytic dataset and directly relevant to the PSMU modeling task. Additional modules from the broader survey were not analyzed in this manuscript.\u003c/p\u003e \u003cp\u003eThe outcome was screening-positive potential PSMU, operationalized as a Bergen Social Media Addiction Scale (BSMAS) total score of 19 or higher. The BSMAS is a six-item instrument that captures salience, mood modification, tolerance, withdrawal, conflict, and relapse, with a total score ranging from 6 to 30 [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. 
In this study, the threshold of 19 was treated as a sensitivity-oriented screening cut-off for potential PSMU rather than a clinical diagnosis.\u003c/p\u003e \u003cp\u003eEighteen candidate predictors were taken forward into the modeling pipeline: sex, grade level, university city tier, only-child status, highest parental education, monthly expenditure, family structure, residence before college, sexual orientation, smoking, drinking, social media use frequency, platform count, short-video platform use, dating app use, sharing health information online, seeking health information online, and a binary DASS-21-derived distress indicator.\u003c/p\u003e \u003cp\u003eMental distress was derived from the Depression Anxiety Stress Scale-21 (DASS-21), a widely used instrument with validated Chinese psychometric properties [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. DASS was coded as 1 when at least one DASS subscale reached moderate or greater severity and 0 otherwise. University city tier was derived from the Chinese city-tier classification based on the location of the participant's institution. The original underlying classification distinguishes tier 1, new tier 1, tier 2, tier 3, tier 4, and tier 5 cities; for the cleaned analytic dataset, these were collapsed into an ordinal three-level variable (1\u0026thinsp;=\u0026thinsp;tier 3/4/5 city, 2\u0026thinsp;=\u0026thinsp;tier 2 city, 3\u0026thinsp;=\u0026thinsp;tier 1/new tier 1 city). For descriptive display in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the categories are labeled as tier 3 or below, tier 2, and tier 1/new tier 1. 
Because these variables were used for risk classification rather than causal inference, they are interpreted as candidate predictive features rather than causal exposures.\u003c/p\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eContinuous or count variables are presented as mean (standard deviation), and categorical variables as number (percentage). No hypothesis tests are reported for training-versus-test comparisons because the split was algorithmic and stratified by outcome rather than substantively meaningful. Least absolute shrinkage and selection operator (LASSO) logistic regression with internal cross-validation was used to reduce dimensionality and identify features with non-zero penalized coefficients.\u003c/p\u003e \u003cp\u003eSix supervised algorithms were then developed and compared: logistic regression, random forest, XGBoost, LightGBM, Gaussian naive Bayes, and k-nearest neighbors. Hyperparameters were tuned within the training set, and final performance was evaluated only on the independent test set. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1-score. The best-performing model was interpreted using SHapley Additive exPlanations (SHAP), which quantify the contribution of each feature to model output at both global and individual-instance levels [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. SHAP values were interpreted as model-explanation quantities, not as causal effect estimates.\u003c/p\u003e \u003cp\u003eAll analyses were performed in Python using scikit-learn, XGBoost, LightGBM, and SHAP. During manuscript preparation, a generative AI tool (ChatGPT, OpenAI) was used to support English-language polishing and manuscript organization. 
The authors retained full responsibility for the study design, analytic decisions, data interpretation, and final approval of the manuscript.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eSample characteristics\u003c/h2\u003e \u003cp\u003eA total of 2,413 valid questionnaires were included in analysis, with 1,689 observations in the training set and 724 in the test set. The outcome distribution was essentially identical across splits: 49.4% in the full sample, 49.4% in the training set, and 49.4% in the test set.\u003c/p\u003e \u003cp\u003eThe descriptive distribution of the analytic variables is shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. The sample was predominantly male (56.9%), undergraduate (freshman/sophomore 43.7%; junior/senior/fifth-year 43.8%), and enrolled at institutions located in tier 1/new tier 1 cities (68.0%). Most participants reported a complete family structure (86.0%), urban residence before college (57.8%), short-video platform use (90.0%), and online health information seeking in the prior 12 months (69.9%). 
More than half met the DASS criterion (57.9%).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDescriptive characteristics of the analytic sample.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOverall (N\u0026thinsp;=\u0026thinsp;2413)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eTraining set (n\u0026thinsp;=\u0026thinsp;1689)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTest set (n\u0026thinsp;=\u0026thinsp;724)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFemale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1040 (43.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e \u003cp\u003e736 (43.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e304 (42.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMale\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1373 (56.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e953 (56.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e420 (58.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrade level\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFreshman/Sophomore\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1054 (43.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e724 (42.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e330 (45.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eJunior/Senior/Fifth-year\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1058 (43.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e748 (44.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e310 (42.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eMaster/Doctoral\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e301 (12.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e217 (12.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e84 (11.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUniversity city tier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTier 3 or below\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e250 (10.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e182 (10.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e68 (9.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTier 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e523 (21.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e383 (22.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e140 (19.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTier 1/New Tier 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1640 (68.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1124 (66.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e516 (71.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOnly-child status\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1249 (51.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e882 (52.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e367 (50.7%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1164 (48.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e807 (47.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e357 (49.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHighest parental education\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eJunior high or below\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e525 (21.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e383 (22.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e142 (19.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHigh school/technical secondary\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e1137 (47.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e786 (46.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e351 (48.5%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCollege or above\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e751 (31.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e520 (30.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e231 (31.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMonthly expenditure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026lt;CNY 1500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e182 (7.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e134 (7.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e48 (6.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCNY 1500\u0026ndash;2000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e958 (39.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e661 (39.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e297 (41.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCNY 2000\u0026ndash;3000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e939 (38.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e653 (38.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e286 (39.5%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCNY 3000\u0026ndash;4000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e261 (10.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e185 (11.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e76 (10.5%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u0026gt;CNY 4000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e73 (3.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e56 (3.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e17 (2.3%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFamily structure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-intact family\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e337 (14.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e \u003cp\u003e229 (13.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e108 (14.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eComplete family\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2076 (86.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1460 (86.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e616 (85.1%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResidence before college\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRural/suburban county\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1019 (42.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e732 (43.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e287 (39.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrban\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1394 (57.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e957 (56.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e437 (60.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSexual orientation\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOther\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e194 (8.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e137 (8.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e57 (7.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHeterosexual\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2219 (92.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1552 (91.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e667 (92.1%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1460 (60.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1021 (60.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e439 (60.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e953 (39.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e668 (39.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e285 
(39.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDrinking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1979 (82.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1379 (81.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e600 (82.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e434 (18.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e310 (18.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e124 (17.1%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSocial media use frequency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRarely\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e108 (4.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e79 (4.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e29 (4.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSometimes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e339 (14.0%)\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e242 (14.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97 (13.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOften\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e889 (36.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e615 (36.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e274 (37.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlways\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1077 (44.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e753 (44.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e324 (44.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePlatform count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean (SD)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.54 (1.62)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.54 (1.61)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.53 (1.65)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShort-video platform use\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e241 (10.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e175 (10.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e66 (9.1%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2172 (90.0%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1514 (89.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e658 (90.9%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDating app use\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1829 (75.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1280 (75.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e549 (75.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e584 (24.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e409 (24.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e175 
(24.2%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShared health information online\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1587 (65.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1102 (65.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e485 (67.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e826 (34.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e587 (34.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e239 (33.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSought health information online\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e726 (30.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e481 (28.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e245 (33.8%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1687 
(69.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1208 (71.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e479 (66.2%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDASS_any\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1017 (42.1%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e713 (42.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e304 (42.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1396 (57.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e976 (57.8%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e420 (58.0%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePotential PSMU (BSMAS\u0026thinsp;\u0026ge;\u0026thinsp;19)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1221 (50.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e855 (50.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e366 (50.6%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1192 (49.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e834 (49.4%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e358 (49.4%)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003e\u003cb\u003eNote\u003c/b\u003e: No p values are shown for training-versus-test comparisons because the split was algorithmic and stratified by outcome. DASS_any indicates that at least one DASS-21 subscale was in the moderate-or-higher range. University city tier was based on the Chinese city-tier classification. The underlying classification distinguishes tier 1, new tier 1, tier 2, tier 3, tier 4, and tier 5 cities; for the cleaned analytic dataset, this was collapsed to an ordinal three-level variable (1\u0026thinsp;=\u0026thinsp;tier 3/4/5 city, 2\u0026thinsp;=\u0026thinsp;tier 2 city, 3\u0026thinsp;=\u0026thinsp;tier 1/new tier 1 city). For descriptive display in this table, categories are labeled as tier 3 or below, tier 2, and tier 1/new tier 1. PSMU\u0026thinsp;=\u0026thinsp;problematic social media use.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eLASSO feature selection\u003c/h2\u003e \u003cp\u003eThe LASSO stage included 18 candidate predictors. Sixteen retained non-zero penalized coefficients, whereas sexual orientation and short-video platform use were shrunk to zero in the final penalized solution. 
The variables with the largest positive coefficients were social media use frequency, DASS, drinking, dating app use, and smoking. Residence before college, sharing health information online, and sex showed negative penalized coefficients. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents the cross-validation curve and coefficient path, and Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e lists the entered features and coefficient directions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePredictors entered into the LASSO stage and final penalized coefficients.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCoefficient\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDirection\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSelected\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSocial media use frequency\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.815\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePlatform count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.087\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMonthly expenditure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.274\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUniversity city tier\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.216\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrade level\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.073\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHighest parental education\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.028\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDating app use\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.558\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDASS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.763\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDrinking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.682\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSmoking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.406\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOnly-child status\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.350\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResidence before college\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.513\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNegative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFamily structure\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.374\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSexual orientation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eZero\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShort-video 
platform use\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eZero\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.131\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNegative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eShared health information online\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e-0.427\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNegative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSought health information online\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePositive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003e\u003cb\u003eNote\u003c/b\u003e: Coefficients are penalized model coefficients and should be interpreted as contributions within a predictive model, not as causal effect sizes.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e 
\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eModel performance\u003c/h3\u003e\n\u003cp\u003eTest-set performance for all candidate algorithms is shown in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. XGBoost achieved the highest AUC (0.851), accuracy (0.769), and F1-score (0.778), with a precision of 0.742 and a recall of 0.818. Its recall was matched by random forest, and Gaussian naive Bayes had the highest precision (0.751) but the lowest recall (0.673). Random forest and LightGBM performed similarly to XGBoost and also outperformed logistic regression, Gaussian naive Bayes, and k-nearest neighbors on discrimination and F1-score.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTaken together, the results suggest that tree-based ensemble methods better captured the structure of these data than either the linear baseline model or simpler probabilistic and instance-based methods. At the same time, performance differences among the top three models were modest, which argues against overinterpreting a single \u0026lsquo;winning\u0026rsquo; algorithm and instead supports the broader conclusion that non-linear ensemble learners were preferable in this prediction task.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePredictive performance of the candidate models on the independent test set.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" 
char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLogistic regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.791\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.698\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.690\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.704\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.697\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.847\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.762\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e0.733\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.818\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.773\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.851\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.769\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.742\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.818\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.778\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLightGBM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.843\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.764\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.736\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.816\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.774\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGaussian naive Bayes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.802\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.728\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e0.751\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.673\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.710\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ek-nearest neighbors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.670\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.777\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.719\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003e\u003cb\u003eNote\u003c/b\u003e: AUC\u0026thinsp;=\u0026thinsp;area under the receiver operating characteristic curve.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eExplainable machine-learning results\u003c/h3\u003e\n\u003cp\u003eSHAP analyses of the XGBoost model identified social media use frequency as the dominant driver of model output, followed by DASS, platform count, university city tier, drinking, dating/stranger-social app use, and monthly expenditure (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e; Supplementary Figures \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e-S3). 
The beeswarm pattern indicates that higher social media use frequency and positive DASS status generally shifted predictions toward the PSMU-positive class.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSeveral additional patterns are noteworthy. Greater platform count, drinking, smoking, and dating-app use tended to increase the model output for many participants, whereas urban residence before college and sharing health information online tended to shift predictions downward in substantial subsets of students. The comparatively small spread of SHAP values for sex and sought health information online suggests that these variables contributed limited incremental information after the dominant social-media and psychosocial features were accounted for.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003ePrincipal findings\u003c/h2\u003e \u003cp\u003eThis study developed and internally validated an explainable machine-learning approach to identify Chinese college students at elevated cross-sectional risk of potential PSMU. Three findings are particularly notable. First, tree-based ensemble learners (XGBoost, random forest, and LightGBM) consistently outperformed logistic regression, Gaussian naive Bayes, and k-nearest neighbors. Second, the most influential predictors were not purely demographic; rather, they combined behavioral exposure to social media (frequency and platform count), mental distress, and selected contextual or lifestyle variables. 
Third, several predictors that are often treated as background covariates\u0026mdash;such as university city tier, monthly expenditure, and dating-app use\u0026mdash;contributed non-trivially to the final model.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eInterpretation in relation to previous work\u003c/h2\u003e \u003cp\u003eOur findings are consistent with prior studies linking PSMU to depression, anxiety, stress, loneliness, and broader psychological distress [\u003cspan additionalcitationids=\"CR4 CR5 CR6\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. In Chinese college samples, stronger PSMU has been associated with depressive outcomes [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], latent profiles characterized by depression and stress [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], and reciprocal reinforcement with loneliness over time [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The prominence of DASS_any in the present model reinforces the view that affective distress is closely intertwined with problematic social-media engagement. At the same time, the strong role of social media use frequency and platform count indicates that behavioral exposure still matters when considered alongside psychosocial variables.\u003c/p\u003e \u003cp\u003eThe advantage of XGBoost and related ensemble models is also plausible from a substantive perspective. PSMU is unlikely to arise from a single linear pathway. 
Instead, it is probably shaped by threshold effects (for example, moving from \u0026lsquo;often\u0026rsquo; to \u0026lsquo;always\u0026rsquo; using social media), behavioral clustering (such as concurrent dating-app use, smoking, and drinking), and contextual interactions (such as the interplay between city tier and spending power). Tree-based ensembles are better suited than standard logistic regression to capture such non-linearity and interaction structure without requiring investigators to pre-specify numerous cross-terms.\u003c/p\u003e \u003cp\u003eThe SHAP results additionally offer behaviorally meaningful insights. Social media use frequency dominated the global ranking, which is unsurprising given that compulsive frequency is conceptually adjacent to PSMU. DASS_any emerged as the next most important contributor, underscoring that students with substantial emotional distress are more likely to occupy the high-risk region of the model. Platform count may reflect breadth of digital immersion, while dating app use may capture a more reward-driven and potentially more compulsive pattern of online engagement. The positive contributions of drinking and smoking may indicate broader behavioral risk clustering rather than any direct effect of substance use itself. Conversely, the negative SHAP tendency for sharing health information online suggests that some forms of digital engagement may be more purposeful or informational and therefore less tightly linked to compulsive social-media involvement.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eStrengths and limitations\u003c/h2\u003e \u003cp\u003eThis study has several strengths. It used a relatively large analytic sample with balanced outcome classes, minimizing severe class-imbalance problems. It compared multiple algorithm families rather than reporting a single model. 
It also combined LASSO feature selection with SHAP-based interpretation, improving both parsimony and transparency. Finally, participants were recruited from a broad range of institutions, providing wider contextual diversity than a single-campus study.\u003c/p\u003e \u003cp\u003eThe study also has important limitations. First, recruitment was convenience-based and strongly concentrated in the Yangtze River Delta, so the sample is not nationally representative. Second, the design was cross-sectional, which means the model identifies correlates of concurrent screening-positive potential PSMU rather than predictors of future onset. Third, the outcome threshold of BSMAS\u0026thinsp;\u0026ge;\u0026thinsp;19 is best interpreted as a screening threshold for potential PSMU. A stricter clinical cut-off of 24 has been proposed in interview-based research with Chinese adolescents [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], so the observed prevalence should not be interpreted as the prevalence of a clinically confirmed disorder. Fourth, all measures were self-reported and the study did not include platform-verified behavioral logs. Fifth, this manuscript reports internal validation only. Calibration and transportability should be assessed in future external validation work before any real-world deployment. Finally, because the analysis relied on a cleaned modeling export, some questionnaire modules and web-survey paradata were not included in the predictive modeling stage, and some CHERRIES items could not be reconstructed retrospectively.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eImplications and future research\u003c/h2\u003e \u003cp\u003eFrom an applied perspective, the present model is most defensible as a screening-support tool rather than a diagnostic instrument. 
University mental-health or student-affairs services could use a short questionnaire containing the retained predictors to flag students for more detailed assessment, digital well-being counseling, or psychoeducation. The explainable structure of the model is important here because it allows practitioners to identify why a student is assigned a higher model-based risk, rather than relying on a purely opaque score. In practice, however, any implementation should be preceded by prospective evaluation, calibration assessment, and external validation in geographically broader samples.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusions","content":"\u003cp\u003eAmong Chinese college students recruited through a multicenter online survey, XGBoost showed the best internal test-set performance for identifying students with screening-positive potential problematic social media use. The most informative features combined digital-exposure measures with mental distress and selected contextual factors. 
These findings support the potential value of explainable machine learning for university screening and prevention, while also underscoring the need for external validation and cautious interpretation of screening-defined outcomes.\u003c/p\u003e"},{"header":"Abbreviations","content":" \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAUC\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eArea under the receiver operating characteristic curve\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBSMAS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBergen Social Media Addiction Scale\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eCHERRIES\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eChecklist for Reporting Results of Internet E-Surveys\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eDASS-21\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eDepression Anxiety Stress Scale-21\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eLightGBM\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eLight Gradient Boosting Machine\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eLASSO\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eLeast absolute shrinkage and selection operator\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003ePSMU\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eProblematic social media use\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv 
class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eSHAP\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eSHapley Additive exPlanations\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eTRIPOD\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eTransparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":" \u003cp\u003e \u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e \u003cp\u003eThe study protocol was approved by the Ethics Committee of Jiangsu College of Nursing (approval No. JSCN-ME-2025070719). All procedures involving human participants were conducted in accordance with the ethical principles of the Declaration of Helsinki (World Medical Association; revised in 2024). Electronic informed consent was obtained from all participants before survey initiation.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eConsent for publication\u003c/strong\u003e \u003cp\u003eNot applicable.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eCompeting interests\u003c/h2\u003e \u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eAuthors' information\u003c/h2\u003e \u003cp\u003eHZW is with the Department of Social Medicine and Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia. QFL is with Jiangsu College of Nursing, Huai'an, Jiangsu, China.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eNot applicable.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eHZW wrote the manuscript and performed the formal data analysis and machine-learning modeling. 
QFL was responsible for the remaining study work, including study oversight, recruitment organization, ethics, interpretation of findings, critical revision of the manuscript, and correspondence. Both authors read and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eThe authors thank the participating students and the student recruiters who assisted with questionnaire dissemination through university WeChat and QQ groups. The authors are also grateful to Taizhou University, Zhejiang University, Ningbo University, Wuxi University, Shanghai University of Electric Power, East China University of Science and Technology, Nanjing University, Shanghai Ocean University, and Wenzhou University for recruitment support and assistance with study dissemination.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe deidentified dataset and analytic code are available from the corresponding author on reasonable request, subject to institutional and ethical restrictions.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAndreassen CS, Pallesen S, Griffiths MD. The relationship between addictive use of social media, narcissism, and self-esteem: findings from a large national survey. Addict Behav. 2017;64:287\u0026ndash;93. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.addbeh.2016.03.006\u003c/span\u003e\u003cspan address=\"10.1016/j.addbeh.2016.03.006\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB\u0026aacute;nyai F, Zsila \u0026Aacute;, Kir\u0026aacute;ly O, Maraz A, Elekes Z, Griffiths MD, et al. Problematic social media use: results from a large-scale nationally representative adolescent sample. PLoS ONE. 2017;12(1):e0169839. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pone.0169839\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0169839\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLopes LS, Valentini JP, Monteiro TH, Costacurta MCF, Soares LON, Telfar-Barnard L, et al. Problematic social media use and its relationship with depression or anxiety: a systematic review. Cyberpsychol Behav Soc Netw. 2022;25(11):691\u0026ndash;702. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1089/cyber.2021.0300\u003c/span\u003e\u003cspan address=\"10.1089/cyber.2021.0300\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShannon H, Bush K, Villeneuve PJ, Hellemans KGC, Guimond S. Problematic social media use in adolescents and young adults: systematic review and meta-analysis. JMIR Ment Health. 2022;9(4):e33450. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/33450\u003c/span\u003e\u003cspan address=\"10.2196/33450\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Y, Liu X, Chiu DT, Li Y, Mi B, Zhang Y, et al. Problematic social media use and depressive outcomes among college students in China: observational and experimental findings. Int J Environ Res Public Health. 2022;19(9):4937. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/ijerph19094937\u003c/span\u003e\u003cspan address=\"10.3390/ijerph19094937\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCui J, Wang Y, Liu D, Yang H. 
Depression and stress are associated with latent profiles of problematic social media use among college students. Front Psychiatry. 2023;14:1306152. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fpsyt.2023.1306152\u003c/span\u003e\u003cspan address=\"10.3389/fpsyt.2023.1306152\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu P, Feng R, Zhang J. The relationship between loneliness and problematic social media usage in Chinese university students: a longitudinal study. BMC Psychol. 2024;12:13. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s40359-023-01498-4\u003c/span\u003e\u003cspan address=\"10.1186/s40359-023-01498-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang K, Shi HS, Geng FL, Zou LQ, Tan SP, Wang Y, et al. Cross-cultural validation of the Depression Anxiety Stress Scale-21 in China. Psychol Assess. 2016;28(5):e88\u0026ndash;100. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1037/pas0000207\u003c/span\u003e\u003cspan address=\"10.1037/pas0000207\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi S, Cui G, Kaminga AC, Cheng S, Xu H. Associations between health literacy, eHealth literacy, and COVID-19-related health behaviors among Chinese college students: cross-sectional online study. J Med Internet Res. 2021;23(5):e25600. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/25600\u003c/span\u003e\u003cspan address=\"10.2196/25600\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJo S, Pituch KA, Howe N. 
The relationships between social media and human papillomavirus awareness and knowledge: cross-sectional study. JMIR Public Health Surveill. 2022;8(9):e37274. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/37274\u003c/span\u003e\u003cspan address=\"10.2196/37274\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEysenbach G. Improving the quality of web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res. 2004;6(3):e34. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/jmir.6.3.e34\u003c/span\u003e\u003cspan address=\"10.2196/jmir.6.3.e34\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evon Elm E, Altman DG, Egger M, Pocock SJ, G\u0026oslash;tzsche PC, Vandenbroucke JP, STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806\u0026ndash;8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1136/bmj.39335.541782.AD\u003c/span\u003e\u003cspan address=\"10.1136/bmj.39335.541782.AD\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCollins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. J Clin Epidemiol. 2015;68(2):134\u0026ndash;43. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jclinepi.2014.11.010\u003c/span\u003e\u003cspan address=\"10.1016/j.jclinepi.2014.11.010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30. Red Hook (NY): Curran Associates; 2017. pp. 4765\u0026ndash;74.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu N, Xiao X, Soh KL. Association between problematic social media use and psychological distress among college students: a cross-sectional study in China exploring the mediating role of eating disorders. BMJ Open. 2025;15(5):e092863. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1136/bmjopen-2024-092863\u003c/span\u003e\u003cspan address=\"10.1136/bmjopen-2024-092863\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo T, Qin L, Cheng L, Wang S, Zhu Z, Xu J, et al. Determination the cut-off point for the Bergen social media addiction scale (BSMAS): diagnostic contribution of the six criteria of the components model of addiction for social media disorder. J Behav Addict. 2021;10(2):281\u0026ndash;90. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1556/2006.2021.00025\u003c/span\u003e\u003cspan address=\"10.1556/2006.2021.00025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-public-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pubh","sideBox":"Learn more about [BMC Public Health](http://bmcpublichealth.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pubh/default.aspx","title":"BMC Public Health","twitterHandle":"@BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"problematic social media use, Bergen Social Media Addiction Scale, machine learning, XGBoost, SHAP, Chinese college students","lastPublishedDoi":"10.21203/rs.3.rs-9136366/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9136366/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eProblematic social media use (PSMU) is increasingly recognized as a health-relevant digital behavior among college students and has been linked to psychological distress and poorer well-being. 
However, few studies have developed interpretable machine learning models to identify Chinese college students at elevated risk of PSMU using multidimensional psychosocial and digital-behavior data.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eWe conducted a multicenter cross-sectional online survey in China between September and October 2025 using Wenjuanxing. After logic checks and deletion of invalid questionnaires, 2,413 valid responses remained. The outcome was screening-positive potential PSMU, defined as a Bergen Social Media Addiction Scale total score\u0026thinsp;\u0026ge;\u0026thinsp;19. Eighteen prespecified predictors covering demographics, socioeconomic position, pre-college environment, health behaviors, social-media-use patterns, online health behaviors, and mental distress were modeled. Data were randomly split into training (1,689; 70%) and test (724; 30%) sets using stratified sampling by outcome. LASSO was used for feature selection, six supervised algorithms were compared, and SHAP was used to interpret the best-performing model.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eOverall, 1,192 participants (49.4%) screened positive for potential PSMU. Of 18 predictors entered into LASSO, 16 retained non-zero coefficients. XGBoost achieved the best test-set discrimination (AUC 0.851), followed closely by random forest (0.847) and LightGBM (0.843). XGBoost also showed the highest F1-score (0.778). SHAP analyses indicated that social media use frequency, DASS_any, platform count, university city tier, drinking, dating/stranger-social app use, and monthly expenditure contributed most to model output.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eExplainable machine learning\u0026mdash;especially XGBoost\u0026mdash;can effectively identify Chinese college students at higher cross-sectional risk of potential PSMU. 
Social media exposure patterns and mental distress contributed most strongly to prediction. These findings may support targeted screening and digital well-being interventions in university settings, although external validation is still required.\u003c/p\u003e","manuscriptTitle":"Explainable machine learning-based identification of Chinese college students at risk of problematic social media use: a multicenter cross-sectional online survey","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-04 10:00:13","doi":"10.21203/rs.3.rs-9136366/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-04-23T08:07:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"134307451520536325246941115787685523581","date":"2026-04-23T04:10:09+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-22T21:27:48+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-20T13:00:40+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-04-01T08:42:24+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-01T02:05:14+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Public Health","date":"2026-04-01T02:00:20+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-public-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pubh","sideBox":"Learn more about [BMC Public Health](http://bmcpublichealth.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/pubh/default.aspx","title":"BMC Public Health","twitterHandle":"@BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC 
Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a778d092-6bd2-43e5-844d-1bb6af8ca506","owner":[],"postedDate":"May 4th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-04T10:00:13+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-04 10:00:13","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9136366","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9136366","identity":"rs-9136366","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

