Prediction of urinary tract infection using machine learning methods-A study for finding the most-informative variables

Research Article
Sajjad Farashi, Hossein Emad Momtaz
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5107375/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 09 Jan, 2025 in BMC Medical Informatics and Decision Making.
Version 1 posted 4

Abstract
Background: Urinary tract infection (UTI) is a frequent, health-threatening condition. Early and reliable diagnosis of UTI helps prevent the misuse or overuse of antibiotics and hence antibiotic resistance. The gold standard for UTI diagnosis is urinalysis, which is time-consuming and error-prone. Complementary methods are therefore needed.
Over the past decade, machine learning strategies, which fit mathematical models to a dataset to extract its most informative hidden information, have become the center of interest for prediction and diagnosis. Method: In this study, machine learning approaches were used to find the variables that matter most for reliable UTI prediction. Several types of intelligent machines, including classical and deep learning tools, were used for this purpose. Results: Eighteen features from urine tests, blood tests and demographic data were selected as the most informative. Urine factors (WBC, nitrite, leukocytes, clarity, color, blood, bilirubin and urobilinogen), blood-test factors (mean platelet volume, lymphocytes, glucose, red blood cell distribution width and potassium) and demographic data (age, gender and previous use of antibiotics) were the determinative factors for UTI prediction. With the selected features, an ensemble of XGBoost, decision tree and light gradient boosting machine classifiers combined by a voting scheme obtained the highest accuracy for UTI prediction (AUC: 88.53 (0.25), accuracy: 85.64 (0.20)%). The results also showed the importance of gender and age for UTI prediction. Conclusion: This study highlighted the potential of machine learning for UTI prediction.
Keywords: Urinary tract infection, prediction, machine learning, feature extraction

Highlights
The most-informative variables for UTI prediction were suggested.
An ensemble machine learning approach obtained a UTI prediction accuracy of 85.64%.
Gender was an important variable for UTI prediction.

Introduction
Urinary tract infection (UTI) is a common health-threatening issue that affects millions of people each year (1). Moreover, most microbiological screenings involve cases with possible UTI (2). UTI is diagnosed mainly on the basis of urine analysis and clinical symptoms.
Urine analysis is very sensitive to the collection method, and its results take 24–72 hours to become available. In addition, clinical symptoms alone are not reliable for accurate prediction (3). Fast alternative methods for UTI prediction are therefore required. Currently, the lack of fast, accurate and reliable tools for UTI prediction leads to the overuse of antibiotics, which contributes to the development of bacterial resistance to antibiotic agents. One possible solution is to develop artificial intelligence (AI) systems in which machines are trained on information extracted from available data. AI systems can reduce the cost and errors caused by incorrect human decisions, and they are fast because of the processing capability of computers compared with human resources (3). However, the performance of an AI system for predicting UTI depends heavily on identifying the most informative attributes of UTI, that is, the variables that best discriminate between UTI and non-UTI cases. Several studies have focused on developing intelligent systems for UTI prediction. In a binary classification study, Taylor et al. used a large dataset (more than 80,000 samples) containing urinalysis data, blood sample characteristics, past medical history, vital signs and physical examination data to predict UTI. The dataset contains more than 200 features. Their machine learning strategy obtained a classification accuracy of 87.5% with an XGBoost classifier. Taylor et al. also used a reduced set of 10 features, chosen from a literature search, to make the prediction model clinically applicable. This reduced set included age, gender, urine leukocytes, urine nitrites, urine WBC, urine bacteria, urine blood, urine epithelial cells, history of UTI and dysuria, and obtained a best UTI prediction accuracy of 85.9% (4).
Burton et al., using a large dataset (212,554 samples) and features selected by recursive feature elimination, obtained a prediction accuracy of 85% with a neural network classifier. The feature set proposed in that study was WBC count, bacterial count, age, epithelial cell count, RBC count, number of positive cultures to date, pyuria, pregnancy, inpatient, gender, persistent/recurrent infection, number of positive cultures in the prior month, positive for nitrates, renal inpatient/outpatient, pre-operative patient, acute kidney disease, immunocompromised, number of positive cultures in the prior week, multiple sclerosis, offensive smell and hematuria (2). An important result of that study was the need for different prediction strategies for children (age < 11) and pregnant women. Heckerling et al., using a small dataset (212 samples), a combination of an artificial neural network and a genetic algorithm, and a reduced feature set containing urinary frequency, foul urine odor, and leukocytes, bacteria and epithelial cells on urinalysis, proposed a predictive model for UTI. The accuracy of that model was 76.4%. In Choi et al. (5), the minimal variable set for accurate UTI prediction consisted of urine bacterial count, monocyte count, WBC count, lymphocyte count, urinary WBC count, specific gravity, diastolic blood pressure, systolic blood pressure, age, mean blood pressure and C-reactive protein. The best discriminative power with this feature set was obtained by an XGBoost classifier (85.7%). Gadalla et al. proposed a machine learning tool for predicting UTI in women using clinical and immunological factors. Using recursive feature elimination with SVM and RF classifiers, they proposed urinary MMP9, NGAL, IL-8/CXCL8 and IL-1β as the most informative predictors (6). In short, different studies have suggested various minimal feature sets for accurate UTI prediction.
However, the accuracy of the proposed UTI prediction models still needs improvement. The main purpose of the current study was to search for the most informative factors for UTI prediction in an available large dataset. Another aim was to propose an artificial intelligence system that provides a probability of UTI for blind cases.

Methods
Models were developed with machine learning algorithms in the Python programming language (version 3.8.1), using several packages including numpy (1.26.4), sklearn (1.4.2), Pandas (2.1.4), Scipy (1.11.4), Xgboost (2.0.3), lightgbm (4.3) and deeptables (0.2.6). All other analyses, such as statistical analyses and figure generation, were also performed in Python.

Dataset description
In this study, the dataset prepared by Taylor et al. from a single-center, multi-site study was used (4). It contains more than 200 variables for each patient who visited one of four adult emergency departments between March 2013 and May 2016. For each case, information on vital signs (body temperature, blood pressure, O2 amount, saturation and dependency), laboratory results, urinalysis results, medication history, complaints (including abdominal pain, back pain, altered mental status, dizziness, fatigue, genital problems, fever, flank pain, hematuria and weakness), demographics and physical exam findings was collected. This tabular dataset contains both categorical and numerical data. The vital-sign variables were transformed into a clustered space in which each value was assigned to one cluster according to its level (1: critically low, 2: low, 3: normal, 4: high, 5: critically high). The labeling strategy (UTI vs. non-UTI) is described in Taylor et al. (4). The dataset contains more than 80,000 samples; more information about it can be found in (4).

Feature selection
The dataset used in the current study includes more than 200 features.
Since the major purpose of the current study was to propose a clinically applicable AI system, such a high-dimensional feature space had to be reduced to a lower-dimensional one. This reduces the computational and time complexity of the training step and also makes it feasible for clinical staff to feed only the most informative data to the AI system when predicting a possible UTI. Feature selection was performed in three successive steps. First, in manual feature elimination, features that are specific to a particular area, such as race, ethnicity or language, were excluded. Features with vague definitions, such as weakness or psychiatric confusion, were also excluded, as were features with many missing values ('not_reported' values in the dataset): features whose proportion of missing values relative to the total sample size exceeded a 70% threshold were ignored. Furthermore, since this study aims to predict UTI at admission, only the admission values of features such as O2 saturation, blood pressure and body temperature were considered, and measurements taken during hospitalization were ignored. In the second round of feature space reduction, five feature selection methods were used: four filter methods (information gain, chi-squared, Fisher's score and mean absolute difference) and one embedded method (Lasso regularization). The information gain method calculates the mutual information (MI), i.e., the dependency between each feature and the label vector (UTI and non-UTI labels), from entropy estimates (7). A higher MI indicates a higher dependency, i.e., the feature is more representative of the label vector. The chi-squared method computes the statistical dependence between each feature and the class labels.
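The missing-value rule from the first step and the information-gain criterion can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes each record is a dict, that missing entries carry the dataset's 'not_reported' marker, that labels are 0 (non-UTI) and 1 (UTI), and that features are already discrete. The names `drop_sparse_features`, `entropy`, `information_gain` and `sparse_feature` are hypothetical. For continuous features, scikit-learn's `mutual_info_classif` (in `sklearn.feature_selection`) offers a nearest-neighbor estimator instead; the paper does not state which estimator was used.

```python
import math
from collections import Counter

MISSING = "not_reported"  # missing-value marker used in the dataset


def drop_sparse_features(records, features, max_missing=0.70):
    """Keep features whose share of missing values is at most `max_missing`.

    A key absent from a record is treated as missing as well.
    """
    n = len(records)
    kept = []
    for name in features:
        missing = sum(1 for r in records if r.get(name, MISSING) == MISSING)
        if missing / n <= max_missing:
            kept.append(name)
    return kept


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Mutual information I(X; Y) = H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


# Toy data (hypothetical): 'sparse_feature' is missing in 3 of 4 records (75%).
records = [
    {"ua_nitrite": "positive", "sparse_feature": 1.0},
    {"ua_nitrite": "negative", "sparse_feature": MISSING},
    {"ua_nitrite": "positive", "sparse_feature": MISSING},
    {"ua_nitrite": "negative", "sparse_feature": MISSING},
]
labels = [1, 0, 1, 0]

kept = drop_sparse_features(records, ["ua_nitrite", "sparse_feature"])
gain = information_gain([r["ua_nitrite"] for r in records], labels)
print(kept, gain)  # ['ua_nitrite'] 1.0
```

In the toy data the nitrite value determines the label exactly, so its information gain equals the full label entropy (1 bit), while a constant feature would score 0.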
The Fisher score is the ratio of between-class to within-class variance; a higher value indicates greater discriminative power of the feature. Mean absolute difference measures statistical dispersion and returns the average absolute difference between a selected feature and the associated class labels (8). Each method identifies the most informative features from a different perspective. For practical feasibility, each feature selection method was forced to return its 20 most informative features. After the features selected by the different methods were pooled, those labeled as informative by at least two methods were retained. Finally, to reduce the feature space further, a third round of feature selection using wrapper methods was applied: forward selection, backward elimination and recursive feature elimination were used to select the most informative subset of features, and the final feature set with acceptable prediction performance was chosen according to the classification results. Wrapper methods are very time-consuming on high-dimensional data, which is why they were applied only after the dimensionality had been reduced by more time-efficient filter strategies. To calculate the importance of the selected features, neighborhood component analysis (NCA) was performed (9). This machine learning algorithm learns nearest-neighbor-based feature weights by maximizing classification accuracy.

Classification
In this study, different types of classifiers were used.
These included classic classifiers (logistic regression, naïve Gaussian Bayes, decision tree, random forest, eXtreme Gradient Boosting (XGBoost) and light gradient-boosting machines (LightGBM) (10)) and several deep learning-based machines: deep factorization machines (DeepFM) (11), eXtreme deep factorization machine (xDeepFM) (12), wide and deep network (WDN) (13), deep and cross network (DCN) (14), product-based neural networks (PNN) (15), automatic feature interaction learning via self-attentive neural networks (AutoInt) (16), attentional factorization machines (AFM) (17) and feature generation by convolutional neural network (FGCNN) (18). The data were split randomly into a training set and a test set, with the test set comprising 25% of the total sample size. Before training, the parameters of each classifier were tuned with a grid search or randomized search; to avoid overfitting, a randomly chosen 20% of the training data was used for this parameter optimization. Each classifier was trained with 10-fold cross-validation, and the held-out folds were used for model validation. The test data were used to evaluate the performance of the optimized classifier in terms of accuracy, precision and recall. To check the generalizability of the classifier and guard against issues such as class imbalance and overfitting, repeated K-fold cross-validation (K = 5) was used: the training and testing steps were repeated with different train and test samples, and the mean and standard deviation over the repeats were reported. Classifiers were compared by the area under the ROC curve (AUC), scaled to the 0–100 range (instead of 0–1) to make small differences between classifiers visible.

Results
The first feature reduction step (i.e., manual feature elimination) reduced the size of the feature space from 211 to 150.
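The second-round aggregation rule from the Methods (each of the five selectors returns its 20 top-ranked features, and a feature is kept when at least two selectors chose it) can be sketched as follows, together with one common form of the Fisher score. This is an illustrative sketch, not the authors' implementation: `fisher_score`, `top_k` and `consensus` are hypothetical names, and the toy selections below are made up for demonstration rather than taken from Table S1.

```python
from collections import Counter
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """One common Fisher score form: sum_k n_k (mu_k - mu)^2 / sum_k n_k var_k."""
    overall = fmean(values)
    between = within = 0.0
    for cls in set(labels):
        xs = [v for v, y in zip(values, labels) if y == cls]
        between += len(xs) * (fmean(xs) - overall) ** 2
        within += len(xs) * pvariance(xs)
    return between / within if within > 0 else float("inf")


def top_k(scores, k=20):
    """Names of the k highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def consensus(selections, min_votes=2):
    """Features chosen by at least `min_votes` selectors, mapped to their vote count."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return {f: n for f, n in votes.items() if n >= min_votes}


# Hypothetical toy selections from the five methods (not the paper's results).
selections = {
    "information_gain": ["ua_wbc", "ua_nitrite", "age"],
    "chi_squared": ["ua_wbc", "ua_nitrite"],
    "fisher": top_k({"ua_wbc": 3.1, "gender": 1.2, "age": 0.4}, k=2),
    "mean_abs_difference": ["ua_wbc", "age"],
    "lasso": ["ua_wbc", "ua_nitrite"],
}
print(consensus(selections))
```

With these toy lists, `gender` is dropped because only one selector chose it, while `ua_wbc` (5 votes), `ua_nitrite` (3) and `age` (2) survive, mirroring how the vote counts in Table 1 arise.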
Table S1 (see supporting material) shows the features selected by the different filter methods and by Lasso regularization, an embedded feature selection strategy. Applying the filter and embedded strategies and pooling their selections yielded thirty-two features (feature set No.1; Table 1). A feature was included in feature set No.1 if it was proposed by at least two feature selection methods.

Table 1 The most-informative feature set (feature set No.1) for UTI prediction according to the filter and embedded feature selection strategies.
Feature name | Description | Number of selections by the five feature selection methods
ua_wbc | Number of white blood cells in the urine (large, moderate, small) | 5
Blood basophils | Amount of basophils, a type of granulocyte (white blood cell) | 4
chief_complaint | Complaint such as fever, pain, dizziness, fatigue, dysuria, genitourinary problems, weakness or hematuria | 4
ua_clarity | Clarity of urine (clear or not clear) | 4
ua_nitrite | Presence of nitrite in urine (negative or positive) | 4
abx | Antibiotic consumption (yes or no) | 3
Blood ANC | Absolute neutrophil count in blood | 3
Blood eosinophils | White blood cells involved in boosting inflammation | 3
fever | Body temperature higher than 37 °C (yes or no) | 3
Blood lymphocytes | A type of immune cell in the blood | 3
Blood monocytes | A type of white blood cell in the blood | 3
Blood MPV | Mean platelet volume | 3
ua_rbc | Number of red blood cells in the urine (large, moderate, small) | 3
abd_pain | Abdominal pain (yes or no) | 2
abd_soft | Abdomen softening (yes or no) | 2
abxUTI | Antibiotic consumption for UTI (yes or no) | 2
age | Age of participant | 2
arrival | How the subject arrived at the emergency unit (car, EMS, walk-in or wheelchair) | 2
gender | Male or female | 2
Glucose | Blood glucose level | 2
Blood MCH | Mean corpuscular hemoglobin | 2
Blood potassium | Blood potassium level | 2
Blood RDW | Red blood cell distribution width | 2
ua_bili | Bilirubin in urine (large, moderate, negative, small) | 2
ua_blood | Presence of blood in urine (large, moderate, negative, small) | 2
ua_color | Color of urine (amber, colorless, red, yellow, other) | 2
ua_glucose | Urine glucose level (large, moderate, small, negative) | 2
ua_ketones | Urine ketones level (large, moderate, small, negative) | 2
ua_leuk | Leukocytes in urine (large, moderate, small, negative) | 2
ua_protein | Protein in urine (large, moderate, small, negative) | 2
ua_spec_grav | Urine specific gravity (numeric value) | 2
ua_urobili | Urobilinogen in the urine (positive or negative) | 2

Table 2 reports the classification results for feature set No.1 (the 32 features in Table 1) with different machine learning strategies. In 10-fold cross-validation, each fold of the training samples was used in turn for validation, and the test samples were used in the testing phase.

Table 2 Classification results with 10-fold cross-validation for the training step and repeated 5-fold cross-validation for the testing phase, using the feature set in Table 1. Reported values are mean (standard deviation).
Classifier | Val. accuracy | Val. precision | Val. recall | Val. AUC | Test accuracy | Test specificity | Test precision | Test recall | Test AUC
Classic classifiers
Logistic regression | 83.09 (0.29) | 70.36 (1.14) | 44.26 (1.28) | 83.70 (0.56) | 83.24 (0.23) | 94.67 (0.17) | 71.18 (0.78) | 44.54 (0.96) | 83.83 (0.25)
Naïve Gaussian Bayes | 80.78 (0.40) | 59.24 (1.10) | 49.22 (1.15) | 81.40 (0.73) | 80.75 (0.10) | 90.03 (0.08) | 59.54 (0.30) | 49.46 (0.12) | 81.42 (0.10)
Decision tree | 84.99 (0.34) | 75.98 (1.79) | 49.76 (2.16) | 86.98 (0.52) | 84.91 (0.28) | 95.38 (0.52) | 76.10 (1.42) | 49.48 (1.50) | 86.96 (0.23)
Random forest | 84.08 (0.36) | 75.30 (2.65) | 45.18 (3.89) | 85.18 (0.51) | 84.16 (0.19) | 95.97 (0.11) | 76.49 (0.55) | 44.28 (0.27) | 85.16 (0.17)
XGBoost | 85.92 (0.37) | 75.63 (1.20) | 56.28 (1.20) | 88.92 (0.49) | 86.05 (0.23) | 94.76 (0.18) | 75.97 (0.55) | 56.37 (0.67) | 88.87 (0.14)
LightGBM | 85.91 (0.37) | 75.24 (1.07) | 56.62 (1.32) | 88.78 (0.53) | 85.78 (0.24) | 94.48 (0.20) | 75.09 (0.66) | 56.32 (0.32) | 88.81 (0.14)
Deep learning networks
DeepFM | 83.43 (1.19) | 78.83 (6.52) | 39.40 (12.24) | 86.75 (0.69) | 85.18 (0.47) | 94.49 (1.32) | 74.26 (2.95) | 53.33 (4.27) | 87.69 (0.55)
WDN | 83.02 (0.67) | 76.85 (3.61) | 37.05 (6.86) | 85.08 (0.52) | 83.40 (0.39) | 96.30 (1.11) | 76.87 (2.78) | 40.18 (4.98) | 85.33 (0.16)
xDeepFM | 83.18 (0.58) | 78.47 (2.88) | 36.02 (5.33) | 85.16 (0.32) | 83.13 (0.73) | 96.17 (1.29) | 76.32 (4.32) | 39.31 (7.43) | 85.44 (0.21)
DCN | 79.52 (1.15) | 89.97 (3.54) | 11.19 (6.70) | 85.05 (0.39) | 82.44 (0.61) | 82.44 (0.61) | 82.55 (2.84) | 29.24 (4.82) | 85.29 (0.33)
PNN | 83.34 (0.33) | 78.07 (2.77) | 37.68 (3.76) | 83.73 (0.15) | 83.44 (0.45) | 96.19 (0.56) | 76.13 (1.70) | 41.22 (2.39) | 85.38 (0.26)
AutoInt | 60.52 (25.61) | 34.65 (31.74) | 43.82 (44.58) | 77.99 (8.17) | 67.19 (22.29) | 74.76 (38.21) | 42.60 (27.61) | 41.68 (38.38) | 77.77 (7.99)
AFM | 77.36 (0.31) | 12.00 (32.50) | 0.01 (0.04) | 80.76 (2.11) | 78.88 (1.91) | 99.11 (1.15) | 31.42 (38.51) | 10.57 (13.09) | 82.39 (1.42)
FGCNN | 83.68 (0.36) | 72.56 (1.79) | 45.79 (2.99) | 85.49 (0.38) | 83.99 (0.13) | 94.47 (0.75) | 72.12 (1.68) | 48.34 (2.87) | 85.66 (0.24)

According to Table 2, the XGBoost classifier achieved the highest testing accuracy, recall and AUC of all classifiers. LightGBM, DeepFM and decision tree classifiers were the other top candidates and were retained for further analysis. For the final step of feature selection, three different wrapper methods were used. After this third round, the initial 211-dimensional feature space was reduced to an 18-dimensional space. The features selected by each method are shown in Table S2 (see supporting materials). The final feature set was obtained by pooling the features proposed by the wrapper methods, which gave eighteen most discriminative features (feature set No.2): ua_nitrite, ua_wbc, ua_bili, ua_leuk, ua_urobili, abx, abxUTI, age, chief_complaint, gender, Glucose, Lymphocytes, MPV, Potassium, RDW, ua_blood, ua_clarity and ua_color. The classification performance of the qualified classifiers with feature set No.2 is reported in Table 3.
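Of the three wrapper methods, greedy forward selection is the simplest to sketch. The version below is a hypothetical illustration, not the authors' code: `score` stands for any caller-supplied validation metric (for example a cross-validated AUC from one of the retained classifiers), and `toy_score` and its `usefulness` weights are invented stand-ins. In scikit-learn, `SequentialFeatureSelector` (with `direction="forward"` or `"backward"`) and `RFE` in `sklearn.feature_selection` implement forward, backward and recursive variants; the paper does not say which implementations were used.

```python
def forward_selection(features, score, max_features=None, tol=0.0):
    """Greedy sequential forward selection.

    Starting from an empty set, repeatedly add the feature whose inclusion
    gives the highest `score(subset)`; stop when no candidate improves the
    best score by more than `tol`, or when `max_features` is reached.
    """
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        trial = {f: score(selected + [f]) for f in remaining}
        candidate = max(trial, key=trial.get)
        if trial[candidate] <= best + tol:
            break  # no remaining feature improves the metric enough
        best = trial[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best


# Hypothetical per-feature "usefulness" plus a small size penalty, standing in
# for a real cross-validated metric.
usefulness = {"ua_wbc": 0.30, "ua_nitrite": 0.25, "age": 0.05, "noise": 0.0}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.01 * len(subset)


chosen, best_score = forward_selection(usefulness, toy_score)
print(chosen, round(best_score, 2))  # ['ua_wbc', 'ua_nitrite', 'age'] 0.57
```

The size penalty is what stops the search before `noise` is added; with a real metric, the stopping point depends on how much each extra feature changes the validation score.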
Table 3 Classification results for the selected classifiers using feature set No.2 (validation phase: 10-fold cross-validation; testing phase: repeated 5-fold cross-validation)
Classifier | Val. accuracy | Val. precision | Val. recall | Val. AUC | Test accuracy | Test specificity | Test precision | Test recall | Test AUC
Decision tree | 84.98 (0.45) | 75.52 (2.03) | 50.49 (1.94) | 86.97 (0.64) | 85.03 (0.22) | 95.57 (0.49) | 76.45 (1.48) | 49.04 (1.99) | 86.88 (0.21)
XGBoost | 85.66 (0.34) | 75.30 (1.07) | 55.01 (1.20) | 88.35 (0.46) | 85.58 (0.13) | 94.70 (0.19) | 75.24 (0.40) | 54.63 (0.96) | 88.37 (0.11)
LightGBM | 85.76 (0.38) | 74.95 (1.25) | 55.91 (1.11) | 88.55 (0.50) | 85.80 (0.15) | 94.66 (0.22) | 75.73 (0.94) | 56.04 (0.54) | 88.56 (0.19)
DeepFM | 84.20 (0.61) | 77.64 (4.33) | 43.71 (7.11) | 87.17 (0.49) | 85.00 (0.07) | 95.40 (0.73) | 76.01 (1.70) | 49.43 (2.99) | 87.49 (0.15)
Combined classifier (equal weight) | 85.70 (0.37) | 76.01 (1.27) | 54.22 (1.53) | 88.52 (0.50) | 85.64 (0.20) | 94.89 (0.20) | 75.86 (0.75) | 54.37 (0.24) | 89.53 (0.25)

According to Table 3, the highest testing AUC was obtained when three classic classifiers (decision tree, XGBoost and LightGBM) were combined with equal weights as an ensemble classifier.

Some notes about selected features
Fig. 1 shows, for the UTI and non-UTI classes, the percentage of cases with positive urine nitrite and urobilinogen; with large urine WBC, leukocytes, blood and bilirubin; and with critically large blood glucose, lymphocytes, potassium, MPV and RDW. According to Fig. 1, among the most informative categorical features (positive vs. negative, or large vs. other levels), urine nitrite, urine WBC, urine leukocytes, urine blood, urine bilirubin, blood glucose and blood potassium had a larger percentage in the infected group than in the non-infected group. Conversely, a larger percentage of non-UTI cases had critically large blood lymphocytes, RDW and MPV and positive urine urobilinogen, which suggests that these features tend to be lower following UTI.

Correlation between selected features
To check for correlations among the selected features, the correlation matrix is shown in Fig. 2. The categorical variables were coded into a binary space as follows: ua_nitrite: positive (1) vs. negative (0); ua_urobili: positive (0) vs. negative (1), since according to Fig. 1 positive cases were less frequent in the UTI group than in the non-UTI group; ua_wbc, ua_leuk, ua_blood and ua_bili: large (1) vs. other levels (0); Glucose and potassium: critically large (1) vs. other levels (0); lymphocyte, MPV and RDW: critically large (0) vs. other levels (1), since according to Fig. 1 critically large cases were less frequent in the UTI group than in the non-UTI group; ABX and ABX_UTI: yes (1) vs. no (0); and ua_clarity: not_clear (1) vs. clear (0). Features such as chief complaint and ua_color were not included in this analysis, since their exact relationship with UTI was unknown. In the dataset, yellow urine was dominant in both UTI-positive and UTI-negative cases (85.89% and 88.77%, respectively), while the frequency of other urine colors was negligible. In addition, the most frequent chief complaint in both the UTI and non-UTI groups was abdominal pain (23.73% and 31.58%, respectively), and no other symptom was dominant. Furthermore, there is no standard age threshold regarding UTI incidence. It was therefore not possible to convert these features into a binary representation. Only cases in which all 14 of the above-mentioned variables were reported were considered (48,761 samples). Pearson's correlation coefficient (R) and the associated p-value were calculated.
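The binary coding rules and the correlation step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the level spellings ('positive', 'large', 'critically_large', 'yes', 'not_clear') and the names `RULES`, `code_binary` and `pearson_r` are hypothetical, and the raw values in the usage example are invented. The coefficient is computed directly; for two binary variables it equals the phi coefficient. The p-values reported in the paper additionally need a significance test, for example `scipy.stats.pearsonr`, which returns both the coefficient and a p-value.

```python
import math

# (level coded as a "hit", reverse) per feature, following the rules in the text.
# reverse=True codes the hit level as 0, for features whose hit level was less
# frequent in the UTI group. Level spellings are assumptions.
RULES = {
    "ua_nitrite": ("positive", False),
    "ua_urobili": ("positive", True),
    "ua_wbc": ("large", False),
    "ua_leuk": ("large", False),
    "ua_blood": ("large", False),
    "ua_bili": ("large", False),
    "Glucose": ("critically_large", False),
    "Potassium": ("critically_large", False),
    "Lymphocytes": ("critically_large", True),
    "MPV": ("critically_large", True),
    "RDW": ("critically_large", True),
    "abx": ("yes", False),
    "abxUTI": ("yes", False),
    "ua_clarity": ("not_clear", False),
}


def code_binary(feature, value):
    """Map a raw categorical value to 0/1 using the feature's rule."""
    level, reverse = RULES[feature]
    hit = value == level
    return int(not hit) if reverse else int(hit)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw values for four patients.
abx = [code_binary("abx", v) for v in ["yes", "no", "yes", "no"]]
abx_uti = [code_binary("abxUTI", v) for v in ["yes", "no", "yes", "no"]]
urobili = [code_binary("ua_urobili", v) for v in ["positive", "negative", "positive", "negative"]]

print(abx, urobili)             # [1, 0, 1, 0] [0, 1, 0, 1]
print(pearson_r(abx, abx_uti))  # 1.0
print(pearson_r(abx, urobili))  # -1.0
```

The first correlation shows that R = 1 between two binary variables simply means their codings agree on every sample.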
Table 4 reports the weights of the selected features according to neighborhood component analysis.

Table 4 Weights of features according to the neighborhood component analysis.
Feature | Weight | Importance score (%)
ua_wbc | 0.961 | 22.41
ua_nitrite | 0.926 | 21.61
ua_leuk | 0.842 | 19.63
ua_clarity | 0.545 | 12.72
MPV | 0.156 | 3.63
lymphocytes | 0.151 | 3.53
ua_blood | 0.146 | 3.40
ua_color | 0.131 | 3.05
ua_bili | 0.098 | 2.28
ua_urobili | 0.076 | 1.78
abx | 0.061 | 1.42
age | 0.057 | 1.34
gender | 0.047 | 1.09
abxUTI | 0.048 | 1.04
glucose | 0.028 | 0.66
RDW | 0.012 | 0.28
potassium | 0.005 | 0.13
chief_complaint | 0 | 0

Sensitivity of classifier performance to age and gender
The feature selection stage showed that age and gender were important factors for UTI prediction. To check how they affect the performance of the predictor system, the prediction capability was investigated for different age spans (18–40, 40–60 and > 60 years old) and for each gender (male or female). Figs. 3 and 4 show the ROC curves for UTI prediction for the different age spans and genders. For this analysis, an ensemble of decision tree, XGBoost and LightGBM estimators with equal weights was used, combined through a voting scheme to make the final prediction.

Comparison with other methods
One of the main challenges in UTI prediction is determining the minimal feature set required for fast and accurate prediction, and different studies have tried to find the most informative features in their own datasets. In Table 5, the prediction capability of the feature sets proposed by several studies is compared with the results of our analyses.
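The equal-weight combination of decision tree, XGBoost and LightGBM used in the age and gender analysis (and reported as the "combined classifier") can be sketched with two common voting rules. The paper states only that a voting scheme with equal weights was used, so both variants below are assumptions: hard voting takes the majority of the predicted labels, while soft voting averages the predicted UTI probabilities, which also yields a continuous score usable for an ROC curve. The function names and the per-model outputs are hypothetical. In scikit-learn, `sklearn.ensemble.VotingClassifier` implements both rules through its `voting="hard"` / `voting="soft"` parameter.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over per-model 0/1 label lists (one list per model).

    With three models and binary labels a majority always exists, so no
    tie-breaking rule is needed.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def soft_vote(probabilities, threshold=0.5):
    """Average per-model P(UTI) with equal weights; return (mean scores, labels)."""
    n_models = len(probabilities)
    mean = [sum(column) / n_models for column in zip(*probabilities)]
    return mean, [int(p >= threshold) for p in mean]


# Invented outputs of three fitted models for four patients.
tree = [1, 0, 1, 0]
xgb = [1, 1, 0, 0]
lgbm = [0, 1, 1, 0]
print(hard_vote([tree, xgb, lgbm]))  # [1, 1, 1, 0]

p_tree = [0.9, 0.2, 0.6, 0.1]
p_xgb = [0.6, 0.4, 0.4, 0.3]
p_lgbm = [0.3, 0.7, 0.8, 0.2]
scores, labels = soft_vote([p_tree, p_xgb, p_lgbm])
print(labels)  # [1, 0, 1, 0]
```

The two rules can disagree: the second patient is labeled UTI by hard voting but not by soft voting, because the averaged probability (about 0.43) falls below the 0.5 threshold.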
Since machine learning algorithms can be implemented in various ways (different data splitting strategies, cross-validation folds, classifier implementations, optimization methods, etc.), the prediction capability of these feature sets was estimated with the same classifier developed in our study and the same dataset released by Taylor et al. (4). In Taylor et al. (4), a minimal feature set including age, gender, urine leukocytes, urine nitrite, urine WBC, urine bacteria, urine blood, urine epithelial cells, history of UTI and dysuria was proposed as the most informative. In Choi et al. (5), the top ten variables for UTI prediction were reported to be urine bacterial count, monocyte count, WBC count, lymphocyte count, urinary WBC count, specific gravity, diastolic blood pressure, systolic blood pressure, age, mean blood pressure and C-reactive protein. All these features are included in the dataset of Taylor et al. (4). The results of UTI prediction with these feature sets, using our machine learning strategy and the Taylor et al. dataset (4), are reported in Table 5. In Burton et al. (2), which used a dataset independent of this study and of Taylor et al. (4), 21 features were selected as the most informative for UTI prediction: WBC count, bacterial count, age, epithelial cell count, RBC count, number of positive cultures to date, pyuria (no RBCs), pregnancy, inpatient, gender, persistent/recurrent infection, number of positive cultures in the prior month, positive for nitrates, renal inpatient/outpatient, pre-operative patient, acute kidney disease, immunocompromised, number of positive cultures in the prior week, multiple sclerosis, offensive smell and haematuria (no WBCs). The best UTI prediction performance for this feature set was obtained by an XGBoost classifier (AUC = 0.91). However, since the dataset of Burton et al. was not accessible, it was not possible to compare the potential of that feature set for UTI prediction with other methods.

Table 5 Comparison between the UTI prediction capability of different feature sets. The same classification strategy and the same dataset were used for all feature sets.
Feature set (classifier) | Accuracy | Specificity | Precision | Recall | AUC*
This work (XGBoost) | 85.58 (0.13) | 94.70 (0.19) | 75.24 (0.40) | 54.63 (0.96) | 88.37 (0.11)
This work (combined classifier) | 85.64 (0.20) | 94.89 (0.20) | 75.86 (0.75) | 54.37 (0.24) | 88.53 (0.25)
Taylor et al. (XGBoost) | 84.39 (0.20) | 94.29 (0.21) | 72.41 (0.81) | 50.84 (0.84) | 86.26 (0.22)
Taylor et al. (combined classifier) | 84.44 (0.23) | 95.26 (0.28) | 74.85 (0.50) | 47.77 (1.47) | 86.34 (0.56)
Choi et al. (XGBoost) | 83.95 (0.24) | 94.05 (0.13) | 71.14 (0.36) | 49.70 (0.66) | 85.43 (0.30)
Choi et al. (combined classifier) | 84.27 (0.26) | 94.44 (0.18) | 72.43 (0.48) | 49.66 (0.63) | 85.82 (0.36)
*To show minor differences, AUC was scaled to the 0–100 range.

Discussion
The current study suggested urine WBC, urine nitrite, urine leukocytes and urine clarity as the most informative biomarkers for UTI prediction (see Table 4). Positive nitrite in urine, which results from bacteria converting nitrates to nitrites, may be considered a possible sign of urinary tract infection. However, a previous study showed that analyzing only urine nitrite, WBCs and leukocytes as a fast strategy for suspected UTI cases (19) suffers from low sensitivity and specificity (20). This may be because, although high WBC levels and positive nitrite are potential signs of UTI, they are not specific to UTI. For example, other conditions such as kidney stones or pelvic problems may increase the WBC count, and in patients with gastroenteritis urinary nitrite increases significantly (21).
The lack of specificity of these factors for UTI is evident in Fig. 1, in which a portion of non-infected (non-UTI) samples showed positive nitrite or large levels of WBCs and leukocytes. Another useful biomarker for detecting bacterial inflammation is the neutrophil-to-lymphocyte ratio; however, it is not specific to UTI, and several diseases such as pneumonia, cancers or heart diseases affect it (22). Positive urine bilirubin, which appeared to be a potent biomarker (Tables 1 and 4), is highly correlated with abnormal liver function (23); it is therefore not a specific biomarker for UTI. The small weight of this feature (Table 4) and the relatively large number of non-UTI samples with a large bilirubin level (Fig. 1) show that bilirubin cannot be considered a potential biomarker for UTI. The literature suggests a positive correlation between diabetes (higher blood glucose) and UTI. Diabetes may damage nerves and weaken the immune system, thereby increasing the risk of bacterial infections. High blood glucose levels may also provide nutrients for bacteria (24). This might explain why blood glucose was proposed in the current study as a biomarker for UTI prediction. In line with the results of the current study, another study suggested MPV, an indicator of platelet function, as a potential biomarker for UTI (25). MPV increases during mild inflammation, whereas in severe inflammation it decreases owing to platelet consumption (26). The direction of MPV change is controversial: some studies introduced MPV as a positive acute phase reactant and reported increased levels (27), while others considered it a negative acute phase reactant and reported a decrease after inflammation (26).
In the dataset used in this study, the percentage of UTI-labeled subjects with a high MPV level was lower than that of non-UTI subjects (Fig. 1). Urine clarity, which was proposed as a potential biomarker in the current study (Table 4), is another indicator commonly used for UTI prediction. However, previous results implied that visual inspection of urine is not a sufficient indicator for UTI prediction (28). Furthermore, previous studies reported higher sensitivity and specificity for the urine WBC count than for nitrite when compared against urine culture (29). This agrees with the higher importance score of urine WBC compared with nitrite in the current study; however, other factors such as blood glucose level or pregnancy might influence this outcome (29). The results of this study also proposed red cell distribution width (RDW) as a potential biomarker for UTI prediction. Previous studies reported a correlation between RDW and systemic inflammation (30). In the current dataset (Fig. 1), a lower percentage of UTI cases than of non-UTI cases had high RDW levels. Another potent factor highlighted in the current study for UTI prediction was the potassium level (Table 4). Several studies reported ionic (electrolyte) abnormalities among inpatients with UTI (31, 32), which may be due to poor feeding, increased sweating, or vomiting (33). Low potassium (hypokalemia) was commonly observed in UTI cases (34), whereas in pyelonephritis (a urinary tract infection involving the kidneys), increased circulating potassium was observed compared with a control group (35). The increased potassium level is consistent with the analysis of the current dataset (see Fig. 1), in which critically high blood potassium was more prevalent in the UTI group.

According to Fig. 2, there was no strong correlation (R > 0.5) between features, except between ABX (antibiotic usage) and ABX_UTI (antibiotic usage for UTI), which were perfectly correlated (R = 1). This may be because people who use antibiotics extensively are more susceptible to UTI owing to antibiotic resistance. Nevertheless, removing either ABX or ABX_UTI slightly degraded the prediction performance of the classifier (not reported in the manuscript); therefore, we suggest retaining both. In addition, a moderate correlation (R = 0.42, p < 0.01) was observed between leukocyte esterase (ua_leuk) and the white blood cell count in urine (ua_wbc). This correlation is expected, since leukocyte esterase is an enzyme found in white blood cells. The generally weak correlations indicate that the selected features carry largely non-redundant information about UTI.

The overall classification accuracy of this work was 85.64% (for the classifier with the highest AUC), obtained by an ensemble classifier that combined decision tree, XGBoost, and LightGBM classifiers with equal weights (Table 3). However, this performance degraded when UTI prediction was performed for female or younger subjects (Figs. 3 and 4), which shows the importance of gender and age when developing machine learning strategies for UTI prediction. Finally, when the proposed feature set was compared with those of other studies (Table 5), this combination achieved higher accuracy and a better trade-off between type I and type II errors for UTI prediction.

Conclusion
UTI is a frequent problem in many societies. Reliable and quick prediction of UTI prevents unnecessary antibiotic use in non-UTI cases and also facilitates microbial degradation in UTI cases. Machine learning applied to UTI-related data is a promising tool for developing UTI prediction systems.
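The equal-weight ensemble of decision tree, XGBoost, and LightGBM classifiers referred to in the Discussion can be read as soft voting: average each model's predicted UTI probability, then apply a threshold. The text does not state whether probabilities or hard votes were combined, nor which decision threshold was used, so the sketch below assumes soft voting with a 0.5 threshold; the per-model probabilities are hypothetical stand-ins for the outputs of already fitted models.

```python
def soft_vote(model_probs, weights=None):
    """Weighted average of per-model UTI probabilities (equal weights by default).

    `model_probs` holds one list per model, each with one probability per sample.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # equal weights
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(len(model_probs[0]))
    ]


def to_labels(probs, threshold=0.5):
    """Convert averaged probabilities to UTI (1) / non-UTI (0) predictions."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical outputs of three fitted models on three test cases.
tree_p = [0.9, 0.2, 0.6]
xgb_p = [0.7, 0.4, 0.3]
lgbm_p = [0.8, 0.3, 0.3]

avg = soft_vote([tree_p, xgb_p, lgbm_p])
print(avg, to_labels(avg))
```

Averaging lets a confident model outvote two lukewarm ones (the third case above is rejected even though the tree alone would accept it). In scikit-learn, `VotingClassifier(estimators=..., voting="soft", weights=...)` performs the same averaging over the estimators' `predict_proba` outputs.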
The current study used a publicly available dataset containing urinalysis, blood test, vital sign, demographic, and observational data for suspected UTI cases. Finding the most-informative features and developing an intelligent system for predicting UTI were the main purposes of this study. The study showed the potential of machine learning strategies for predicting UTI from laboratory and urinalysis results. In addition, eighteen features with maximum discrimination capability were proposed, and the results showed that age and gender were two factors affecting UTI prediction.

Declarations
Statements of ethical approval: Not applicable.
Consent to participate: Not applicable.
Consent for publication: Not applicable.
Conflicts of interest/Competing interests: There is nothing to declare.
Funding: This work was funded by Hamadan University of Medical Sciences, Deputy of Research and Technology.
Author contributions: S.F. and H.E.M. performed systematic searches. S.F. performed the analyses and wrote the manuscript. S.F. and H.E.M. discussed the obtained results and finalized the draft.
Acknowledgement: The authors would like to thank the Vice-Chancellor for Research and Technology for supporting the current work.
Availability of data and materials: The data used in this study are fully accessible from https://doi.org/10.1371/journal.pone.0194085.

References
1. Stamm WE, Norrby SR. Urinary tract infections: disease panorama and challenges. J Infect Dis. 2001;183(Suppl 1):S1–4.
2. Burton RJ, Albur M, Eberl M, Cuff SM. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Med Inform Decis Mak. 2019;19(1):171.
3. Goździkiewicz N, Zwolińska D, Polak-Jonkisz D. The Use of Artificial Intelligence Algorithms in the Diagnosis of Urinary Tract Infections-A Literature Review. J Clin Med. 2022;11(10).
4. Taylor RA, Moore CL, Cheung K-H, Brandt C. Predicting urinary tract infections in the emergency department with machine learning. PLoS ONE. 2018;13(3):e0194085.
5. Choi MH, Kim D, Park Y, Jeong SH. Development and validation of artificial intelligence models to predict urinary tract infections and secondary bloodstream infections in adult patients. J Infect Public Health. 2024;17(1):10–7.
6. Gadalla AAH, Friberg IM, Kift-Morgan A, Zhang J, Eberl M, Topley N, et al. Identification of clinical and urine biomarkers for uncomplicated urinary tract infection using machine learning algorithms. Sci Rep. 2019;9(1):19694.
7. Kozachenko LF, Leonenko NN. Sample Estimate of the Entropy of a Random Vector. Probl Peredachi Inf. 1987;23(2):9–16.
8. Dodge Y. The concise encyclopedia of statistics. New York: Springer; 2010.
9. Yang W, Wang K, Zuo W. Neighborhood Component Feature Selection for High-Dimensional Data. J Comput. 2012;7:161–8.
10. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Neural Information Processing Systems; 2017.
11. Guo H, Tang R, Ye Y, Li Z, He X. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247. 2017.
12. Lian J, Zhou X, Zhang F, Chen Z, Xie X, Sun G. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2018.
13. Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, et al. Wide & Deep Learning for Recommender Systems. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems; 2016. p. 7–10.
14. Wang R, Fu B, Fu G, Wang M. Deep & Cross Network for Ad Click Predictions. Proceedings of the ADKDD'17; 2017.
15. Qu Y, Cai H, Ren K, Zhang W, Yu Y, Wen Y, Wang J. Product-Based Neural Networks for User Response Prediction. 2016 IEEE 16th International Conference on Data Mining (ICDM); 2016 Dec 12–15.
16. Song W, Shi C, Xiao Z, Duan Z, Xu Y, Zhang M, Tang J. AutoInt: Automatic feature interaction learning via self-attentive neural networks. Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019.
17. Xiao J, Ye H, He X, Zhang H, Wu F, Chua T-S. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617. 2017.
18. Liu B, Tang R, Chen Y, Yu J, Guo H, Zhang Y. Feature generation by convolutional neural network for click-through rate prediction. The World Wide Web Conference; 2019.
19. Appenheimer AB, Ford B. Urine Dipstick: Urinary Nitrites and Leukocyte Esterase – Dipping into Murky Waters. In: Sharp VJA, Antes LM, Sanders ML, Lockwood GM, editors. Urine Tests: A Case-Based Guide to Clinical Evaluation and Application. Cham: Springer International Publishing; 2020. pp. 97–115.
20. Williams GJ, Macaskill P, Chan SF, Turner RM, Hodson E, Craig JC. Absolute and relative accuracy of rapid urine tests for urinary tract infection in children: a meta-analysis. Lancet Infect Dis. 2010;10(4):240–50.
21. Wanchu A, Khullar M, Sud A, Deodhar SD, Bambery P. Elevated urinary nitrite and citrulline levels in patients with rheumatoid arthritis. Inflammopharmacology. 1999;7(2):155–61.
22. Han SY, Lee IR, Park SJ, Kim JH, Shin JI. Usefulness of neutrophil-lymphocyte ratio in young children with febrile urinary tract infection. Korean J Pediatr. 2016;59(3):139–44.
23. Foley KF, Wasserman J. Are Unexpected Positive Dipstick Urine Bilirubin Results Clinically Significant? A Retrospective Review. Lab Med. 2014;45(1):59–61.
24. Ahmed AE, Abdelkarim S, Zenida M, Baiti MAH, Alhazmi AAY, Alfaifi BAH, et al. Prevalence and Associated Risk Factors of Urinary Tract Infection among Diabetic Patients: A Cross-Sectional Study. Healthc (Basel). 2023;11(6).
25. Akya A, Rostami-Far Z, Chegene Lorestani R, Khazaei S, Elahi A, Rostamian M, et al. Platelet Indices as Useful Indicators of Urinary Tract Infection. Iran J Ped Hematol Oncol. 2019;9(3):159–65.
26. Tanju C, Ekrem G, Emel AB, Nur A. Mean platelet volume as a negative marker of inflammation in children with rotavirus gastroenteritis. Iran J Pediatr. 2014;24(5):617.
27. Albayrak Y, Albayrak A, Albayrak F, Yildirim R, Aylu B, Uyanik A, et al. Mean platelet volume: a new predictor in confirming acute appendicitis diagnosis. Clin Appl Thromb Hemost. 2011;17(4):362–6.
28. Bulloch B, Bausher JC, Pomerantz WJ, Connors JM, Mahabee-Gittens M, Dowd MD. Can Urine Clarity Exclude the Diagnosis of Urinary Tract Infection? Pediatrics. 2000;106(5):e60.
29. Mohanna AT, Alshamrani KM, SaemAldahar MA, Kidwai AO, Kaneetah AH, Khan MA, Mazraani N. The Sensitivity and Specificity of White Blood Cells and Nitrite in Dipstick Urinalysis in Association With Urine Culture in Detecting Infection in Adults From October 2016 to October 2019 at King Abdulaziz Medical City. Cureus. 2021;13(6):e15436.
30. Ma W, Mao S, Bao M, Wu Y, Guo Y, Liu J, et al. Prognostic significance of red cell distribution width in bladder cancer. Translational Androl Urol. 2020;9(2):295–302.
31. Park SJ, Oh YS, Choi MJ, Shin JI, Kim KH. Hyponatremia may reflect severe inflammation in children with febrile urinary tract infection. Pediatr Nephrol. 2012;27:2261–7.
32. Winberg J. Renal function studies in infants and children with acute, nonobstructive urinary tract infections. Acta Paediatr. 1959;48:577–89.
33. Bertini A, Milani GP, Simonetti GD, Fossali EF, Faré PB, Bianchetti MG, Lava SAG. Na+, K+, Cl–, acid–base or H2O homeostasis in children with urinary tract infections: a narrative review. Pediatr Nephrol. 2016;31(9):1403–9.
34. Shen A-L, Lin H-L, Lin H-C, Tseng Y-F, Hsu C-Y, Chou C-Y. Urinary tract infection is associated with hypokalemia: a case control study. BMC Urol. 2020;20(1):108.
35. Watanabe T. Hyponatremia and hyperkalemia in infants with acute pyelonephritis. Pediatr Nephrol. 2004;19:361–2.

Additional Declarations: No competing interests reported.
Supplementary Files Supportingpaper2.docx Cite Share Download PDF Status: Published Journal Publication published 09 Jan, 2025 Read the published version in BMC Medical Informatics and Decision Making → Version 1 posted Editorial decision: Revision requested 20 Sep, 2024 Editor assigned by journal 20 Sep, 2024 Submission checks completed at journal 20 Sep, 2024 First submitted to journal 18 Sep, 2024 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5107375","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":356944057,"identity":"31431d0a-0d97-4cc0-a762-1adc1a93e446","order_by":0,"name":"Sajjad 
Farashi","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABC0lEQVRIiWNgGAWjYHCCBAMGNgZ+EIuZwcCGgUGCSC2SDRAtaURpAQK4FobDhLXItx94UPCjjEGCf/bhx58LCs4n9s9uPviAocYmGpcWgzMJCYY95xgkJM6lmUnPMLidOOPOsWQDhmNpuQ24tDAkJBjwtjHUMZxhMGPmAWppuJFjJsHYcBinFvn+BwmGf9sYJOTPsH/+zGNwLnE+IS0MNxISjIG2SBic4TGQ5jE4kLiBkBaDGw8SjGWAfjE8w1MG1JJsvPFGWrJBAh6/yPfnpBm+AYaY3Bn2zZ95/tjJzruRfPDBhxob3A5j4EkzYGD4D+c6glUm4FQOAuyHHyBz7fEqHgWjYBSMghEJAMuRWgwreJIFAAAAAElFTkSuQmCC","orcid":"","institution":"Hamadan University of Medical Sciences","correspondingAuthor":true,"prefix":"","firstName":"Sajjad","middleName":"","lastName":"Farashi","suffix":""},{"id":356944058,"identity":"42639079-be74-485d-ab8f-fd551bdf88bb","order_by":1,"name":"Hossein Emad Momtaz","email":"","orcid":"","institution":"Hamadan University of Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Hossein","middleName":"Emad","lastName":"Momtaz","suffix":""}],"badges":[],"createdAt":"2024-09-18 06:19:38","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5107375/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5107375/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12911-024-02819-2","type":"published","date":"2025-01-09T15:58:01+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":70460833,"identity":"61184b80-b940-4f5f-adc0-2ace1e1ec971","added_by":"auto","created_at":"2024-12-03 11:31:47","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":739220,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eComparison between UTI (infected) and non-UTI (non-infected) cases, considering the most-informative 
features.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/c69dec0e455cf845d5ab433e.jpeg"},{"id":70460834,"identity":"1fc2b9c4-6311-4c1e-802d-8b1c64f363da","added_by":"auto","created_at":"2024-12-03 11:31:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":175882,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCorrelation matrix of selected variables with categorical values. The color indicates the correlation value and each number indicates the p-value (*: p\u0026lt;0.05, **: p\u0026lt;0.01).\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/4916b61e45c8a27f2ae7128c.png"},{"id":70460830,"identity":"510f4c3d-0ff0-4a1f-afe2-9d296283668c","added_by":"auto","created_at":"2024-12-03 11:31:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":40257,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eROC curve for comparison of predictor performance for different age spans. Area under curve was 86.42 (0.17) (Age: 18-40), 87.71 (0.22) (Age: 40-60), and 89.42 (0.25) (Age\u0026gt;60), respectively.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/4a0467ebefb75d199e3990b8.png"},{"id":70461778,"identity":"472083ea-41a9-4418-a1b5-2d2c677ee62f","added_by":"auto","created_at":"2024-12-03 11:39:47","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":33114,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eROC curve for comparison of predictor performance according to the gender. 
Area under curve was 91.42 (0.32) and 86.16 (0.23) for males and females, respectively.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/fae4dbe04bd9c3fb3f52a01e.png"},{"id":73694752,"identity":"9a0e32c5-2a6d-4737-821b-61a2292ffebc","added_by":"auto","created_at":"2025-01-13 16:13:52","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2460191,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/fddcf4aa-8819-4e9f-85fb-9cd9424fb895.pdf"},{"id":70461777,"identity":"f0e09086-3c9c-43ff-a09a-5ff67e82a991","added_by":"auto","created_at":"2024-12-03 11:39:47","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":14474,"visible":true,"origin":"","legend":"","description":"","filename":"Supportingpaper2.docx","url":"https://assets-eu.researchsquare.com/files/rs-5107375/v1/131efc816d070db8ab03c407.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Prediction of urinary tract infection using machine learning methods-A study for finding the most-informative variables","fulltext":[{"header":"Highlights","content":"\u003cul\u003e\n \u003cli\u003eThe most-informative variables for UTI prediction were suggested.\u003c/li\u003e\n \u003cli\u003eAn ensemble machine learning approach obtained UTI prediction accuracy of 85.64%.\u003c/li\u003e\n \u003cli\u003eGender was an important variable for UTI prediction.\u0026nbsp;\u003c/li\u003e\n\u003c/ul\u003e"},{"header":"Introduction","content":"\u003cp\u003eUrinary tract infection (UTI) is a common health-threatening issue that affects millions of people each year (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). 
Furthermore, the majority of microbiological screenings are associated with cases with possible UTIs(\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). UTI is diagnosed based on mainly urine analysis and clinical symptoms. Urine analysis is very sensitive to the collection method and needs 24\u0026ndash;72 hours to be accessible. In addition, clinical symptoms alone are not reliable for an accurate prediction(\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). In this regard, alternative fast methods are required for UTI prediction. Currently, the lack of fast, accurate and reliable tools for UTI prediction causes the overuse of antibiotics which contributes to the development of bacterial resistance to antibiotic agents.\u003c/p\u003e \u003cp\u003eOne possible solution for predicting UTI is developing artificial intelligence (AI) systems where artificial machines are trained using extracted information from available data. AI systems are effective solutions for reducing the cost and errors caused by human incorrect decisions, and are fast due to the capability of computers compared with human resources(\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). However, the performance of an AI system for predicting UTI depends heavily on identifying the most informative attributes of UTI. In other words, it is important to use the variables with the most discrimination between UTI and non-UTI cases.\u003c/p\u003e \u003cp\u003eSo far several studies were focused on developing intelligent systems for UTI prediction. In a binary classification study, Taylor et al. used a large dataset (more than 80,000 samples) containing urinalysis data, blood sample characteristics, past medical history data, vital signs and physical examination data to predict UTI. The dataset contains more than 200 features. 
Their proposed machine learning strategy obtained a classification accuracy of 87.5% with XGBoost classifier. Taylor et al. also used a reduced feature set (10 features) according to the literature search to make the prediction model applicable. The reduced feature set of that study included age, gender, urine leukocyte, urine nitrites, urine WBC, urine bacteria, urine blood, urine epithelial cells, history of UTI and dysuria which obtained the best UTI prediction accuracy of 85.9% (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). Burton et al., using a large dataset (212,554 samples) and selected features by recursive feature elimination strategy obtained a prediction accuracy of 85% using a neural network classifier. The proposed feature set for that study was WBC count, bacterial count, age, epithelial cell count, RBC count, number of positive cultures to date, pyuria, pregnancy, inpatient, gender, persistent/recurrent infection, number of positive cultures month prior, positive for nitrates, renal inpatient/outpatient, pre-operative patient, acute kidney disease, immunocompromised, and number of positive cultures week prior, multiple sclerosis, offensive smell, and hematuria (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). The necessity for using different prediction strategies for children (age\u0026thinsp;\u0026lt;\u0026thinsp;11) and pregnant women was the important result of this study. Heckerling et al. using a small dataset (212 samples) and a combination of artificial neural network and genetic algorithm and a reduced feature set containing urinary frequency, foul urine odor, leukocyte, bacteria and epithelial cells on urinalysis proposed a predictive model for UTI prediction. The accuracy of that model was 76.4%. In Choi et al. 
(\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e), a minimal variables for accurate UTI prediction were urine bacterial count, monocyte count, WBC count, lymphocyte count, urinary WBC count, and specific gravity, diastolic blood pressure, systolic blood pressure, age, mean blood pressure, and C-reactive protein. The best discriminative power with this feature set was obtained by an XGBoost classifier (85.7%). Gadalla et al. proposed a machine learning tool for predicting UTI in women samples using clinical and immunological factors. According to a recursive feature elimination strategy and SVM and RF classifiers, MMP9, NGAL, IL-8/CXCL8 and IL-1βin urine were proposed as the most informative predictors (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAccording to the literature, different studies suggested various minimal feature sets for accurate UTI prediction. However, the prediction accuracy of proposed models for UTI prediction still needs improvement. The main purpose of the current study was to search the most informative factors according to an available big dataset for UTI prediction. Another aim of this study was to propose an artificial intelligence system for providing a probability of UTI for blind cases.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eModels were developed using machine learning algorithms in Python programming language (Version 3.8.1) while several packages including numpy (1.26.4), sklearn (1.4.2), Pandas (2.1.4), Scipy (1.11.4), Xgboost(2.0.3), lightgbm(4.3), and deeptables (0.2.6) were incorporated. All other analyses such as statistical analyses, and figure generation were also performed in Python.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eDataset description\u003c/h2\u003e \u003cp\u003eIn this study, the available dataset prepared by Taylor et al. 
from a single-center, multi-site study was used (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). This dataset contains more than 200 variables for each susceptible case who visited four adult emergency departments from March 2013 until May 2016. For each case, information regarding vital signs (body temperature, blood pressure, O2 amount, saturation and dependency), laboratory results, and urinalysis results, history of medication, the complaints (including abdominal pain, back pain, altered mental status, dizziness, fatigue, genital problems, fever, flank pain, hematuria, and weakness), demographics, and physical exam findings were collected. This tabular dataset contained both categorical and numerical data. The vital sign variables transformed to a clustered space in which each value was assigned to one cluster according to its level (1: critically low, 2: low, 3: normal, 4: high, 5: critically high). The labeling strategy (UTI vs. non-UTI) was described in Taylor et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). This dataset contains more than 80000 samples. More information about this dataset can be found in (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eFeature selection\u003c/h3\u003e\n\u003cp\u003eThe used dataset for the current study includes more than 200 features. Since the major purpose of the current study was to propose a clinically applicable AI system, such a high-dimensional feature space should to be reduced to a lower dimensional space. This reduces the computational and time complexity of training step of AI system and also make it feasible for clinical staff to feed the most informative data to the AI system for the prediction of a possible UTI.\u003c/p\u003e \u003cp\u003eTo reduce the dimensionality of dataset, feature selection was performed in three successive steps. 
At first, in manual feature elimination, some features such as race, ethnicity, or language, which are specific to a particular area, were excluded. Some features with vague definition such as weakness, or psychiatric confusion were also excluded. Furthermore, features which contain many missing values (\u0026lsquo;not_reported\u0026rsquo; values in the dataset) were excluded. A 70% threshold for missing values relative to the total sample size was used for ignoring such features. Furthermore, since this study tries to propose an AI strategy for predicting UTI at admission, features such as O2 saturation, blood pressure and body temperature were considered only the admission and such variables during hospitalization were ignored. For the second round of feature space reduction, five different types of feature selection strategies were used, including filter methods (i.e. information gain, chi-squared, Fisher\u0026rsquo;s score, mean absolute difference), and an embedded method (Lasso regularization feature selection). The information gain method calculates the mutual information (MI) or dependency of each feature and the label vector (UTI and non-UTI labels) according to the entropy estimation(\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). Higher value of MI indicates higher dependency (i.e. feature is a well representative of the label vector). Chi-squared method computes statistical dependence between each feature and class labels. Fisher score calculates according to the ratio of between-class and within-class variance and its higher value indicates the more discrimination power of a feature. Mean absolute difference calculates statistical dispersion and returns the average absolute difference between a selected feature and associated class labels (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eEach method identifies the most informative features from a unique perspective. 
For practical feasibility, each feature selection method was forced to find the 20 most-informative features. After accumulating selected features by different methods, features that were labeled as informative by at least two feature selection methods were chosen. Finally, for reducing the feature space to a smaller size, the third round of feature selection using wrapper methods was applied. The wrapper feature selection methods including forward, backward, and also recursive feature elimination methods were used to select the most informative subset of features. The final feature set with the acceptable prediction performance was searched according to the classification results. It is worth noting that wrapper methods are very time-consuming when dealing with high dimensional data. In this regard these methods were used after reducing the dimension of data using more time efficient methods like filter feature selection strategies. To calculate the importance of selected features, the component neighborhood analysis (NCA) was performed (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e). This algorithm is a machine learning strategy which is based on nearest neighbor-based feature weighting according to the maximizing classification accuracy.\u003c/p\u003e\n\u003ch3\u003eClassification\u003c/h3\u003e\n\u003cp\u003eIn this study, different types of classifiers were used. 
Classic classifiers like logistic regression, Na\u0026iuml;ve Gaussian Bayes classifier, decision tree and random forest classifier, eXtreme Gradient Boosting (XGboost), and light gradient-boosting machines (LightGBM) (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e) and also several deep learning-based machines including deep factorization machines (DeepFM) (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e), eXtreme deep factorization machine (xdeepFM) (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e), wide and deep network (WDN) (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e), deep and cross network (DCN) (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e), product-based neural networks (PNN) (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e), automatic feature interaction learning via self-attentive neural networks (AutoInt)(\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e), attentional factorization machines (AFM)(\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e), and feature generation by convolutional neural network (FGCNN)(\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e) were also used. Data was split randomly into two chunks including train and test where the test size was 25% of total sample size. A parameter optimization step, using grid search or randomized search algorithms for tuning each classifier, was performed before the training step. For parameter optimization 20% of training data was chosen randomly to avoid overfitting. 
For training each classifier, a 10-fold cross-validation was used and the holdout samples were used for model validation.\u003c/p\u003e \u003cp\u003eTest data was used to evaluate the performance of the optimized classifier according to the accuracy, precision, and recall metrics. To check the generalizability of the classifier and prevent issues like imbalanced classes and overfitting, repeated K-fold cross validation (K\u0026thinsp;=\u0026thinsp;5) was used. In this strategy the training and testing steps were repeated by including different test and train samples and the mean and standard deviation of results for these repeats were reported. Classifiers were compared according to the area under ROC curve (AUC) in which the AUC value was scaled up to 0-100 range (instead of 0\u0026ndash;1) to show the small differences between classifier\u0026rsquo;s performance.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eThe first feature reduction step (i.e. manual feature elimination step) reduced the size of feature space from 211 to 150. In table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e (see supporting material), selected features by different filter methods and Lasso regularization method as an embedded feature selection strategy was shown.\u003c/p\u003e \u003cp\u003eBy applying filter and embedded feature selection strategies and accumulating the selected features, thirty-two features (feature set No.1) were selected (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
A feature was included in feature set No.1 if it was proposed by at least two of the feature selection methods.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe most informative feature set (feature set No.1) for UTI prediction, according to the filter and embedded feature selection strategies.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature name\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNumber of the five feature selection methods that selected the feature\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_wbc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of white blood cells in the urine (large, moderate, small)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood basophils\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAmount of basophils, a type of white blood cell belonging to the granulocytes\u003c/p\u003e \u003c/td\u003e \u003ctd
align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003echief_complaint\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eChief complaint, such as fever, pain, dizziness, fatigue, dysuria, genitourinary problems, weakness, or hematuria\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_clarity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClarity of urine (clear or not clear)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_nitrite\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe presence of nitrite in urine (negative or positive)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabx\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAntibiotic consumption (yes or no)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood ANC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsolute neutrophil count in blood\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood eosinophils\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eA type of white blood cell involved in inflammation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003efever\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBody temperature higher than 37\u0026nbsp;\u0026deg;C (yes or no)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood lymphocytes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eA type of immune cell in the blood\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood monocytes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eA type of white blood cell in the blood\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood MPV\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean platelet volume\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_rbc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNumber of red blood cells in the urine (large,
moderate, small)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabd_pain\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbdominal pain (yes or no)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabd_soft\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSoft abdomen (yes or no)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabxUTI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAntibiotic consumption for UTI (yes or no)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAge of the participant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003earrival\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMode of arrival at the emergency unit (car, EMS, walk-in, or wheelchair)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\"
colname=\"c1\"\u003e \u003cp\u003egender\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMale or female\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGlucose\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBlood glucose level\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood MCH\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean corpuscular hemoglobin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood potassium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBlood potassium level\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlood RDW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRed blood cell distribution width\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_bili\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBilirubin in urine (large, moderate, negative, small)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_blood\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe presence of blood in urine (large, moderate, negative, small)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_color\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe color of urine (amber, colorless, red, yellow, other)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_glucose\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrine glucose level (large, moderate, small, negative)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_ketones\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrine ketone level (large, moderate, small, negative)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_leuk\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLeukocytes in urine (large, moderate, small, negative)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e
\u003cp\u003eua_protein\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProtein in urine (large, moderate, small, negative)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_spec_grav\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrine specific gravity (numeric value)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_urobili\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUrobilinogen in the urine (positive or negative)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe classification results obtained with feature set No.1 (the 32 features listed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) and the different machine learning methods are given in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In the 10-fold cross-validation (K\u0026thinsp;=\u0026thinsp;10), each fold of the training samples was held out in turn for validation. 
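\u003c/p\u003e \u003cp\u003eThe evaluation protocol behind the reported mean (standard deviation) values can be sketched in plain Python. The sketch uses the repeated K-fold scheme (K\u0026thinsp;=\u0026thinsp;5) and reports accuracy, specificity, precision, recall, and AUC on the 0\u0026ndash;100 scale. Several details are assumptions that the text does not state: folds are stratified by class, predictions use a 0.5 threshold, AUC uses the rank-based (Mann-Whitney) formulation, the number of repeats is arbitrary, and identity_model is a hypothetical placeholder for any classifier in Table 2.\u003c/p\u003e

```python
import random
import statistics


def stratified_folds(y, k, rng):
    # Shuffle each class separately and deal its indices round-robin into k folds,
    # so every fold keeps roughly the original class ratio.
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auc_0_100(y_true, scores):
    # Rank-based AUC: share of (positive, negative) pairs in which the positive
    # sample scores higher (ties count half), scaled from 0-1 to 0-100.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return 100.0 * wins / (len(pos) * len(neg))


def _pct(a, b):
    # Percentage with a guard against empty denominators.
    return 100.0 * a / b if b else 0.0


def metrics(y_true, y_pred, scores):
    # Confusion-matrix metrics in percent, plus AUC on the 0-100 scale.
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    return {'accuracy': _pct(tp + tn, len(pairs)), 'specificity': _pct(tn, tn + fp),
            'precision': _pct(tp, tp + fp), 'recall': _pct(tp, tp + fn),
            'auc': auc_0_100(y_true, scores)}


def repeated_kfold(X, y, fit_predict, k=5, repeats=3, seed=0):
    # fit_predict(train_X, train_y, test_X) must return one score in [0, 1] per test sample.
    rng = random.Random(seed)
    runs = []
    for _ in range(repeats):
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                                 [X[i] for i in fold])
            y_test = [y[i] for i in fold]
            y_pred = [1 if s >= 0.5 else 0 for s in scores]
            runs.append(metrics(y_test, y_pred, scores))
    # Mean and standard deviation of each metric over all k * repeats runs.
    return {m: (statistics.mean([r[m] for r in runs]),
                statistics.stdev([r[m] for r in runs])) for m in runs[0]}


def identity_model(train_X, train_y, test_X):
    # Hypothetical placeholder classifier: uses the single feature value as its score.
    return list(test_X)


X = [i / 99 for i in range(100)]
y = [1 if x > 0.5 else 0 for x in X]
summary = repeated_kfold(X, y, identity_model)
print(summary['auc'])  # perfectly separable toy data: mean 100 with zero spread
```

\u003cp\u003eApplying the same fold generator with k=10 to the training samples alone gives the validation-phase scheme.\u003c/p\u003e \u003cp\u003e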
The held-out test samples were used for the testing phase.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification results with K-fold cross-validation (K\u0026thinsp;=\u0026thinsp;10) for the training step and repeated K-fold cross-validation (K\u0026thinsp;=\u0026thinsp;5) for the testing phase. Reported values are mean (standard deviation). The feature set was that of Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"10\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c5\" namest=\"c1\"\u003e \u003cp\u003eValidation phase (10-fold cross validation)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c10\" namest=\"c6\"\u003e \u003cp\u003eTesting phase (repeated 5-fold cross validation)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassifier\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"10\" nameend=\"c10\" namest=\"c1\"\u003e \u003cp\u003eClassic classifiers\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLogistic regression\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.09 ( 0.29 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e70.36 ( 1.14 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e44.26 ( 1.28 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e83.70 ( 0.56 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.24 ( 0.23 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.67 ( 0.17 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e71.18 ( 0.78 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c9\"\u003e \u003cp\u003e44.54 ( 0.96 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e83.83 ( 0.25 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eNa\u0026iuml;ve Gaussian Bayes\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e80.78 ( 0.40 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e59.24 ( 1.10 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e49.22 ( 1.15 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e81.40 ( 0.73 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e80.75 ( 0.10 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e90.03 ( 0.08 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e59.54 ( 0.30 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e49.46 ( 0.12 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e81.42 ( 0.10)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDecision tree\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e84.99 ( 0.34 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75.98 ( 1.79 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e49.76 ( 2.16 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e86.98 ( 0.52 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e84.91 ( 0.28 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c7\"\u003e \u003cp\u003e95.38 ( 0.52 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e76.10 ( 1.42 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e49.48 ( 1.50 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e86.96 ( 0.23 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRandom forest\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e84.08 ( 0.36 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75.30 ( 2.65 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e45.18 ( 3.89 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.18 ( 0.51 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e84.16 ( 0.19 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e95.97 ( 0.11 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e76.49 ( 0.55 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e44.28 ( 0.27 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.16 ( 0.17 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eXGboost\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e85.92 ( 0.37 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75.63 ( 1.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e56.28 ( 1.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e88.92 ( 0.49 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e86.05 ( 0.23 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.76 ( 0.18 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e75.97 ( 0.55 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e56.37 ( 0.67 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e88.87 ( 0.14 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLightGBM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e85.91 ( 0.37 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75.24 ( 1.07 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e56.62 ( 1.32 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e88.78 ( 0.53 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e85.78 ( 0.24 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.48 ( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e75.09 ( 0.66 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e56.32 ( 0.32 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e88.81 ( 0.14 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"10\" nameend=\"c10\" namest=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDeep learning networks\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDeepFM\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.43 ( 1.19 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e78.83 ( 6.52 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e39.40 ( 12.24 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e86.75 ( 0.69 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e85.18 ( 0.47 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.49 ( 1.32 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e74.26 ( 2.95 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e53.33 ( 4.27 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e87.69 ( 0.55 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eWDN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.02 ( 0.67 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e76.85 ( 3.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37.05 ( 6.86 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.08 ( 0.52 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.40 ( 0.39 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.30 ( 1.11 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e76.87 ( 2.78 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e40.18 ( 4.98 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.33 ( 0.16 
)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003exdeepFM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.18 ( 0.58 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e78.47 ( 2.88 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36.02 ( 5.33 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.16 ( 0.32 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.13 ( 0.73 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.17 ( 1.29 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e76.32 ( 4.32 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e39.31 ( 7.43 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.44 ( 0.21 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDCN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e79.52 ( 1.15 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e89.97 ( 3.54 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11.19 ( 6.70 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.05 ( 0.39 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e82.44 ( 0.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e82.44 ( 0.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e82.55 ( 2.84 )\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c9\"\u003e \u003cp\u003e29.24 ( 4.82 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.29 ( 0.33 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.34 ( 0.33 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e78.07 ( 2.77 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37.68 ( 3.76 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e83.73 ( 0.15 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.44 ( 0.45 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.19 ( 0.56 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e76.13 ( 1.70 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e41.22 ( 2.39 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.38 ( 0.26 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAutoInt\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e60.52 ( 25.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.65 ( 31.74 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e43.82 ( 44.58 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e77.99 ( 8.17 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e67.19 ( 22.29 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003e74.76 ( 38.21 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e42.60 ( 27.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e41.68 ( 38.38 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e77.77 ( 7.99 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAFM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e77.36 ( 0.31 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e12.00 ( 32.50 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.01 ( 0.04 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e80.76 ( 2.11)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e78.88 ( 1.91 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e99.11 ( 1.15 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e31.42 ( 38.51 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e10.57 ( 13.09 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e82.39 ( 1.42 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eFGCNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.68 ( 0.36 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e72.56 ( 1.79 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e45.79 ( 2.99 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e85.49 ( 0.38 )\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e83.99 ( 0.13 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e94.47 ( 0.75 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e72.12 ( 1.68 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e48.34 ( 2.87 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e85.66 ( 0.24 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccording to Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, XGBoost achieved the highest accuracy, recall, and AUC of all classifiers in the testing phase. Some classifiers reached higher precision (e.g., DCN) or specificity (e.g., AFM), but largely at the cost of much lower recall. LightGBM, DeepFM, and the decision tree, the next-best classifiers by testing-phase AUC, were also retained for further analysis. In the third and final feature selection step, three different wrapper methods were applied, which reduced the initial 211-dimensional feature space to an 18-dimensional space. The features selected by each method are shown in Table S2 (see supporting materials). The final feature set was obtained by accumulating the features proposed by the wrapper methods. This yielded eighteen features as the most discriminative set (feature set No.2): ua_nitrite, ua_wbc, ua_bili, ua_leuk, ua_urobili, abx, abxUTI, age, chief_complaint, gender, Glucose, Lymphocytes, MPV, Potassium, RDW, ua_blood, ua_clarity, and ua_color. 
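\u003c/p\u003e \u003cp\u003eThe three wrapper methods are not named here, so the sketch below only illustrates the wrapper idea. It implements greedy sequential forward selection, one common wrapper strategy, and then accumulates (takes the union of) the features proposed by several wrappers. In a real pipeline, score_fn would train one of the retained classifiers on the candidate subset and return a cross-validated score such as AUC. The WEIGHTS-based toy_score is a hypothetical stand-in with invented numbers.\u003c/p\u003e

```python
def forward_selection(features, score_fn, max_features=None):
    # Greedy sequential forward selection (one common wrapper strategy):
    # repeatedly add the candidate feature that gives the best score_fn(subset),
    # and stop as soon as no candidate improves on the current best score.
    selected, best = [], float('-inf')
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        score, feat = max((score_fn(selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


def union_of_wrappers(results):
    # Accumulate the features proposed by several wrapper methods (set union),
    # keeping the order in which each feature first appears.
    final = []
    for feats in results:
        for f in feats:
            if f not in final:
                final.append(f)
    return final


# Hypothetical scoring: invented per-feature weights minus a size penalty.
WEIGHTS = {'ua_nitrite': 3.0, 'ua_wbc': 2.0, 'age': 0.5, 'noise': -1.0}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset) - 0.6 * len(subset)


chosen, _ = forward_selection(WEIGHTS, toy_score)
print(chosen)  # ['ua_nitrite', 'ua_wbc']
print(union_of_wrappers([chosen, ['ua_wbc', 'age'], ['gender']]))
# ['ua_nitrite', 'ua_wbc', 'age', 'gender']
```

\u003cp\u003eBackward elimination and recursive feature elimination are other wrapper strategies. They differ only in how candidate subsets are generated; the accumulation step is the same.\u003c/p\u003e \u003cp\u003e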
The classification performance of the selected classifiers with feature set No.2 is reported in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification results of the selected classifiers using feature set No.2\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"10\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c5\" namest=\"c1\"\u003e \u003cp\u003eValidation phase (10-fold cross validation)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c10\" namest=\"c6\"\u003e \u003cp\u003eTesting phase (repeated 5-fold cross validation)\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassifier\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDecision tree\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84.98 ( 0.45 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.52 ( 2.03 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e50.49 ( 1.94 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e86.97 ( 0.64 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.03 ( 0.22 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.57 ( 0.49 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e76.45 ( 1.48 )\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e49.04 ( 1.99 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e86.88 ( 0.21 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eXGboost\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.66 ( 0.34 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e75.30 ( 1.07 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e55.01 ( 1.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e88.35 ( 0.46 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.58 ( 0.13 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.70 ( 0.19 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e75.24 ( 0.40 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e54.63 ( 0.96 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e88.37 ( 0.11 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLightGBM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.76 ( 0.38 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e74.95 ( 1.25 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e55.91 ( 1.11 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e88.55 ( 
0.50 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.80 ( 0.15 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.66 ( 0.22 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e75.73 ( 0.94 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e56.04 ( 0.54 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e88.56 ( 0.19 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eDeepFM\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e84.20 ( 0.61 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.64 ( 4.33 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e43.71 ( 7.11 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e87.17 ( 0.49 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.00 ( 0.07 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.40 ( 0.73 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e76.01 ( 1.70 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e49.43 ( 2.99 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e87.49 ( 0.15 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eCombined classifier (equal weight)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e \u003cp\u003e85.70 ( 0.37 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e76.01 ( 1.27 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e54.22 ( 1.53 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e88.52 ( 0.50 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.64 ( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.89 ( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e75.86 ( 0.75 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e54.37 ( 0.24 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e89.53 ( 0.25 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccording to Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, the ensemble that combined three classic classifiers (decision tree, XGBoost and LightGBM) with equal weights achieved the highest AUC in the testing phase, while its accuracy, specificity, precision and recall were comparable to those of the individual classifiers.\u003c/p\u003e\n\u003ch3\u003eSome notes about selected features\u003c/h3\u003e\n\u003cp\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows, for the UTI and non-UTI classes, the percentage of cases with positive urine nitrite and urobilinogen; with large urine WBC, leucocyte, blood and bilirubin levels; and with critically large blood glucose, lymphocyte, potassium, MPV and RDW values. 
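The per-class percentages summarized in Fig. 1 can be computed with a few lines of Python. This minimal sketch assumes each case is a dictionary whose categorical levels have already been assigned; the label key `uti` and the level strings are hypothetical names, not the dataset's actual column names.

```python
from collections import Counter


def percent_by_class(records, feature, level, label_key="uti"):
    """Percentage of cases in each class whose `feature` equals `level`.

    Records with a missing value for `feature` are excluded from that
    class's denominator. `label_key` is a hypothetical label column name.
    """
    hits, totals = Counter(), Counter()
    for rec in records:
        value = rec.get(feature)
        if value is None:
            continue  # missing measurement: not counted for this feature
        cls = rec[label_key]
        totals[cls] += 1
        if value == level:
            hits[cls] += 1
    return {cls: 100.0 * hits[cls] / totals[cls] for cls in totals}


records = [
    {"ua_nitrite": "positive", "uti": 1},
    {"ua_nitrite": "negative", "uti": 1},
    {"ua_nitrite": "negative", "uti": 0},
    {"ua_nitrite": None, "uti": 0},
    {"ua_nitrite": "negative", "uti": 0},
]
print(percent_by_class(records, "ua_nitrite", "positive"))  # {1: 50.0, 0: 0.0}
```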
According to Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, among the most informative categorical features (positive vs. negative, or large vs. other levels), positive urine nitrite, large urine WBC, leucocyte, blood and bilirubin levels, and critically large blood glucose and potassium were more frequent in the UTI group than in the non-UTI group. Conversely, a larger percentage of non-UTI cases had critically large blood lymphocytes, RDW and MPV, and positive urine urobilinogen, suggesting that these features tend to be lower in UTI cases.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCorrelation between selected features\u003c/h2\u003e \u003cp\u003eTo check for correlations among the selected features, their correlation matrix is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The categorical variables were coded in a binary space as follows: for ua_nitrite: positive (1) vs. negative (0); for ua_urobili, since according to Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e positive cases were fewer in the UTI group than in the non-UTI group: positive (0) vs. negative (1); for ua_wbc, ua_leuk, ua_blood and ua_bili: large (1) vs. other levels (0); for glucose and potassium: critically large (1) vs. other levels (0); for lymphocytes, MPV and RDW: critically large (0) vs. 
other levels (1), since according to Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e critically large values were less frequent in the UTI group than in the non-UTI group; for ABX and ABX_UTI: yes (1) vs. no (0); and for ua_clarity: not_clear (1) vs. clear (0). Chief complaint and ua_color were not included in this analysis, since their exact relationship with UTI is unknown: in this dataset, yellow urine was dominant in both UTI-positive and UTI-negative cases (85.89% and 88.77%, respectively), while other urine colors were negligible in frequency, and the most frequent chief complaint in both the UTI and non-UTI groups was abdominal pain (23.73% and 31.58%, respectively), with no other dominant symptom. Age was also excluded, because there is no standard age threshold for UTI incidence. Therefore, these features could not be converted into a binary representation. Only cases in which all 14 of the above-mentioned variables were reported (48761 samples) were included in this analysis. 
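A minimal sketch of the binary coding listed above and of the pairwise Pearson correlation follows. The level strings ("large", "critically_large", "not_clear", etc.) are assumed names, and glucose, potassium, lymphocytes, MPV and RDW are assumed to have been discretized into levels beforehand. For two 0/1 variables, Pearson's R equals the phi coefficient; the p-values mentioned in the text could be obtained with `scipy.stats.pearsonr`, which returns both the coefficient and a two-sided p-value.

```python
import math

# Coding described in the text: feature -> (reference level, code when the
# value equals that level). Level strings are hypothetical names.
CODING = {
    "ua_nitrite": ("positive", 1),
    "ua_urobili": ("positive", 0),  # reversed: fewer positives in UTI
    "ua_wbc": ("large", 1),
    "ua_leuk": ("large", 1),
    "ua_blood": ("large", 1),
    "ua_bili": ("large", 1),
    "glucose": ("critically_large", 1),
    "potassium": ("critically_large", 1),
    "lymphocytes": ("critically_large", 0),  # reversed
    "MPV": ("critically_large", 0),  # reversed
    "RDW": ("critically_large", 0),  # reversed
    "abx": ("yes", 1),
    "abxUTI": ("yes", 1),
    "ua_clarity": ("not_clear", 1),
}


def encode(feature, value):
    """Return the 0/1 code of a categorical value for the given feature."""
    level, code_if_match = CODING[feature]
    return code_if_match if value == level else 1 - code_if_match


def complete_cases(records):
    """Keep only records in which all 14 coded features are reported."""
    return [r for r in records if all(r.get(f) is not None for f in CODING)]


def pearson_r(x, y):
    """Pearson correlation; for two 0/1 variables this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(records):
    """Pairwise R between all coded features, computed on complete cases."""
    rows = complete_cases(records)
    coded = {f: [encode(f, r[f]) for r in rows] for f in CODING}
    return {(f, g): pearson_r(coded[f], coded[g]) for f in CODING for g in CODING}
```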
Pearson\u0026rsquo;s correlation coefficient (R) and the associated p-value were calculated for each pair of features.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e reports the weights of the selected features obtained by neighborhood component analysis (NCA).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eFeature weights obtained by neighborhood component analysis (NCA).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWeight\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImportance score (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_wbc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.961\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e22.41\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_nitrite\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.926\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e21.61\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_leuk\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.842\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e19.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_clarity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.545\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e12.72\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMPV\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.156\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003elymphocytes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.151\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.53\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_blood\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.146\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.40\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_color\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.131\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.05\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_bili\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.098\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.28\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eua_urobili\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.076\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabx\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.061\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.057\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.34\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003egender\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.047\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.09\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eabxUTI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.048\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eglucose\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e0.028\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.66\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRDW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003epotassium\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003echief_complaint\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSensitivity of classifier performance to age and gender\u003c/h3\u003e\n\u003cp\u003eThe feature selection stage identified age and gender as informative factors for UTI prediction. To check how the performance of the predictor was affected by age or gender, its prediction capability was investigated for different age spans (18\u0026ndash;40, 40\u0026ndash;60, and \u0026gt;\u0026thinsp;60 years old) and for each gender (male and female). Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e show the ROC curves for UTI prediction for the different age spans and genders. 
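The subgroup analysis can be sketched as follows: the three models' UTI probabilities are averaged with equal weights (soft voting), cases are assigned to age spans, and the AUC is computed within each group. Soft voting is an assumption, since the text only mentions a voting scheme; averaging probabilities yields the continuous scores an ROC curve needs, whereas hard majority voting would not. How ages of exactly 40 and 60 are assigned is also an assumption.

```python
from collections import defaultdict


def age_group(age):
    """Map an age in years to the spans used in the text.

    Boundary handling (40 and 60 both go to "40-60") is an assumption.
    """
    if age < 18:
        return None
    if age < 40:
        return "18-40"
    if age <= 60:
        return "40-60"
    return ">60"


def soft_vote(*model_probs):
    """Equal-weight average of per-model UTI probabilities for the same cases."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


def roc_auc(labels, scores):
    """ROC AUC from the rank-sum (Mann-Whitney U) statistic; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC undefined with a single class
    rank_sum, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_by_group(labels, scores, groups):
    """AUC computed separately within each subgroup (age span or gender)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        if g is not None:
            buckets[g][0].append(y)
            buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}
```

With scikit-learn, `VotingClassifier(estimators=..., voting="soft")` in `sklearn.ensemble` performs equal-weight probability averaging, and `sklearn.metrics.roc_auc_score` could replace `roc_auc`.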
For the age- and gender-specific analysis, an ensemble of decision tree, XGBoost and LightGBM estimators with equal weights was used, and the final prediction was made by voting.\u003c/p\u003e\n\u003ch3\u003eComparison with other methods\u003c/h3\u003e\n\u003cp\u003eA main challenge in UTI prediction is determining the minimal feature set required for fast and accurate prediction, and several studies have tried to identify the most informative features in their own datasets. Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e compares the prediction capability of the feature sets proposed by several studies with that of the feature set obtained in this study. Because machine learning pipelines can differ in many respects (data splitting strategy, number of cross-validation folds, classifier implementation, optimization method, etc.), all feature sets were evaluated with the same classifiers developed in this study and on the same dataset, released by Taylor et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTaylor et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e) proposed a minimal feature set comprising age, gender, urine leukocytes, urine nitrite, urine WBC, urine bacteria, urine blood, urine epithelial cells, history of UTI and dysuria as the most informative features. Choi et al. (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e) reported the top-ranked variables for UTI prediction to be urine bacterial count, monocyte count, WBC count, lymphocyte count, urinary WBC count, specific gravity, diastolic blood pressure, systolic blood pressure, age, mean blood pressure and C-reactive protein. 
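Tables 3 and 5 report accuracy, specificity, precision and recall. For reference, a minimal sketch of these standard confusion-matrix definitions, and of applying one fixed evaluation routine to several named feature sets, is given below. The `evaluate` callable is hypothetical and stands in for the study's classifier and dataset; it is not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = UTI."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # t == 1 and p == 0
    return tp, fp, tn, fn


def _pct(num, den):
    """Percentage, or NaN when the denominator is zero."""
    return 100.0 * num / den if den else float("nan")


def table_metrics(y_true, y_pred):
    """Accuracy, specificity, precision and recall in percent."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _pct(tp + tn, tp + fp + tn + fn),
        "specificity": _pct(tn, tn + fp),
        "precision": _pct(tp, tp + fp),
        "recall": _pct(tp, tp + fn),
    }


def compare_feature_sets(feature_sets, evaluate):
    """Score every named feature set with the same evaluation routine.

    `evaluate(features)` is a hypothetical callable that trains and tests
    the fixed classifier on the fixed dataset and returns (y_true, y_pred).
    """
    return {name: table_metrics(*evaluate(feats)) for name, feats in feature_sets.items()}
```

Keeping `evaluate` fixed while only the feature list changes mirrors the design choice described in the text: differences in Table 5 can then be attributed to the feature sets rather than to the pipeline.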
All variables of both feature sets are available in the dataset of Taylor et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). The UTI prediction results obtained with these feature sets, using our machine learning pipeline and the Taylor et al. dataset (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e), are reported in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eBurton et al. (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e), who used a dataset independent of those of this study and of Taylor et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e), selected 21 features as the most informative for UTI prediction: wbc count, bacterial count, age, epithelial cell count, rbc count, number of positive cultures to date, pyuria (no rbcs), pregnancy, inpatient, gender, persistent/recurrent infection, number of positive cultures month prior, positive for nitrates, renal inpatient/outpatient, pre-operative patient, acute kidney disease, immunocompromised, number of positive cultures week prior, multiple sclerosis, offensive smell, and haematuria (no wbcs). With this feature set, the best UTI prediction performance was obtained by an XGBoost classifier (AUC\u0026thinsp;=\u0026thinsp;0.91). However, since the dataset of Burton et al. was not accessible, this feature set could not be evaluated under the same classification strategy and dataset as the other feature sets.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of the UTI prediction capability of different feature sets. 
The same classification strategy and the same dataset were used for all feature sets.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature set (classifier)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAUC*\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eThis work (XGboost)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e85.58\u003c/p\u003e \u003cp\u003e( 0.13 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e94.70\u003c/p\u003e \u003cp\u003e( 0.19 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e75.24\u003c/p\u003e \u003cp\u003e( 0.40 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e54.63\u003c/p\u003e 
\u003cp\u003e( 0.96 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e88.37\u003c/p\u003e \u003cp\u003e( 0.11 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eThis work (Combined classifier)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e85.64\u003c/p\u003e \u003cp\u003e( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e94.89\u003c/p\u003e \u003cp\u003e( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e75.86\u003c/p\u003e \u003cp\u003e( 0.75 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e54.37\u003c/p\u003e \u003cp\u003e( 0.24 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e88.53\u003c/p\u003e \u003cp\u003e( 0.25 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTaylor et al.\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(XGboost)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e84.39\u003c/p\u003e \u003cp\u003e( 0.20 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e94.29\u003c/p\u003e \u003cp\u003e( 0.21 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e72.41\u003c/p\u003e \u003cp\u003e( 0.81 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e50.84\u003c/p\u003e \u003cp\u003e( 0.84 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e86.26\u003c/p\u003e \u003cp\u003e( 0.22 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTaylor et 
al.\u003c/b\u003e\u003c/p\u003e \u003cp\u003e\u003cb\u003e(combined classifier)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e84.44\u003c/p\u003e \u003cp\u003e( 0.23 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95.26\u003c/p\u003e \u003cp\u003e( 0.28 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e74.85\u003c/p\u003e \u003cp\u003e( 0.50 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e47.77\u003c/p\u003e \u003cp\u003e( 1.47 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e86.34\u003c/p\u003e \u003cp\u003e( 0.56 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eChoi et al. (XGboost)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.95\u003c/p\u003e \u003cp\u003e( 0.24 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e94.05\u003c/p\u003e \u003cp\u003e( 0.13 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.14\u003c/p\u003e \u003cp\u003e( 0.36 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e49.70\u003c/p\u003e \u003cp\u003e( 0.66 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e85.43\u003c/p\u003e \u003cp\u003e( 0.30 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eChoi et al. 
(Combined classifier)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e84.27\u003c/p\u003e \u003cp\u003e( 0.26 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e94.44\u003c/p\u003e \u003cp\u003e( 0.18 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e72.43\u003c/p\u003e \u003cp\u003e( 0.48 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e49.66\u003c/p\u003e \u003cp\u003e( 0.63 )\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e85.82\u003c/p\u003e \u003cp\u003e( 0.36 )\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e*AUC values were multiplied by 100 (0-100 scale) so that small differences are visible.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe current study identified urine WBC, urine nitrite, urine leucocytes and urine clarity as the most informative biomarkers for UTI prediction (see Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). Because bacteria in the urinary tract can convert nitrates to nitrites, positive urine nitrite may be considered a sign of urinary tract infection. However, relying solely on urine nitrite, WBCs and leucocytes as a rapid strategy for suspected UTI cases (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e) has been reported to suffer from low sensitivity and specificity (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e). This may be because, although high WBC levels and positive nitrite are potential signs of UTI, they are not specific to UTI. 
For example, other conditions such as kidney stones or pelvic disorders may increase the WBC count, and urinary nitrite increases significantly in patients with gastroenteritis (21). This lack of specificity is evident in Fig. 1, where a portion of the non-UTI samples showed positive nitrite or high levels of WBCs and leukocytes.

Another useful biomarker for detecting bacterial inflammation is the neutrophil-to-lymphocyte ratio; however, it is not specific to UTI, since several diseases such as pneumonia, cancers and heart diseases affect it (22). Positive urine bilirubin, which appeared to be a potent biomarker (Tables 1 and 4), is strongly associated with abnormal liver function (23) and is therefore not a specific biomarker for UTI. The small weight of this feature (Table 4) and the relatively large number of non-UTI samples with high bilirubin levels (Fig. 1) indicate that bilirubin cannot be considered a strong biomarker for UTI.

The literature suggests a positive association between diabetes (elevated blood glucose) and UTI. Diabetes may damage nerves and weaken the immune system, thereby increasing the risk of bacterial infections.
High blood glucose levels may also provide nutrients for bacteria (24). This may explain why blood glucose was identified in the current study as a predictor of UTI. In line with the results of the current study, another study suggested MPV, an indicator of platelet function, as a potential biomarker for UTI (25). MPV tends to increase during mild inflammation, whereas in severe inflammation it decreases owing to platelet consumption (26). The direction of MPV change remains controversial: some studies describe MPV as a positive acute-phase reactant and report increased levels (27), while others consider it a negative acute-phase reactant and report a decrease after inflammation (26). In the dataset used in this study, the percentage of UTI-labeled subjects with high MPV was lower than that of non-UTI samples (Fig. 1).

Urine clarity, which was proposed as a potential biomarker in the current study (Table 4), is another indicator commonly used for UTI prediction. However, previous results imply that visual inspection of urine alone is not a sufficient indicator of UTI (28).
Furthermore, a previous study that compared dipstick results with urine culture reported higher sensitivity and specificity for WBC count than for nitrite (29). This agrees with the higher importance score of urine WBC compared with nitrite in the current study; however, other factors such as blood glucose level or pregnancy might influence this outcome (29). The results of this study also proposed RDW as a potential biomarker for UTI prediction. Previous studies reported a correlation between RDW and systemic inflammation (30). The results (Fig. 1) indicated that a lower percentage of UTI cases had high RDW compared with non-UTI samples. Another potent factor highlighted in the current study for UTI prediction was potassium level (Table 4). Several studies reported ionic abnormalities among inpatients with UTI (31, 32), which may be due to poor feeding, increased sweating or vomiting (33). Low potassium (hypokalemia) was commonly observed in UTI cases (34), whereas in pyelonephritis (a urinary tract infection involving the kidneys), increased circulating potassium was observed compared with a control group (35).
This increased potassium level is consistent with the analysis of the dataset used here (see Fig. 1), in which critically high blood potassium levels were observed in the UTI group.

According to Fig. 2, there was no strong correlation (R > 0.5) between features except for ABX (antibiotic usage) and ABX_UTI (antibiotic usage for UTI), which were highly correlated (R = 1). This may be because people who used antibiotics extensively are more susceptible to UTI due to antibiotic resistance. Note that removing either ABX or ABX_UTI slightly degraded the prediction performance of the classifier (not reported in the manuscript); therefore, we suggest retaining both. Furthermore, a moderate correlation (R = 0.42, p < 0.01) was observed between leukocyte esterase (ua_leuk) and white blood cell count in urine (ua_wbc). This correlation is reasonable, since leukocyte esterase is an enzyme found in white blood cells. The lack of strong correlations suggests that the selected features share little redundant information about UTI.

The overall classification accuracy of this work was 85.64% (the result for the larger AUC), obtained by an ensemble classifier that combined decision tree, XGBoost and LightGBM classifiers with equal weights (Table 3). However, performance degraded when UTI prediction was performed for female or younger subjects (Figs. 3 and 4).
These results show the importance of accounting for gender and age when developing machine learning strategies for UTI prediction. In addition, when the feature set proposed in this work was compared with those of other studies (Table 5), it achieved better accuracy and a better trade-off between type I and type II errors for UTI prediction.

Conclusion

UTI is a frequent problem in many societies. Reliable and rapid prediction of UTI prevents unnecessary antibiotic use in non-UTI cases and supports timely antimicrobial treatment of UTI cases. Machine learning applied to UTI-related data is a promising tool for developing UTI prediction systems. The current study used a publicly available dataset containing urinalysis, blood test, vital sign, demographic and observational data for suspected UTI cases. The main purposes of this study were to find the most-informative features and to develop an intelligent system for predicting UTI. The study showed the potential of machine learning strategies for predicting UTI from laboratory and urinalysis results. Furthermore, eighteen features with maximum discrimination capability were proposed.
The results also showed that age and gender affected UTI prediction.

Declarations

Statements of ethical approval: Not applicable.
Consent to participate: Not applicable.
Consent for publication: Not applicable.
Conflicts of interest/Competing interests: There is nothing to declare.

Funding
This work was funded by Hamadan University of Medical Sciences, Deputy of Research and Technology.

Author Contributions
S.F. and H.E.M. performed systematic searches. S.F. performed the analyses and wrote the manuscript. S.F. and H.E.M. discussed the results and finalized the draft.

Acknowledgement
The authors would like to thank the Vice-Chancellor for Research and Technology for supporting the current work.

Availability of data and materials
The data used in this study are fully accessible from https://doi.org/10.1371/journal.pone.0194085.

References
1. Stamm WE, Norrby SR. Urinary tract infections: disease panorama and challenges. J Infect Dis. 2001;183(Suppl 1):S1–4.
2. Burton RJ, Albur M, Eberl M, Cuff SM. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Med Inf Decis Mak. 2019;19(1):171.
3. Goździkiewicz N, Zwolińska D, Polak-Jonkisz D. The Use of Artificial Intelligence Algorithms in the Diagnosis of Urinary Tract Infections-A Literature Review. J Clin Med. 2022;11(10).
4. Taylor RA, Moore CL, Cheung K-H, Brandt C. Predicting urinary tract infections in the emergency department with machine learning. PLoS ONE. 2018;13(3):e0194085.
5. Choi MH, Kim D, Park Y, Jeong SH. Development and validation of artificial intelligence models to predict urinary tract infections and secondary bloodstream infections in adult patients. J Infect Public Health. 2024;17(1):10–7.
6. Gadalla AAH, Friberg IM, Kift-Morgan A, Zhang J, Eberl M, Topley N, et al. Identification of clinical and urine biomarkers for uncomplicated urinary tract infection using machine learning algorithms. Sci Rep. 2019;9(1):19694.
7. Kozachenko LF, Leonenko NN. Sample Estimate of the Entropy of a Random Vector. Probl Peredachi Inf. 1987;23(2):9–16.
8. Dodge Y. The concise encyclopedia of statistics. New York: Springer; 2010.
9. Yang W, Wang K, Zuo W. Neighborhood Component Feature Selection for High-Dimensional Data. J Comput. 2012;7:161–8.
10. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Neural Information Processing Systems; 2017.
11. Guo H, Tang R, Ye Y, Li Z, He X. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247. 2017.
12. Lian J, Zhou X, Zhang F, Chen Z, Xie X, Sun G. xDeepFM: Combining explicit and implicit feature interactions for recommender systems. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2018.
13. Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, et al. Wide & Deep Learning for Recommender Systems. 2016. pp. 7–10.
14. Wang R, Fu B, Fu G, Wang M. Deep & Cross Network for Ad Click Predictions. Proceedings of the ADKDD'17; 2017.
15. Qu Y, Cai H, Ren K, Zhang W, Yu Y, Wen Y, Wang J. Product-Based Neural Networks for User Response Prediction. 2016 IEEE 16th International Conference on Data Mining (ICDM); 12–15 Dec 2016.
16. Song W, Shi C, Xiao Z, Duan Z, Xu Y, Zhang M, Tang J. AutoInt: Automatic feature interaction learning via self-attentive neural networks. Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019.
17. Xiao J, Ye H, He X, Zhang H, Wu F, Chua T-S. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617. 2017.
18. Liu B, Tang R, Chen Y, Yu J, Guo H, Zhang Y. Feature generation by convolutional neural network for click-through rate prediction. The World Wide Web Conference; 2019.
19. Appenheimer AB, Ford B. Urine Dipstick: Urinary Nitrites and Leukocyte Esterase – Dipping into Murky Waters. In: Sharp VJA, Antes LM, Sanders ML, Lockwood GM, editors. Urine Tests: A Case-Based Guide to Clinical Evaluation and Application. Cham: Springer International Publishing; 2020. pp. 97–115.
20. Williams GJ, Macaskill P, Chan SF, Turner RM, Hodson E, Craig JC. Absolute and relative accuracy of rapid urine tests for urinary tract infection in children: a meta-analysis. Lancet Infect Dis. 2010;10(4):240–50.
21. Wanchu A, Khullar M, Sud A, Deodhar SD, Bambery P. Elevated urinary nitrite and citrulline levels in patients with rheumatoid arthritis. Inflammopharmacology. 1999;7(2):155–61.
22. Han SY, Lee IR, Park SJ, Kim JH, Shin JI. Usefulness of neutrophil-lymphocyte ratio in young children with febrile urinary tract infection. Korean J Pediatr. 2016;59(3):139–44.
23. Foley KF, Wasserman J. Are Unexpected Positive Dipstick Urine Bilirubin Results Clinically Significant? A Retrospective Review. Lab Med. 2014;45(1):59–61.
24. Ahmed AE, Abdelkarim S, Zenida M, Baiti MAH, Alhazmi AAY, Alfaifi BAH, et al. Prevalence and Associated Risk Factors of Urinary Tract Infection among Diabetic Patients: A Cross-Sectional Study. Healthc (Basel). 2023;11(6).
25. Akya A, Rostami-Far Z, Chegene Lorestani R, Khazaei S, Elahi A, Rostamian M, et al. Platelet Indices as Useful Indicators of Urinary Tract Infection. Iran J Ped Hematol Oncol. 2019;9(3):159–65.
26. Tanju C, Ekrem G, Emel AB, Nur A. Mean platelet volume as a negative marker of inflammation in children with rotavirus gastroenteritis. Iran J Pediatr. 2014;24(5):617.
27. Albayrak Y, Albayrak A, Albayrak F, Yildirim R, Aylu B, Uyanik A, et al. Mean platelet volume: a new predictor in confirming acute appendicitis diagnosis. Clin Appl Thromb Hemost. 2011;17(4):362–6.
28. Bulloch B, Bausher JC, Pomerantz WJ, Connors JM, Mahabee-Gittens M, Dowd MD. Can Urine Clarity Exclude the Diagnosis of Urinary Tract Infection? Pediatrics. 2000;106(5):e60.
29. Mohanna AT, Alshamrani KM, SaemAldahar MA, Kidwai AO, Kaneetah AH, Khan MA, Mazraani N. The Sensitivity and Specificity of White Blood Cells and Nitrite in Dipstick Urinalysis in Association With Urine Culture in Detecting Infection in Adults From October 2016 to October 2019 at King Abdulaziz Medical City. Cureus. 2021;13(6):e15436.
30. Ma W, Mao S, Bao M, Wu Y, Guo Y, Liu J, et al. Prognostic significance of red cell distribution width in bladder cancer. Translational Androl Urol. 2020;9(2):295–302.
31. Park SJ, Oh YS, Choi MJ, Shin JI, Kim KH. Hyponatremia may reflect severe inflammation in children with febrile urinary tract infection. Pediatr Nephrol. 2012;27:2261–7.
32. Winberg J. Renal function studies in infants and children with acute, nonobstructive urinary tract infections. Acta Paediatr. 1959;48:577–89.
33. Bertini A, Milani GP, Simonetti GD, Fossali EF, Faré PB, Bianchetti MG, Lava SAG. Na+, K+, Cl–, acid–base or H2O homeostasis in children with urinary tract infections: a narrative review. Pediatr Nephrol. 2016;31(9):1403–9.
34. Shen A-L, Lin H-L, Lin H-C, Tseng Y-F, Hsu C-Y, Chou C-Y. Urinary tract infection is associated with hypokalemia: a case control study. BMC Urol. 2020;20(1):108.
35. Watanabe T. Hyponatremia and hyperkalemia in infants with acute pyelonephritis. Pediatr Nephrol. 2004;19:361–2.

Keywords: Urinary tract infection, prediction, machine learning, feature extraction.
Preprint version 1 posted 3 December 2024 (https://doi.org/10.21203/rs.3.rs-5107375/v1); published in BMC Medical Informatics and Decision Making on 9 January 2025 (https://doi.org/10.1186/s12911-024-02819-2).
