Machine Learning-Driven Strategies for Enhanced Pediatric Wheezing Detection

doi:10.21203/rs.3.rs-4419150/v1

2024 · doi:10.21203/rs.3.rs-4419150/v1

preprint OA: closed

Research Article

Hye Jeong Moon, Hyunmin Ji, Baek Seung Kim, Beom Joon Kim, Kyunghoon Kim

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4419150/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Background
Auscultation is a critical diagnostic feature of lung diseases, but it is subjective and challenging to measure accurately. To overcome these limitations, artificial intelligence models have been developed.

Methods
In this prospective study, we aimed to compare respiratory sound feature extraction methods to develop an optimal machine learning model for detecting wheezing in children. Pediatric pulmonologists recorded and verified 103 instances of wheezing and 184 other respiratory sounds in 76 children.
Various methods were used for sound feature extraction, and dimensions were reduced using t-distributed Stochastic Neighbor Embedding (t-SNE). The performance of the models in wheezing detection was evaluated using a kernel support vector machine (SVM).

Results
The durations of the recordings in the wheezing and non-wheezing groups were 89.36 ± 39.51 ms and 63.09 ± 27.79 ms, respectively. The Mel-spectrogram, Mel-frequency cepstral coefficient (MFCC), and spectral contrast achieved the best expression of respiratory sounds and showed good performance in cluster classification. The SVM model using spectral contrast exhibited the best performance, with an accuracy, precision, recall, and F1-score of 0.897, 0.800, 0.952, and 0.869, respectively.

Conclusion
Mel-spectrograms, MFCC, and spectral contrast are effective for characterizing respiratory sounds in children. A machine learning model using spectral contrast demonstrated high detection performance, indicating its potential utility in ensuring accurate diagnosis of pediatric respiratory diseases.

Introduction

Wheezing is the sound produced by the rapid movement of air through narrowed airways, which may be caused by bronchial asthma, allergic reactions, or respiratory infections. (1) It is characterized by sinusoidal oscillations of 100–1000 Hz and can occur during both inhalation and exhalation. Wheezing is an important symptom in the diagnosis of various diseases. For example, in asthma and chronic obstructive pulmonary disease, wheezing can be heard in any part of the chest due to airway narrowing in the anterior lung fields. However, local bronchial obstruction due to foreign bodies, mucus, or narrowing due to tumors may cause wheezing predominantly in specific areas. (2)

In general practice, clinicians diagnose lung diseases on the basis of the patient's chief complaint, medical history, physical examination, and auscultatory findings.
Distinguishing between wheezing and non-wheezing during auscultatory examination, which is key in the diagnosis of many lung diseases, requires years of training and experience and is open to individual subjectivity. This makes objective assessment difficult. Furthermore, in high-risk patients requiring isolation, direct physical examination is restricted, which limits auscultatory assessment. (3–5)

To overcome these limitations, researchers have used artificial intelligence (AI) to distinguish between normal and abnormal auscultatory sounds, with some studies indicating better performance than human doctors. (6, 7) In particular, the International Conference on Biomedical and Health Informatics open dataset has been extensively studied for auscultatory sound classification. (8–10) In these studies, preprocessing methods were used to extract audio features for AI training. These features included the Mel-spectrogram, log-Mel-spectrogram, and Mel-frequency cepstral coefficient (MFCC), which are known to represent audio data. (11, 12)

This study was conducted to determine the most effective tool for the extraction of features from wheezing sounds. Feature extraction was performed using various sound classification tools. To observe the differences in performance, we used a kernel support vector machine (SVM), a type of machine learning model, to classify wheezing. (13) In addition, we used t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of each feature extracted from the audio data and visualize it, and analyzed which features could be used to learn and represent breathing sound data well. (14) The overall aim of this project was to determine which existing techniques are effective in distinguishing between breathing sounds.
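As a rough illustration of the workflow outlined above, the following sketch fits a kernel SVM by grid search and projects the same feature vectors onto two dimensions with t-SNE using scikit-learn. It is not the code used in this study: the feature matrix is synthetic (only the class counts of 103 and 184 mirror the recordings), and the parameter grid, the 80/20 split, the random seeds, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for per-recording feature vectors (NOT the study data).
rng = np.random.default_rng(0)
n_wheeze, n_other, n_dims = 103, 184, 20
X = np.vstack([
    rng.normal(1.0, 1.0, size=(n_wheeze, n_dims)),  # label 1: wheezing
    rng.normal(0.0, 1.0, size=(n_other, n_dims)),   # label 0: other sounds
])
y = np.concatenate([np.ones(n_wheeze, dtype=int), np.zeros(n_other, dtype=int)])

# Hold out 20% of the recordings; stratify to keep the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid only; gamma is ignored by the linear kernel.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)

# Two-dimensional t-SNE projection of all feature vectors, for plotting by class.
embedding = TSNE(n_components=2, random_state=0).fit_transform(X)

print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
print("t-SNE embedding shape:", embedding.shape)
```

In practice, `X` would hold one row per recording for a single feature type, so that the same procedure can be repeated for each feature and the results compared.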
Methods

Study design and data collection

We conducted a prospective study of pediatric patients who visited the pediatric department of a university hospital in Korea between August 2019 and January 2020. All records were obtained from patients who voluntarily agreed to have their breath sounds recorded. All breath sounds were recorded in children visiting the outpatient department by a pediatric respiratory specialist using an electronic stethoscope (Jabes, GST Technology, Seoul, Korea).

The recorded auscultatory sounds were categorized as wheezing or non-wheezing according to the pediatric physician’s diagnosis. Two auscultation cycles were recorded for each patient: one in the anterior lung and one in the posterior lung. Four breathing sounds were recorded for each participant. To validate the classification, two pediatric respiratory specialists performed blinded validation of the recorded breathing sounds; a recording was flagged and stored in the database if more than one classification was identical to the original classification. Data on sex, age, and auscultation site were also collected.

Feature extraction

In this study, the following feature extraction methods were applied to the 48 kHz breath sound data:

1) Mel-spectrogram: This popular feature extraction method is used to analyze data whose frequency characteristics change over time. A Mel-spectrogram is obtained by applying a Fast Fourier Transform to the audio data and passing the result through a Mel filter bank.
2) Log Mel-spectrogram: This method takes the logarithm of the Mel-spectrogram, which compresses the amplitudes to a scale closer to how humans perceive loudness.
3) MFCC: This feature extraction method performs a Discrete Cosine Transform (DCT) operation on a Mel-spectrogram. It is primarily used for human speech data and requires less computation than the Mel-spectrogram. (15)
4) MFCC-delta: This method stacks the MFCC with its deltas (first differences) and delta-deltas (second differences) along the coefficient axis and is used to represent noisy data. (16)
5) Chroma Short-Time Fourier Transform (STFT): This feature represents the 12-tone scale and is often used in the analysis of music data. (17)
6) Chroma Constant-Q Transform (CQT): This method uses the CQT instead of the STFT to compute the chroma. It uses geometrically spaced frequency bands and contains additional high-frequency information. (18)
7) Spectral contrast: This method measures, within each frequency sub-band, the difference between spectral peaks and spectral valleys, so that pronounced spectral components stand out more clearly. (19)
8) Tonnetz: This method is based on the Tonnetz (tonal network) introduced by Euler and is effective in uncovering hidden relationships and patterns. (20)

Evaluation of the AI algorithm

The kernel SVM was used to classify wheezing and non-wheezing sounds for each feature obtained from the breath sound data (Fig. 1). For visualization purposes, the dimensionality was reduced to a two-dimensional coordinate plane using t-SNE. An SVM is a supervised learning algorithm that aims to classify two categories by finding the optimal decision boundary between them. Kernel SVMs apply the kernel trick, which implicitly maps the data into a higher-dimensional space so that classes that are not linearly separable in the original space can be separated linearly. (14) In the present study, we performed 5-fold cross-validation and a grid search on the training data (80% of the total data) to explore the optimal hyperparameters of the kernel SVM, and evaluated the final models on the test data (20% of the total data) (Table 1, Table 2).

Table 1 Selected hyperparameters with grid search.
Feature              Type    Kernel  Gamma
Mel-spectrogram      Linear  0.1     1000
Log Mel-spectrogram  Linear  0.1     1000
MFCC                 RBF     0.1     100
MFCC-Delta           RBF     0.1     1000
Chroma STFT          RBF     1       0.1
Chroma CQT           RBF     1000    0.01
Spectral contrast    Linear  0.1     1000
Tonnetz              Linear  0.1     1000

MFCC, Mel-Frequency Cepstral Coefficient; STFT, Short-Time Fourier Transform; CQT, Constant-Q Transform; RBF, Radial Basis Function

Table 2 Performance of the different models in discriminating wheezing from other respiratory sounds using a kernel support vector machine

Feature              Accuracy  AUC    Precision  Recall  F1-score
Mel-spectrogram      0.862     0.871  0.760      0.905   0.826
Log Mel-spectrogram  0.845     0.868  0.714      0.952   0.816
MFCC                 0.863     0.882  0.741      0.952   0.833
MFCC-Delta           0.810     0.799  0.727      0.761   0.744
Chroma STFT          0.724     0.722  0.600      0.714   0.652
Chroma CQT           0.689     0.654  0.579      0.524   0.550
Spectral contrast    0.897     0.909  0.800      0.952   0.869
Tonnetz              0.672     0.651  0.545      0.571   0.558

MFCC, Mel-Frequency Cepstral Coefficient; STFT, Short-Time Fourier Transform; CQT, Constant-Q Transform

t-SNE is a machine learning algorithm that reduces the dimensionality of high-dimensional data so that the vectors can be visualized. Following SNE, neighbor probabilities were calculated in both the original and the reduced space so that the dimensions could be reduced while preserving the relative distances between vectors. Let ${p}_{j\mid i}$ represent the probability that data points ${x}_{i}$ and ${x}_{j}$ are chosen as neighbors in the original (pre-reduction) space, as shown in Eq. 1:

$${p}_{j\mid i}=\frac{\exp\left(-{\lVert {x}_{i}-{x}_{j}\rVert }^{2}/2{\sigma }_{i}^{2}\right)}{\sum _{k\ne i}\exp\left(-{\lVert {x}_{i}-{x}_{k}\rVert }^{2}/2{\sigma }_{i}^{2}\right)} \tag{1}$$

Similarly, ${q}_{j\mid i}$ is defined as the probability that data points ${y}_{i}$ and ${y}_{j}$ are selected as neighbors in the reduced space (Eq. 2):

$${q}_{j\mid i}=\frac{\exp\left(-{\lVert {y}_{i}-{y}_{j}\rVert }^{2}\right)}{\sum _{k\ne i}\exp\left(-{\lVert {y}_{i}-{y}_{k}\rVert }^{2}\right)} \tag{2}$$

t-SNE defines a joint probability ${p}_{ij}$ in order to impose symmetry, so that ${p}_{ij}={p}_{ji}$.
This was achieved using the formula shown in Eq. 3, where $N$ is the number of data points:

$${p}_{ij}=\frac{{p}_{j\mid i}+{p}_{i\mid j}}{2N} \tag{3}$$

We defined the Kullback-Leibler (KL) divergence as the cost function, which measures the dissimilarity between the corresponding distributions and is represented by $\sum _{i} KL\left({P}_{i}\parallel {Q}_{i}\right)$. (21) The embedding is trained through gradient descent in the direction that minimizes the cost function, as shown in Eq. 4:

$$\sum _{i} KL\left({P}_{i}\parallel {Q}_{i}\right)=\sum _{i}\sum _{j} {p}_{j\mid i}\log\frac{{p}_{j\mid i}}{{q}_{j\mid i}} \tag{4}$$

In short, t-SNE learns a lower-dimensional embedding in which the neighbor relationships derived from Euclidean distances remain as similar as possible to those in the original space. The dimensionality was reduced to a two-dimensional coordinate plane using t-SNE, with the reduced values of each feature plotted on the x- and y-axes, and wheezing and non-wheezing sounds were visualized as two separate classes (Fig. 2).

This study was conducted using Python software version 3.6.5 (Python Software Foundation, 9450 SW Gemini Dr., ECM# 90772, Beaverton, OR 97008, USA), and the Librosa package was used for feature extraction. The scikit-learn package was used to model t-SNE and SVM.

Statistical analysis

Statistical analysis was conducted using the data extracted for each feature, utilizing accuracy, area under the curve (AUC), precision, recall, and F1-scores.

Ethics statement

This study was approved by the Institutional Review Board (IRB) of the Catholic University of Korea (IRB approval no. PC19OESI0045). Written informed consent was obtained from at least one legal guardian for all participants. For children 7 years of age and older, the assent of the child was also obtained. All methods were performed in accordance with relevant guidelines and regulations.

Results

A total of 76 patients were included in the study, and 103 wheeze sounds and 184 non-wheeze sounds were collected.
Based on these data, the characteristics of the auscultatory sounds were summarized according to sex, age, and duration of breath sounds (Table 3). The median age of the patients with wheezing was 4 years (interquartile range, 2–8 years), whereas that of the patients without wheezing was 3 years (1–5 years). The recording durations were 89.36 ± 39.51 ms for wheezing sounds and 63.09 ± 27.79 ms for non-wheezing sounds.

Table 3 Characteristics of respiratory sounds collected in the study

                                            Wheezing (n = 103)  Others (n = 184)  P value
Demographic data of the included patients
Male sex, n (%)                             67 (65.0)           113 (61.4)        0.541
Age (years)                                 4 (2–8)             3 (1–5)           < 0.001
Duration of sound (ms)                      89.36 ± 39.51       63.09 ± 27.79     < 0.001

Continuous variables are expressed as mean ± standard deviation or median (interquartile range)

The kernel SVM was used to classify wheezing and non-wheezing sounds for each feature obtained from the breath sound data (Fig. 2). The audio data were extracted using the Mel-spectrogram, log-Mel-spectrogram, MFCC, MFCC-delta, chroma STFT, chroma CQT, spectral contrast, and Tonnetz feature extraction methods. To determine the best-performing SVM, the kernel type and gamma value that define the decision regions were selected using a grid search. Two kernel types were considered: a linear kernel, which separates the regions with a straight boundary, and a Radial Basis Function (RBF) kernel, which separates them with a curved boundary shaped like a normal distribution. The hyperparameters selected for each model are summarized in Table 1. In the statistical analysis, the AUC and F1-score, which remain informative when the classes are imbalanced, were found to be effective metrics in this study. The Mel-spectrogram, MFCC, and spectral contrast proved to be the most suitable for classifying breath sounds, demonstrating the clearest clustering in distinguishing between wheezing and non-wheezing sounds.
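Because the F1-score is the harmonic mean of precision and recall, the F1-scores in Table 2 can be cross-checked from the other two columns. The short sketch below does this; the helper `f1_from_precision_recall` and the dictionary `table2` are names introduced only for illustration, and the numbers are copied from Table 2.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# feature: (precision, recall, reported F1-score), copied from Table 2.
table2 = {
    "Mel-spectrogram": (0.760, 0.905, 0.826),
    "Log Mel-spectrogram": (0.714, 0.952, 0.816),
    "MFCC": (0.741, 0.952, 0.833),
    "MFCC-Delta": (0.727, 0.761, 0.744),
    "Chroma STFT": (0.600, 0.714, 0.652),
    "Chroma CQT": (0.579, 0.524, 0.550),
    "Spectral contrast": (0.800, 0.952, 0.869),
    "Tonnetz": (0.545, 0.571, 0.558),
}

for feature, (precision, recall, reported) in table2.items():
    computed = f1_from_precision_recall(precision, recall)
    print(f"{feature:20s} computed={computed:.3f} reported={reported:.3f}")
```

Each recomputed value agrees with the reported one to within the rounding of the published precision and recall.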
In particular, spectral contrast achieved an AUC of 0.909 and an F1-score of 0.869, indicating the highest classification performance (Table 2).

Discussion

In this study, we investigated whether any of the existing machine learning techniques can effectively distinguish between lung disease patients with and without wheezing by recognizing specific diagnostic patterns in breathing data. A total of 76 patients were included, and 103 wheeze sounds and 184 non-wheeze sounds were analyzed. Based on the breath sound data, the Mel-spectrogram, MFCC, and spectral contrast were found to be the most suitable for classifying breathing sounds and distinguishing between wheezing and non-wheezing sounds with clear clustering. Among the various techniques analyzed, spectral contrast demonstrated the most effective classification performance in distinguishing wheezing in children. This suggests that machine learning models using spectral contrast may be used to accurately diagnose respiratory diseases in children.

Differentiation between wheezing and non-wheezing requires extensive medical expertise, which impedes objective evaluation and restricts the use of auscultatory assessment in isolated, high-risk populations. To address these issues, researchers have turned to deep learning methods to differentiate between normal and abnormal auscultatory sounds. Recently, several techniques have been proposed to improve the identification of lung sounds using deep learning. Efforts have been made to categorize breath sounds by applying convolutional neural networks (CNNs), a conventional deep learning architecture. (22, 23) In a recent study, a CNN model was used to distinguish wheezing sounds. (5) Further, machine learning has been used to classify abnormal respiratory sounds into subclasses. Different architectures were shown to effectively differentiate between wheeze, rhonchi, and crackles.
However, the authors of that study chose a CNN rather than an SVM as the classifier connected to the feature extractor, because CNNs were found to yield superior results in both image classification and traditional classification tasks, and used InceptionV3, DenseNet201, ResNet50, ResNet101, VGG16, and VGG19 as feature extractors. The study's findings revealed that VGG16 yielded the most favorable results, achieving an AUC of 0.93 and an accuracy of 86.5%, validating its competence in identifying anomalous lung sounds and classifying crackles, wheezes, and rhonchi. (24)

Previous studies have analyzed feature extractors and classifiers for integrating breath sounds into machine learning, with a focus on determining the most effective models. However, which method is superior for distinguishing wheezing sounds remains unknown. Numerous techniques are currently available to extract features from breath sounds; however, none have been identified as being particularly effective in distinguishing wheezing.

Raw audio data represent sound pressure as a function of time. Recent deep learning and machine learning methods are trained on features extracted from the raw audio rather than on the raw audio itself. Feature extraction techniques enable frequency representation by decomposing the time-domain data into frequency components using a Fast Fourier Transform. This conversion reveals which frequencies are strong or weak in the audio signal, which helps deep learning or machine learning models learn from audio data. Furthermore, there are several techniques for extracting features from audio, and the manner in which these features are extracted is critical for accurately representing audio. These features have a horizontal (time) axis, a vertical (frequency) axis, and one channel, creating an image-like data structure. We found that spectral contrast performed best as a feature extractor for wheezing.
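To make the conversion from the time domain to the frequency domain concrete, the sketch below computes a short-time Fourier transform magnitude with NumPy for a synthetic 400 Hz tone, a frequency within the 100–1000 Hz range given for wheezing. It is an illustrative example rather than the Librosa routine used in this study: the function name `stft_magnitude`, the frame and hop lengths, and the test tone are hypothetical choices, and the 48 kHz sampling rate is assumed from the Methods.

```python
import numpy as np


def stft_magnitude(signal, frame_length=4096, hop_length=1024):
    """Return a (frequency bins x time frames) magnitude spectrogram."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft of each windowed frame keeps the non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sample_rate = 48_000                        # Hz, assumed sampling rate
t = np.arange(sample_rate) / sample_rate    # one second of samples
tone = np.sin(2 * np.pi * 400.0 * t)        # synthetic 400 Hz sinusoid

spectrogram = stft_magnitude(tone)
freqs = np.fft.rfftfreq(4096, d=1 / sample_rate)
peak_hz = freqs[spectrogram.mean(axis=1).argmax()]

print("spectrogram shape (frequency, time):", spectrogram.shape)
print("strongest frequency bin (Hz):", peak_hz)
```

The resulting array has frequency on one axis and time on the other, which is the image-like structure on which features such as the Mel-spectrogram and spectral contrast are built.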
Furthermore, we did not limit ourselves to the extraction and classification of low-frequency wheezing sound data. Instead, we employed t-SNE to reduce dimensionality and trained machine learning models using this approach. As previously noted, this study differs from others in that the data for multiple features were each reduced to a two-dimensional coordinate plane using t-SNE, allowing wheezing and non-wheezing sounds to be visualized.

However, this study had some limitations. It was conducted at a single center, and the limited sample size made the assessment of accuracy challenging. For these reasons, we were restricted to evaluating the effectiveness of the hyperparameters solely using the AUC and F1-score metrics. Moreover, because we trained the machine learning model to differentiate only between wheezing and non-wheezing respiratory sounds, it remains unclear whether spectral contrast performs similarly well when distinguishing other respiratory sound types. Although the use of spectral contrast may improve the performance of AI in distinguishing respiratory sound characteristics, validation in large-scale prospective studies is essential in future research.

Conclusion

This study confirmed that the Mel-spectrogram, MFCC, and spectral contrast exhibited the best performance in characterizing respiratory sounds. Overall, we found that machine learning trained on spectral contrast demonstrated superior performance in detecting wheezing sounds compared to other feature extraction methods in pediatric cases. It is anticipated that training such a high-performance machine learning model will enable a more accurate analysis of respiratory sounds in pediatric patients and enhance the precision of diagnosing abnormal respiratory sounds.
Declarations

Authors' contributions: HJ Moon, HM Ji, BS Kim, BJ Kim, and KH Kim conceptualized and designed the study, collected and analyzed the data, and drafted, reviewed, and revised the manuscript. All authors have approved the final manuscript as submitted and agreed to be accountable for all aspects of this study.

Consent for publication: Not applicable.

Ethics statement: This study was approved by the Institutional Review Board (IRB) of the Catholic University of Korea (IRB approval no. PC19OESI0045). Written informed consent was obtained from at least one legal guardian for all participants. For children 7 years of age and older, assent of child was also obtained. All methods were performed in accordance with relevant guidelines and regulations.

Competing interests: The authors declare that they have no competing interests.

Data Availability Statement: The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Funding: This research was supported by the New Faculty Startup Fund from Seoul National University and the 2019 SamA Pharmaceutical grant.

References

1. Weiss, L. N. The diagnosis of wheezing in children. Am. Fam. Physician 77, 1109-1114 (2008).
2. Bohadana, A., Izbicki, G. & Kraman, S. S. Fundamentals of lung auscultation. N. Engl. J. Med. 370, 744-751 (2014).
3. Ha-Neul, P. M., Jang, W.-N. M. & Hyo-Kyoung, N. M. Validity of Cough-Holter Monitoring for the Objective Assessment of Cough and Wheezing in Children with Respiratory Symptoms. Pediatr. Allergy Respir. Dis. (Korea) 22, 344-353 (2012).
4. Schultz, A. & Brand, P. L. P. Episodic Viral Wheeze and Multiple Trigger Wheeze in preschool children: A useful distinction for clinicians? Paediatr. Respir. Rev. 12, 160-164 (2011).
5. Kim, B. J., Kim, B. S., Mun, J. H., Lim, C. & Kim, K. H. An accurate deep learning model for wheezing in children using real world data. Sci. Rep. 12, 22465 (2022).
6. Zhang, J. et al. Real-world verification of artificial intelligence algorithm-assisted auscultation of breath sounds in children. Front. Pediatr. 9, 627337 (2021).
7. Bardou, D., Zhang, K. & Ahmad, S. M. Lung sounds classification using convolutional neural networks. Artif. Intell. Med. 88, 58-69 (2018).
8. Pramono, R. X. A., Bowyer, S. & Rodriguez-Villegas, E. Automatic adventitious respiratory sound analysis: A systematic review. PLoS One 12, e0177926 (2017).
9. Kim, Y. et al. The coming era of a new auscultation system for analyzing respiratory sounds. BMC Pulm. Med. 22, 119 (2022).
10. Grzywalski, T. et al. Practical implementation of artificial intelligence algorithms in pulmonary auscultation examination. Eur. J. Pediatr. 178, 883-890 (2019).
11. McDonnell, M. D. & Gao, W. (ed.). Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths. ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020 (IEEE, 2020).
12. O’Hanlon, K. & Sandler, M. B. (ed.). Comparing cqt and reassignment based chroma features for template-based automatic chord recognition. ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2019 (IEEE, 2019).
13. Murty, M. N. & Raghava, R. Support Vector Machines and Perceptrons: Learning, Optimization, Classification, and Application to Social Networks (2016).
14. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9 (2008).
15. Chakraborty, K., Talele, A. & Upadhya, S. Voice recognition using MFCC algorithm. Int. J. Innov. Res. Adv. Eng. (IJIRAE) 1, 2349-2163 (2014).
16. Kumar, K., Kim, C. & Stern, R. M. (ed.). Delta-spectral Cepstral Coefficients for Robust Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2011 (IEEE, 2011).
17. Müller, M. Short-Time Fourier Transform and Chroma Features. Lab Course, Friedrich-Alexander-Universität Erlangen-Nürnberg (2015).
18. Schörkhuber, C. & Klapuri, A. (ed.). Constant-Q transform toolbox for music processing. 7th sound and music computing conference, Barcelona, Spain; 2010.
19. Jiang, D.-N., Lu, L., Zhang, H.-J., Tao, J.-H. & Cai, L.-H. (ed.). Music Type Classification by Spectral Contrast Feature. In Proceedings IEEE International Conference on Multimedia and Expo (IEEE, 2002).
20. Humphrey, E. J., Cho, T. & Bello, J. P. (ed.). Learning a Robust Tonnetz-Space Transform for Automatic Chord Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2012 (IEEE, 2012).
21. Moreno, P., Ho, P. & Vasconcelos, N. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. Adv. Neural Inf. Process. Syst. 16 (2003).
22. Tariq, Z., Shah, S. K. & Lee, Y. Feature-based fusion using CNN for lung and heart sound classification. Sensors (Basel) 22, 1521 (2022).
23. Zulfiqar, R. et al. Abnormal respiratory sounds classification using deep CNN through artificial noise addition. Front. Med. (Lausanne) 8, 714811 (2021).
24. Kim, Y. et al. Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning. Sci. Rep. 11, 17186 (2021).
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4419150","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":312120825,"identity":"fcf4338b-6d6a-415f-ab00-18ddc4ed561a","order_by":0,"name":"Hye Jeong Moon","email":"","orcid":"","institution":"Seoul National University College of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Hye","middleName":"Jeong","lastName":"Moon","suffix":""},{"id":312120826,"identity":"ed59dda0-551f-481f-91d7-e9192d05723f","order_by":1,"name":"Hyunmin Ji","email":"","orcid":"","institution":"Seoul National University","correspondingAuthor":false,"prefix":"","firstName":"Hyunmin","middleName":"","lastName":"Ji","suffix":""},{"id":312120827,"identity":"8239f920-db35-4f48-b33f-e5c9017fee8b","order_by":2,"name":"Baek Seung Kim","email":"","orcid":"","institution":"Seoul National University Bundang Hospital","correspondingAuthor":false,"prefix":"","firstName":"Baek","middleName":"Seung","lastName":"Kim","suffix":""},{"id":312120828,"identity":"5a305878-c51e-4cd4-a9e4-eddfaf52def4","order_by":3,"name":"Beom Joon Kim","email":"","orcid":"","institution":"The Catholic University of Korea","correspondingAuthor":false,"prefix":"","firstName":"Beom","middleName":"Joon","lastName":"Kim","suffix":""},{"id":312120829,"identity":"c33d72d3-d980-4195-b8ec-1e98c920008d","order_by":4,"name":"Kyunghoon 
Kim","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA5UlEQVRIie3QMQrCMBSA4RcCcYm4Fgr2Cg0FFxWvogi6dFAEcRCNFNql4hpx8AqC4FwIdCpeohcQsovFqmPsKJh/eSHkg0cATKYfDPFyNh0nT153pBrxGB/1q5F3A574bjWCo7ilJguJNjxTahqC0+BkdNMuFmfeQWQS19D2Yu9DYCIhUmiJ8D1cDyVBQf1SHACdoBZoFyvJXVJIaa4K0qtIuLQgo2AXZHACIvUkTmeYpmOXCdKy6dUaCkmGWsKi4Izpsr0+WjhXdN7p7qLQ0xP+HJ/treLftQDAKcfqyzOTyWT66x4j+UC56jtXYAAAAABJRU5ErkJggg==","orcid":"","institution":"Seoul National University College of Medicine","correspondingAuthor":true,"prefix":"","firstName":"Kyunghoon","middleName":"","lastName":"Kim","suffix":""}],"badges":[],"createdAt":"2024-05-14 12:21:53","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4419150/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4419150/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":58224890,"identity":"798c41a4-d0d6-42f4-b59d-0d8579af09d4","added_by":"auto","created_at":"2024-06-12 17:42:07","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":44142,"visible":true,"origin":"","legend":"\u003cp\u003eFlowchart showing the use of t-SNE to visualize lung sounds.\u003c/p\u003e\n\u003cp\u003eMFCC, Mel-Frequency Cepstral Coefficient, STFT; Short-Time Fourier Transform, CQT; Constant-Q Transform; t-SNE, t-stochastic neighbor embedding\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4419150/v1/005c6ee09d3ad6261dc675dd.jpg"},{"id":58224889,"identity":"4706ae8f-c089-445f-a372-b2f1d7dbddd4","added_by":"auto","created_at":"2024-06-12 17:42:07","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":68332,"visible":true,"origin":"","legend":"\u003cp\u003eThe structure of a kernel support vector machine model.\u003c/p\u003e\n\u003cp\u003eMFCC, Mel-Frequency Cepstral Coefficient, STFT; 
Short-Time Fourier Transform, CQT; Constant-Q Transform\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4419150/v1/88849a68439b5950411170c3.jpg"},{"id":71514706,"identity":"c2d87e8c-b922-46e4-bee0-46c39fa8bfa5","added_by":"auto","created_at":"2024-12-16 10:54:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":531621,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4419150/v1/2dcf7988-873e-437c-b58b-4d7e9daa1e7d.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine Learning-Driven Strategies for Enhanced Pediatric Wheezing Detection","fulltext":[{"header":"Introduction","content":"\u003cp\u003eWheezing is defined as the rapid movement of air through narrowed airways caused by bronchial asthma, allergic reactions, or respiratory infections. (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) Wheezing is characterized by sinusoidal oscillations of 100\u0026ndash;1000 Hz and can occur during both inhalation and exhalation. Wheezing is an important symptom in the diagnosis of various diseases. For example, in asthma and chronic obstructive pulmonary disease, wheezing can be heard in any part of the chest due to airway narrowing in the anterior lung fields. However, local bronchial obstruction due to foreign bodies, mucus, or narrowing due to tumors may cause wheezing predominantly in specific areas. (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e)\u003c/p\u003e \u003cp\u003eIn general practice, lung diseases are diagnosed by clinicians following an examination of the patient's chief complaint, medical history, physical examination, and auscultatory findings. 
Distinguishing between wheezing and non-wheezing during auscultatory examination, which is key in the diagnosis of many lung diseases, requires years of training and experience, and is open to individual subjectivity. This makes objective assessment difficult. Furthermore, in high risk patients requiring isolation, direct physical examination is limited, which limits auscultatory assessment. (\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e)\u003c/p\u003e \u003cp\u003eTo overcome these limitations, researchers have used artificial intelligence (AI) to distinguish between normal and abnormal auscultatory sounds, with some studies indicating better performance than human doctors. (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e) In particular, the International Conference on Biomedical and Health Informatics open dataset has been extensively studied for auscultatory sound classification. (\u003cspan additionalcitationids=\"CR9\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e) In these studies, preprocessing methods were used to extract audio features for AI training. These features included the Mel-spectrogram, log-Mel-spectrogram, and Mel-frequency cepstral coefficient (MFCC), which are known to represent audio data. (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e)\u003c/p\u003e \u003cp\u003eThis study was conducted to determine the most effective tool for the extraction of features from wheezing sounds. Feature extraction was performed using various sound classification tools. 
To observe the differences in performance, we used a kernel support vector machine (SVM), a type of machine learning model, to classify wheezing. (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e) In addition, we used t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the different features extracted from the audio data and visualize them, and analyzed which features could be used to learn and represent breathing sound data well. (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e) The overall aim of this project was to determine which existing techniques are effective in distinguishing between breathing sounds.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy design and data collection\u003c/h2\u003e \u003cp\u003eWe conducted a prospective study of pediatric patients who visited the pediatric department of a university hospital in Korea between August 2019 and January 2020. All records were obtained from patients who voluntarily agreed to have their breath sounds recorded. All breath sounds were recorded in children visiting the outpatient department by a pediatric respiratory specialist using an electronic stethoscope (Jabes, GST Technology, Seoul, Korea).\u003c/p\u003e \u003cp\u003eThe recorded auscultatory sounds were categorized as wheezing or non-wheezing according to the pediatric physician\u0026rsquo;s diagnosis. Two auscultation cycles were recorded for each patient: one in the anterior lung and one in the posterior lung. Four breathing sounds were recorded for each participant. To validate the classification, two pediatric respiratory specialists performed blinded validation of the recorded breathing sounds; if there was more than one classification identical to the original classification, it was flagged and stored in the database. 
Data on sex, age, and auscultation site were also collected.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eFeature Extraction\u003c/h2\u003e \u003cp\u003eIn this study, the following feature extraction methods were used to extract features from the 48 kHz breath sound data:\u003c/p\u003e \u003cp\u003e1) Mel-spectrogram: This is a popular feature extraction method that is used to analyze data with frequency characteristics that change over time. A Mel-spectrogram is output after the audio data have been subjected to a Fast Fourier Transform and passed through a Mel filter bank.\u003c/p\u003e \u003cp\u003e2) Log Mel-spectrogram: This method takes the logarithm of the Mel-spectrogram and converts it to a scale closer to that perceived by humans.\u003c/p\u003e \u003cp\u003e3) MFCC: This feature extraction method performs a Discrete Cosine Transform (DCT) operation on the log Mel-spectrogram. It is primarily used for human speech data, and requires less computation than the Mel-spectrogram. (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e4) MFCC-delta: This method stacks the MFCC with its deltas (first differences) and delta-deltas (second differences) along the frequency axis and is used to represent noisy data. (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e5) Chroma Short-Time Fourier Transform (STFT): This is a feature for the representation of a 12-tone scale, often used in the analysis of music data. (\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e6) Chroma Constant-Q Transform (CQT): This variant computes the chroma features using the CQT instead of the STFT. It uses geometrically spaced frequency bands and contains additional high-frequency information. 
(\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e7) Spectral contrast: This method measures, within each frequency sub-band, the difference between the spectral peaks and the spectral valleys, which makes the difference between strong and weak components more pronounced. (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e8) Tonnetz: This method incorporates the Tonnetz (tone network) theory introduced by Euler and is effective in uncovering hidden relationships and patterns. (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e)\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of the AI algorithm\u003c/h2\u003e \u003cp\u003e The kernel SVM was used to classify wheezing and non-wheezing sounds for each feature obtained from the breath sound data (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). For visualization purposes, the dimensionality was reduced to a two-dimensional coordinate plane using t-SNE.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAn SVM is a supervised learning algorithm that aims to classify two categories by finding the optimal decision boundary between them. Kernel SVMs apply the kernel trick, which implicitly maps the data into a higher-dimensional space so that classes that are not linearly separable in the original space can be separated by a linear boundary. 
(\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e) In the present study, we performed 5-fold cross-validation and grid search on the training data (80% of the total data) to explore the optimal hyperparameters of the kernel SVM for training and compared the final results with the test data (20% of the total data) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSelected hyperparameters with grid search.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eType\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKernel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGamma\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMel-spectrogram\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLog Mel-spectrogram\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMFCC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRBF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMFCC-Delta\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRBF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChroma STFT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRBF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChroma CQT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRBF\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpectral contrast\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTonnetz\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLinear\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1000\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eMFCC, Mel-Frequency Cepstral Coefficient; STFT, Short-Time Fourier Transform; CQT, Constant-Q Transform; RBF, Radial Basis Function\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance of the different models in discriminating other respiratory sounds from wheezing using a kernel support vector machine\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" 
colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMel-spectrogram\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.862\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.871\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.760\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.905\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.826\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLog Mel-spectrogram\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.845\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.868\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.714\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.816\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMFCC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.863\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.882\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.741\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.833\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMFCC-Delta\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.810\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.799\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.727\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.761\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.744\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChroma STFT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.724\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.722\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.714\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.652\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eChroma CQT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.689\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.654\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.579\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.524\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.550\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSpectral contrast\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.897\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.909\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.952\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.869\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTonnetz\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.672\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e0.651\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.545\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.571\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.558\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eMFCC, Mel-Frequency Cepstral Coefficient; STFT, Short-Time Fourier Transform; CQT, Constant-Q Transform\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003et-SNE is a machine learning algorithm that reduces the dimensionality of high-dimensional data for vector visualization. Building on stochastic neighbor embedding (SNE), it converts the distances between vectors into neighbor probabilities and reduces the dimensionality while preserving those neighborhood relationships. Let \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${p}_{j\\mid i}\$\u003c/span\u003e\u003c/span\u003e represent the probability of data \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${x}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${x}_{j}\$\u003c/span\u003e\u003c/span\u003e being chosen as neighbors in the original high-dimensional space, as shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e below:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$${p}_{j\\mid i}=\\frac{\\exp\\left(-{\\Vert {x}_{i}-{x}_{j}\\Vert }^{2}/2{\\sigma }_{i}^{2}\\right)}{\\sum _{k\\ne i} \\exp\\left(-{\\Vert {x}_{i}-{x}_{k}\\Vert }^{2}/2{\\sigma }_{i}^{2}\\right)}$$\u003c/div\u003e\u003cdiv 
class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eSimilarly, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${q}_{j\\mid i}\$\u003c/span\u003e\u003c/span\u003e is defined as the probability that data \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${y}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${y}_{j}\$\u003c/span\u003e\u003c/span\u003e are selected as neighbors in the low-dimensional space after reduction (Eq.\u0026nbsp;\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e):\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$${q}_{j\\mid i}=\\frac{\\exp\\left(-{\\Vert {y}_{i}-{y}_{j}\\Vert }^{2}\\right)}{\\sum _{k\\ne i} \\exp\\left(-{\\Vert {y}_{i}-{y}_{k}\\Vert }^{2}\\right)}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eTo impose symmetry, t-SNE defines a joint probability \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${p}_{ij}\$\u003c/span\u003e\u003c/span\u003e such that \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\${p}_{ij}={p}_{ji}\$\u003c/span\u003e\u003c/span\u003e. 
This is achieved using the formula shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, where N is the number of data points:\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$${p}_{ij}=\\frac{{p}_{j\\mid i}+{p}_{i\\mid j}}{2N}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e3\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe Kullback-Leibler (KL) divergence, which measures the dissimilarity between the corresponding distributions, was used as the cost function and is represented by \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\sum _{i} KL\\left({P}_{i}\\parallel {Q}_{i}\\right)\$\u003c/span\u003e\u003c/span\u003e. (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e) The embedding is optimized through gradient descent in the direction that minimizes the cost function, as shown in Eq.\u0026nbsp;\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e4\u003c/span\u003e:\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\sum _{i} KL\\left({P}_{i}\\parallel {Q}_{i}\\right)=\\sum _{i} \\sum _{j} {p}_{j\\mid i}\\log \\frac{{p}_{j\\mid i}}{{q}_{j\\mid i}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eIn short, t-SNE learns a lower-dimensional embedding in which the distance-based neighbor relationships of the original data are preserved as closely as possible.\u003c/p\u003e \u003cp\u003eThe dimensionality was reduced to a two-dimensional coordinate plane using t-SNE for the values of each feature on the x- and y-axes, and wheezing and non-wheezing participants were visualized as two separate classes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThis study was 
conducted using Python software version 3.6.5 (Python Software Foundation, 9450 SW Gemini Dr., ECM# 90772, Beaverton, OR 97008, USA), and the Librosa package was used for feature extraction. The scikit-learn package was used to implement t-SNE and the SVM.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eStatistical analysis was conducted using the data extracted for each feature, utilizing accuracy, area under the curve (AUC), precision, recall, and F1-scores.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eEthics statement\u003c/h2\u003e \u003cp\u003e This study was approved by the Institutional Review Board (IRB) of the Catholic University of Korea (IRB approval no. PC19OESI0045). Written informed consent was obtained from at least one legal guardian for all participants. For children 7 years of age and older, the child\u0026rsquo;s assent was also obtained. All methods were performed in accordance with relevant guidelines and regulations.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eA total of 76 patients were included in the study, and 103 wheeze sounds and 184 non-wheeze sounds were collected. Based on these data, the characteristics of the auscultatory sounds were summarized according to sex, age, and duration of breath sounds (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The median age of the patients with wheezing was 4 years (2\u0026ndash;8 years), whereas that of those without wheezing was 3 years (1\u0026ndash;5 years). 
We found that the duration times of the wheezing participants were 89.36\u0026thinsp;\u0026plusmn;\u0026thinsp;39.51 ms and those of the non-wheezing participants were 63.09\u0026thinsp;\u0026plusmn;\u0026thinsp;27.79 ms.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCharacteristics of respiratory sounds collected in the study\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWheezing\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;103)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eOthers\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;184)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eP\u003c/em\u003e value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDemographic data of the included patients\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMale sex, \u003cem\u003en\u003c/em\u003e (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e67 (65.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e113 (61.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.541\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge (years)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e4 (\u003cspan additionalcitationids=\"CR3 CR4 CR5 CR6 CR7\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3 (\u003cspan additionalcitationids=\"CR2 CR3 CR4\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDuration of sound (ms)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e89.36\u0026thinsp;\u0026plusmn;\u0026thinsp;39.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e63.09\u0026thinsp;\u0026plusmn;\u0026thinsp;27.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"4\"\u003eContinuous variables are expressed 
as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation or median (interquartile range)\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe kernel SVM was used to classify wheezing and non-wheezing sounds for each feature obtained from the breath sound data (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Features were extracted from the audio data using the Mel-spectrogram, log-Mel-spectrogram, MFCC, MFCC-delta, chroma STFT, chroma CQT, spectral contrast, and Tonnetz feature extraction methods. To determine the best-performing SVM, the kernel type and the kernel and gamma values, which shape the decision regions, were selected using a grid search. The kernel type determines the form of the decision boundary: a linear kernel separates the classes with a straight boundary, whereas a Radial Basis Function (RBF) kernel divides the regions with a curved boundary based on a Gaussian (normal distribution-shaped) function. The hyperparameters selected by tuning for each model are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eIn the statistical analysis, the AUC and F1-score, metrics that remain informative when the classes are imbalanced, were found to be effective in this study. 
The Mel-spectrogram, MFCC, and spectral contrast proved to be the most suitable for classifying breath sounds, demonstrating the clearest clustering in distinguishing between wheezing and non-wheezing sounds.\u003c/p\u003e \u003cp\u003eIn particular, spectral contrast achieved an AUC of 0.909 and an F1-score of 0.869, indicating the highest classification performance (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this study, we investigated whether any of the existing machine learning techniques can effectively distinguish between lung disease patients with and without wheezing by recognizing specific diagnostic patterns from breathing data. A total of 76 patients were included, and 103 wheeze sounds and 184 non-wheeze sounds were analyzed. Based on the breath sound data, the Mel-spectrogram, MFCC, and spectral contrast were found to be the most suitable for classifying breathing sounds and distinguishing between wheezing and non-wheezing sounds with clear clustering. Among the various techniques analyzed, spectral contrast demonstrated the most effective classification performance in distinguishing wheezing in children. This suggests that machine learning models using spectral contrast may be used to accurately diagnose respiratory diseases in children.\u003c/p\u003e \u003cp\u003eDifferentiation between wheezing and non-wheezing requires extensive medical expertise, impeding objective evaluations and restricting the use of auscultatory assessments in isolated, high-risk populations. To address these issues, researchers have turned to deep learning methods to differentiate between normal and abnormal auscultatory sounds. Recently, several techniques have been proposed to improve the identification of lung sounds using deep learning. Efforts have been made to categorize breath sounds by applying convolutional neural networks (CNNs). 
(\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e) In a recent study, a CNN model was used to distinguish wheezing sounds. (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e) Further, machine learning has been used to classify abnormal respiratory sounds into subclasses. Different architectures were shown to effectively differentiate between wheezes, rhonchi, and crackles. In that work, the authors chose a CNN rather than an SVM as the classifier connected to the feature extractor, because CNNs were found to yield superior results in both image classification and traditional classification tasks, and they used InceptionV3, DenseNet201, ResNet50, ResNet101, VGG16, and VGG19 as feature extractors. VGG16 yielded the most favorable results, achieving an AUC of 0.93 and an accuracy of 86.5%, which supports its competence in identifying anomalous lung sounds and classifying crackles, wheezes, and rhonchi. (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e)\u003c/p\u003e \u003cp\u003ePrevious studies have analyzed feature extractors and classifiers for integrating breath sounds into machine learning, with a focus on determining the most effective models. However, which method is superior for distinguishing wheezing sounds remains unknown. Numerous techniques are currently available to extract features from breath sounds, but none has been identified as particularly effective in distinguishing wheezing. Raw audio data represent a sound as sound pressure over time. In recent deep learning and machine learning methods, models learn from features extracted from the raw audio rather than from the raw audio itself. 
Feature extraction techniques provide a frequency representation by decomposing the time-domain data into frequency components using a Fast Fourier Transform. This conversion reveals which frequencies are strong or weak in the audio signal, which helps deep learning or machine learning models learn from audio data. There are several techniques for extracting features from audio, and the manner in which these features are extracted is critical for representing the audio accurately. These features have a horizontal (time) axis, a vertical (frequency) axis, and one channel, creating an image-like data structure.\u003c/p\u003e \u003cp\u003eWe found that spectral contrast performed best as a feature extractor for wheezing. Furthermore, we did not limit ourselves to the extraction and classification of low-frequency wheezing sound data. Instead, we employed t-SNE to reduce dimensionality and trained machine learning models using this approach. As previously noted, this study differs from others in that the data, described by multiple features, were reduced with t-SNE to a two-dimensional coordinate plane, allowing wheezing and non-wheezing sounds to be visualized. However, this study had some limitations. It was conducted at a single center, and the limited sample size at data collection made the assessment of accuracy challenging. For these reasons, we were restricted to evaluating the effectiveness of the hyperparameters solely using the AUC and F1-score metrics. Moreover, because we trained the machine learning model to differentiate only between wheezing and non-wheezing respiratory sounds, it remains unclear whether spectral contrast performs similarly well when distinguishing other respiratory sound types. 
Although utilizing spectral contrast may improve the performance of AI in distinguishing respiratory sound characteristics, validation in large-scale prospective studies is essential in future research.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study confirmed that the Mel-spectrogram, MFCC, and spectral contrast exhibited the best performance in characterizing respiratory sounds. Overall, we found that a machine learning model trained on spectral contrast demonstrated superior performance in detecting wheezing sounds in pediatric cases compared to models trained on other feature extraction methods. It is anticipated that training such a high-performance machine learning model will enable a more accurate analysis of respiratory sounds in pediatric patients and enhance the precision of diagnosing abnormal respiratory sounds.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eAuthors\u0026apos; contributions: HJ Moon, HM Ji, BS Kim, BJ Kim, and KH Kim conceptualized and designed the study, collected and analyzed the data, and drafted, reviewed, and revised the manuscript. All authors have approved the final manuscript as submitted and agreed to be accountable for all aspects of this study.\u003c/p\u003e\n\u003cp\u003eConsent for publication:\u0026nbsp;Not applicable.\u003c/p\u003e\n\u003cp\u003eEthics statement:\u0026nbsp;This study was approved by the Institutional Review Board (IRB) of the Catholic University of Korea (IRB approval no. PC19OESI0045). Written informed consent was obtained from at least one legal guardian for all participants. For children 7 years of age and older, the assent of the child was also obtained. 
All methods were performed in accordance with relevant guidelines and regulations.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eCompeting interests: The authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003eData Availability Statement: The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003eFunding: This research was supported by the New Faculty Startup Fund from Seoul National University and the 2019 SamA Pharmaceutical grant.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eWeiss, L. N. The diagnosis of wheezing in children. \u003cem\u003eAm. Fam. Physician.\u003c/em\u003e \u003cstrong\u003e77\u003c/strong\u003e, 1109-1114 (2008).\u003c/li\u003e\n\u003cli\u003eBohadana, A., Izbicki, G. \u0026amp; Kraman, S. S. Fundamentals of lung auscultation. \u003cem\u003eN. Engl. J. Med.\u003c/em\u003e \u003cstrong\u003e370\u003c/strong\u003e, 744-751 (2014).\u003c/li\u003e\n\u003cli\u003eHa-Neul, P. M., Jang, W.-N. M, \u0026amp; Hyo-Kyoung N. M. Validity of Cough-Holter Monitoring for the Objective Assessment of Cough and Wheezing in Children with Respiratory Symptoms. \u003cem\u003ePediatr. Allergy Respir. Dis (Korea)\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 344-353 (2012).\u003c/li\u003e\n\u003cli\u003eSchultz, A. \u0026amp; Brand, P. L. P. Episodic Viral Wheeze and Multiple Trigger Wheeze in preschool children: A useful distinction for clinicians? \u003cem\u003ePaediatr. Respir. Rev.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 160-164 (2011).\u003c/li\u003e\n\u003cli\u003eKim, B. J., Kim, B. S., Mun, J. H., Lim, C. \u0026amp; Kim, K. H. An accurate deep learning model for wheezing in children using real world data. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 22465 (2022).\u003c/li\u003e\n\u003cli\u003eZhang, J. 
\u003cem\u003eet al.\u003c/em\u003e Real-world verification of artificial intelligence algorithm-assisted auscultation of breath sounds in children. \u003cem\u003eFront. Pediatr.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 627337 (2021).\u003c/li\u003e\n\u003cli\u003eBardou, D., Zhang, K. \u0026amp; Ahmad, S. M. Lung sounds classification using convolutional neural networks. \u003cem\u003eArtif. Intell. Med.\u003c/em\u003e \u003cstrong\u003e88\u003c/strong\u003e, 58-69 (2018).\u003c/li\u003e\n\u003cli\u003ePramono, R. X. A., Bowyer, S. \u0026amp; Rodriguez-Villegas, E. Automatic adventitious respiratory sound analysis: A systematic review. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0177926 (2017).\u003c/li\u003e\n\u003cli\u003eKim, Y. \u003cem\u003eet al.\u003c/em\u003e The coming era of a new auscultation system for analyzing respiratory sounds. \u003cem\u003eBMC Pulm. Med.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 119 (2022).\u003c/li\u003e\n\u003cli\u003eGrzywalski, T. \u003cem\u003eet al.\u003c/em\u003e Practical implementation of artificial intelligence algorithms in pulmonary auscultation examination. \u003cem\u003eEur. J. Pediatr.\u003c/em\u003e \u003cstrong\u003e178\u003c/strong\u003e, 883-890 (2019).\u003c/li\u003e\n\u003cli\u003eMcDonnell, M. D. \u0026amp; Gao, W. (ed.). Acoustic scene classification using deep residual networks with late fusion of separated high and low frequency paths. \u003cem\u003eICASSP\u003c/em\u003e IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020 (IEEE, 2020).\u003c/li\u003e\n\u003cli\u003eO\u0026rsquo;Hanlon, K. \u0026amp; Sandler, M. B. (ed.). Comparing CQT and reassignment based chroma features for template-based automatic chord recognition. \u003cem\u003eICASSP\u003c/em\u003e IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2019 (IEEE, 2019).\u003c/li\u003e\n\u003cli\u003eMurty, M. N. 
\u0026amp; Raghava, R. \u003cem\u003eSupport Vector Machines and Perceptrons: Learning, Optimization, Classification, and Application to Social Networks\u003c/em\u003e, (2016).\u003c/li\u003e\n\u003cli\u003eVan der Maaten, L. \u0026amp; Hinton, G. Visualizing data using t-SNE. \u003cem\u003eJ. Mach. Learn. Res.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e (2008).\u003c/li\u003e\n\u003cli\u003eChakraborty, K., Talele, A. \u0026amp; Upadhya, S. Voice recognition using MFCC algorithm. \u003cem\u003eInt. J. Innov. Res. Adv. Eng. (IJIRAE)\u003c/em\u003e \u003cstrong\u003e1\u003c/strong\u003e, 2349-2163 (2014).\u003c/li\u003e\n\u003cli\u003eKumar, K., Kim, C. \u0026amp; Stern, R. M. (ed.). Delta-spectral Cepstral Coefficients for Robust Speech Recognition IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2011 (IEEE, 2011).\u003c/li\u003e\n\u003cli\u003eM\u0026uuml;ller, M. \u003cem\u003eShort-Time Fourier Transform and Chroma Features. Lab Course, Friedrich-Alexander-Universit\u0026auml;t Erlangen-N\u0026uuml;rnberg\u003c/em\u003e, (2015).\u003c/li\u003e\n\u003cli\u003eSch\u0026ouml;rkhuber, C. \u0026amp; Klapuri, A. (ed.). Constant-Q transform toolbox for music processing. 7th sound and music computing conference, Barcelona, Spain; 2010.\u003c/li\u003e\n\u003cli\u003eJiang, D.-N., Lu, L., Zhang, H.-J., Tao, J.-H. \u0026amp; Cai, L.-H. (ed.). Music Type Classification by Spectral Contrast Feature in Proceedings IEEE International Conference on Multimedia and Expo (IEEE, 2002).\u003c/li\u003e\n\u003cli\u003eHumphrey, E. J., Cho, T. \u0026amp; Bello, J. P. (ed.). Learning a Robust Tonnetz-Space Transform for Automatic Chord Recognition IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2012 (IEEE, 2012).\u003c/li\u003e\n\u003cli\u003eMoreno, P., Ho, P. \u0026amp; Vasconcelos, N. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. \u003cem\u003eAdv. 
Neural Inf. Process. Syst.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e (2003).\u003c/li\u003e\n\u003cli\u003eTariq, Z., Shah, S. K. \u0026amp; Lee, Y. Feature-based fusion using CNN for lung and heart sound classification. \u003cem\u003eSensors (Basel)\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 1521 (2022).\u003c/li\u003e\n\u003cli\u003eZulfiqar, R. \u003cem\u003eet al.\u003c/em\u003e Abnormal respiratory sounds classification using deep CNN through artificial noise addition. \u003cem\u003eFront. Med. (Lausanne)\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 714811 (2021).\u003c/li\u003e\n\u003cli\u003eKim, Y. \u003cem\u003eet al.\u003c/em\u003e Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 17186 (2021).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-4419150/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4419150/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eAuscultation is a critical diagnostic feature of lung diseases, but it is subjective and challenging to measure accurately. To overcome these limitations, artificial intelligence models have been developed.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eIn this prospective study, we aimed to compare respiratory sound feature extraction methods to develop an optimal machine learning model for detecting wheezing in children. Pediatric pulmonologists recorded and verified 103 instances of wheezing and 184 other respiratory sounds in 76 children. Various methods were used for sound feature extraction, and dimensions were reduced using t-distributed Stochastic Neighbor Embedding (t-SNE). The performance of models in wheezing detection was evaluated using a kernel support vector machine (SVM).\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThe durations of recordings in the wheezing and non-wheezing groups were 89.36\u0026thinsp;\u0026plusmn;\u0026thinsp;39.51 ms and 63.09\u0026thinsp;\u0026plusmn;\u0026thinsp;27.79 ms, respectively. The Mel-spectrogram, Mel-frequency Cepstral Coefficient (MFCC), and spectral contrast achieved the best expression of respiratory sounds and showed good performance in cluster classification. The SVM model using spectral contrast exhibited the best performance, with an accuracy, precision, recall, and F1-score of 0.897, 0.800, 0.952, and 0.869, respectively.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eMel-spectrograms, MFCC, and spectral contrast are effective for characterizing respiratory sounds in children. 
A machine learning model using spectral contrast demonstrated high detection performance, indicating its potential utility in ensuring accurate diagnosis of pediatric respiratory diseases.\u003c/p\u003e","manuscriptTitle":"Machine Learning-Driven Strategies for Enhanced Pediatric Wheezing Detection","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-06-12 17:42:03","doi":"10.21203/rs.3.rs-4419150/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c196ffda-1be9-490d-9978-1c8b7889dd13","owner":[],"postedDate":"June 12th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-12-16T10:54:10+00:00","versionOfRecord":[],"versionCreatedAt":"2024-06-12 17:42:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4419150","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4419150","identity":"rs-4419150","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
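
The F1-score reported above for the spectral-contrast SVM can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch under that standard definition; the helper name `f1_from_precision_recall` is hypothetical and not taken from the paper, and only the three reported values (0.800, 0.952, 0.869) come from the text.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall.

    Hypothetical helper for illustration; it is not the authors' code.
    """
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the SVM trained on spectral contrast.
reported_precision = 0.800
reported_recall = 0.952
reported_f1 = 0.869

computed_f1 = f1_from_precision_recall(reported_precision, reported_recall)

# Compare the recomputed value with the reported one at three decimals.
print(f"computed F1 = {computed_f1:.3f}, reported F1 = {reported_f1:.3f}")
```

Under these assumptions the recomputed value agrees with the reported F1-score to three decimal places, which suggests the reported precision, recall, and F1-score are mutually consistent.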

