Artificial Intelligence Enhanced Electrocardiogram Analysis for Age and Sex Classification in Youth

Research Article | Research Square

Honggen Zhang, Mohammad Zaeri-Amirani, Mojtaba Abolfazli, Narayana P. Santhanam, and 5 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7512909/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 18 Feb, 2026 in Pediatric Cardiology. Version 1 posted 7

Abstract

Introduction

Electrocardiogram (ECG) values vary significantly across age and sex, particularly during childhood and adolescence. While age- and sex-specific ECG standards exist, they often fail to capture complex multi-dimensional relationships and have not been applied in machine learning (ML) enhanced ECG analysis.
Although ML models have significantly improved the accuracy of automated ECG analysis in clinical practice, there is a paucity of such studies in the pediatric population. Our aim was to create age- and sex-specific standards for children by ML modeling.

Methods

We analyzed 29,408 curated resting 12-lead ECGs from healthy subjects aged 0–21 years using 177 digitized ECG variables combined with various ML models, including regression and classification analyses and semi-supervised neural networks. Primary outcome variables were age and sex. Model performance was evaluated using F1-score, AUROC, and confusion matrices across repeated train-test splits.

Results

Support vector machine (SVM) achieved the highest accuracy in modeling both age and sex. Key predictive features included heart rate, PR interval, QRS duration, and T-wave amplitude. Age-group classification achieved an average true positive rate of 60% with SVM, improving to 94% when allowing one-group misclassification. Sex classification reached an F1-score of 0.91 and an AUROC of 0.95 in adolescents and young adults, and moderate accuracy in younger children.

Discussion

Traditional supervised ML models can accurately model physiologic ECG changes related to age and sex, outperforming semi-supervised models, particularly in smaller subgroups. These findings support the development of age- and sex-specific ML-enhanced ECG standards to aid future research and clinical applications in pediatric cardiology.
artificial intelligence machine learning electrocardiogram standards pediatric screening

Introduction

Since its invention by Einthoven in 1898, the electrocardiogram (ECG) has been one of the most important screening and diagnostic tests for heart problems [ 1 ]. However, ECG analysis with the ability to indicate underlying cardiac disease states has historically relied upon physician interpretation, with its inherent biases and errors, based on standards derived from very few healthy subjects. In pediatric and adolescent populations, ECG interpretation is further hindered by the rapid and profound physiological changes that occur from birth through young adulthood. These developmental changes affect ECG waveform morphology and timing, reflecting underlying alterations in heart size, autonomic tone, ion channel function, and conduction pathways. Detailed assessment of the largest historical pediatric cohort of normal ECGs recently allowed us to understand the granularity of these changes in ECG variables across childhood [ 2 ]. Yet, these standards are primarily based on summary statistics of isolated ECG variables and do not fully capture the complex, multivariate patterns embedded within the high-dimensional ECG signal. Recognizing these dynamic and complex features is essential for accurate clinical interpretation.

Recent advances in machine learning (ML) have ignited the development of a novel field: artificial intelligence (AI) enhanced ECG analysis [ 3 – 5 ]. These ML models offer the potential to leverage the full complexity of data in ECGs: they can integrate numerous ECG features simultaneously, uncover latent patterns, and improve the granularity of physiological modeling. Importantly, establishing ML models that can accurately infer age and sex based solely on ECG data is a critical step toward developing adaptive, context-aware diagnostic tools.
These models can identify subtle electrophysiological signatures correlated with demographic factors, thus providing a normative framework that enhances detection of pathological deviations. AI-enhanced ECG analysis has been used for the detection of ventricular dysfunction and hypertrophic cardiomyopathy [ 6 – 8 ] and was also able to model the age and sex of adults and estimate a biological or cardiac age of a person [ 9 – 10 ]. However, these studies were performed in adults. There is a paucity of data of AI-enhanced ECG analysis in children, complicated by the profound changes in the ECG that occur between 0–21 years of age. Hypertrophic cardiomyopathy (HCM), left ventricular dysfunction and recently, congenitally corrected transposition of the great arteries have been successfully analyzed by ML models with reasonable detection rate [ 11 – 13 ]. However, other less common heart conditions have not been modeled by AI in children. ECG analysis in children and youth has a unique importance in screening for rare heart conditions, congenital heart defects and inherited arrhythmia syndromes [ 14 – 17 ], but this screening cannot be performed in an automated model, if there is no definition of age-and sex-specific normal standards. To date, AI-generated age- and sex-specific ECG standards for children and adolescents have not been developed, limiting the application of automated models for pediatric cardiac screening. This study aims to evaluate multiple supervised and semi-supervised ML architectures to classify age groups and sex from ECG features of pediatric and young adult individuals. By developing age- and sex-classification, we aim to establish a foundation for automated AI-enhanced ECG analysis leveraging age- and sex-specific standards for children and young adults. Methods ECG data source and study population This study used a large, curated cohort of ECGs from subjects with no known heart condition. 
The cohort, previously validated and published (Bratincsak et al, Circ AE, 2020) included ECGs collected retrospectively from patients aged 1 day to 21 years at Hawaii Pacific Health and Rady Children’s Hospital San Diego between 2012 and 2022 (n = 70,816). ECGs were obtained for a variety of clinical reasons, including evaluation of heart murmur, irregular heartbeat, syncope, dizziness, bradycardia, tachycardia, fever, screening for certain diseases, and for sports pre-participation screening. ECG from patients with congenital or acquired heart conditions, history of heart surgery, arrhythmia syndromes, pacemakers, or duplicate ECGs from the same patient within the same age-group were excluded from the study (n = 31,278). The stringent exclusion criteria created a curated cohort of ECGs from children and adolescents with no evidence of heart defect or cardiac anomaly on a more than 7-year average follow-up. All ECGs were performed in resting supine position using GE MAC 5500 HD ECG systems (General Electrics, Houston, TX) at 500 Hz sampling frequency with standardized voltage (10 mm = 1 mV) and speed (25 mm/s). Digitized ECG values were exported from the GE Muse v9 system. ECGs were excluded from the final analysis if they had technical errors, lead reversal, poor baseline, or missing lead information (n = 10,130), resulting in a final cohort of 29,408 complete ECGs. The study was approved by the Hawaii Pacific Health Research Institute and deemed exempt from further Institutional Review Board approval due to the retrospective nature of the analysis. ECG variable selection and processing ECG variables were pre-selected based on expert physician input and prior established standards. 
We included 177 ECG variables (features) in the analysis of the ECGs, such as: P, QRS, and T axes; frontal QRS-T and spatial QRS-T angles; RR interval; PR interval; QRS duration; QT interval and corrected QT interval (QTc) calculated using the Bazett and Fridericia methods; peak amplitudes of P, Q, R, S and T waves; QRS integral; and T wave integral in all leads (I, II, III, aVL, aVF, aVR, V1, V2, V3, V4, V5, V6).

Machine learning models, training and testing

Selected ML models were used on standardized digitized values of 177 ECG variables. We compared supervised and semi-supervised ML models for the determination of age and sex. Assessment with supervised ML models included Support Vector Machines (SVM) with linear and Radial Basis Function (RBF) kernels, Adaptive Boosting (AdaBoost) with decision trees, and Linear Discriminant Analysis (LDA) models. For semi-supervised ML we used Triplets Bidirectional Generative Adversarial Networks (T-BiGAN) and Residual Networks (ResNet), a deep neural network.

Traditional ML models

SVM is commonly employed to solve complex classification problems, with sensitive detection of outliers, by defining boundaries among labeled data points. We used SVM both in the original data space with a linear kernel and in a new feature space obtained by a non-linear transformation of the data using an RBF kernel. Penalties of C = 0.5, 1, and 1.5 were analyzed; the optimal penalty C = 1.5 was selected for age classification, and the default C = 1 for sex classification. AdaBoost is a popular ensemble-based method for data classification that strengthens weak base classifiers by combining them in a weighted linear combination. We used AdaBoost with a decision tree model as the base estimator, testing tree depths of D = 1, 3, 5 and learning rates of L = 0.1, 1, 2.
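As a rough illustration of the supervised setup described above, the sketch below fits an RBF-kernel SVM on standardized features over the three penalties named in the text and scores an AdaBoost classifier over the tested tree depths and learning rates. It assumes scikit-learn and uses synthetic data in place of the ECG variables; the cross-validation folds, the macro-F1 scoring, the 25 boosting rounds, and all variable names are illustrative choices, not details taken from the study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ECG feature matrix (NOT study data):
# 3 hypothetical classes whose features are shifted by the class index.
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 200, 20, 3
y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + y[:, None]

# RBF-kernel SVM on standardized features; C grid as named in the text.
svm_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [0.5, 1.0, 1.5]},
    cv=3,                  # assumption: fold count chosen for speed
    scoring="f1_macro",    # assumption: macro-averaged F1
)
svm_search.fit(X, y)

# AdaBoost with decision-tree base estimators over the tested
# depths (D) and learning rates (L).
ada_scores = {}
for depth in (1, 3, 5):
    for rate in (0.1, 1.0, 2.0):
        model = AdaBoostClassifier(
            DecisionTreeClassifier(max_depth=depth),
            n_estimators=25,   # assumption: small value to keep the sketch fast
            learning_rate=rate,
            random_state=0,
        )
        ada_scores[(depth, rate)] = cross_val_score(
            model, X, y, cv=3, scoring="f1_macro"
        ).mean()

best_ada = max(ada_scores, key=ada_scores.get)
print("best SVM penalty:", svm_search.best_params_["svc__C"])
print("best AdaBoost (depth, learning rate):", best_ada)
```

Wrapping the scaler and the SVM in one pipeline keeps the standardization inside each cross-validation fold, so the held-out fold does not influence the scaling statistics.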
The number of base estimators was set to 200 with the optimal hyperparameter combination of D = 5 and L = 1 for age classification, while for sex classification 50 base estimators were used with D = 1 and L = 1. LDA is a multi-class classification model that can be used for supervised learning by maximizing class separation in a low-dimensional space. We used LDA to separate multiple classes with multiple features based on data dimensionality reduction involving the entire ellipse of data and not only data on the boundary of distinct groups.

Age classification is a multi-class classification, and we used the entire dataset to develop the model. Model performance was evaluated by the ratio of predicted and true labels for the determination of age across the 0–21 years of the cohort. Sex classification is a binary determination. Since the difference in ECG variables is more pronounced among various ages than between the two sexes, we performed binary analysis to determine the sex of the subjects within each age-group. The performance of binary classification for SVM, AdaBoost and LDA was assessed using both the F1 score and receiver operating characteristic (ROC) curves as comparison metrics. We calculated ROC with the probability of each sample being assigned to one class. For model classification accuracy we used the F1-score metric, which combines precision and recall scores.

Semi-supervised ML models

T-BiGAN is an ML model that offers improved feature representation through semi-supervised learning, employing a model based on Bidirectional Generative Adversarial Networks (BiGAN). In this approach, semi-supervised data is integrated into the training process via an additional triplet loss term. The BiGAN structure comprises an encoder and a decoder, facilitating data transformation into a latent space, alongside a discriminator tasked with distinguishing genuine data from synthetic data within the GAN framework. In T-BiGAN, auxiliary labels within the dataset are utilized.
The choice of triplet loss is deliberate: during training, the model is encouraged to place a query example farther from negative examples (i.e. those with labels different from the query) than from positive examples. This use of triplet loss fosters a mapping in the latent space that encourages data with the same label to form distinct clusters, differentiating them from data with other labels. This is the underlying rationale for the application of T-BiGAN on ECG data. In our T-BiGAN model, we incorporated sex and age groups (2 sex-groups x 9 age-groups = 18 categories) as auxiliary labels during the training process. The model uses a latent dimension of 50 (z_dim = 50) and comprises three main components: an encoder, a generator, and a discriminator. Each of these neural components consists of two hidden layers, with each layer having a size of 1024 units followed by a leaky Rectified Linear Unit (ReLU) activation layer with a negative slope of 0.2 for non-linearity. Additionally, the layers in the generator are followed by Batch Normalization layers. The optimizer is Adam with a very small initial learning rate of 1e-8 and β₁ = 0.5, making the training stable but very slow to start. The model is regularized using L2 weight decay (2.5e-5) and weights are initialized with a small Gaussian noise (stddev = 0.02). Training is run for 501 epochs with a batch size of 256.

Residual Networks (ResNet) is a deep neural network architecture designed to facilitate the training of exceptionally deep networks. It achieves this by introducing residual blocks, which utilize skip connections to learn the difference (residual) between input and desired output in each block. ResNet architectures often incorporate batch normalization and global average pooling, and come in various depths.
The ResNet model comprises an initial convolutional layer that applies 32 filters, batch normalization, ReLU activation, and max-pooling. The ResNet-based classifier is built on top of ImageNet-pretrained ResNet-50, using it as a feature extractor (with all layers frozen). The output of the ResNet backbone is passed through a custom fully connected head, consisting of dense layers of sizes 512, 256, 128, and 64, each followed by ReLU activation and 50% dropout to prevent overfitting. The final layer is a Softmax classifier that outputs class probabilities. The model is compiled with the Adam optimizer, using categorical cross-entropy loss and accuracy as a metric. Similarly to the supervised ML models, T-BiGAN and ResNet models were used to characterize the sex of the subjects in a binary classification, while the age-group classification used a multi-class model for the determination of set age-groups.

Training and testing

For all supervised (SVM, AdaBoost, LDA) and semi-supervised (T-BiGAN, ResNet) ML models 75% of data was used for training, 5% for testing and hyperparameter tuning, and 15% for final classification. To minimize overfitting and report performance variance, we used k-fold cross-validation.

Statistical analysis

For binary classification, simple descriptive statistical measures were calculated (true positive, false positive, true negative, and false negative rates). From those rates, ROC curves were generated and the Area Under the ROC Curve (AUROC) was calculated. For predictive accuracy, we calculated precision, or positive predictive value (true positives divided by the sum of true and false positives), and recall, or sensitivity (true positives divided by the sum of true positives and false negatives). F1 scores were then generated as the harmonic mean of the precision and recall rates, being one of the most accurate measures of test predictability.
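The precision, recall, F1 and AUROC definitions above can be made concrete with a small worked example. The sketch below is illustrative only: the labels and predicted probabilities are invented, the 0.5 decision threshold is an assumption, and scikit-learn is used just to cross-check the hand-computed F1 and to compute the AUROC.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical binary labels and predicted probabilities (not study data).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.55, 0.45])
y_pred = (y_prob >= 0.5).astype(int)  # assumption: 0.5 decision threshold

# Counts behind the definitions in the text.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

precision = tp / (tp + fp)  # positive predictive value
recall = tp / (tp + fn)     # sensitivity
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# AUROC is computed from the probabilities, not the thresholded labels.
auroc = roc_auc_score(y_true, y_prob)

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} AUROC={auroc:.2f}")
# prints: precision=0.80 recall=0.80 F1=0.80 AUROC=0.88
```

Here 4 of the 5 predicted positives are correct and 4 of the 5 actual positives are found, so precision and recall are both 0.80; the AUROC of 0.88 reflects that 22 of the 25 positive-negative pairs are ranked in the right order.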
For multiple group classification, confusion matrices were generated to assess true and predicted positive rates.

Results

Study population

After exclusion of patients with heart conditions and of duplicate and erroneous ECGs, the final study cohort contained 29,408 curated normal ECGs across ages of 1 day to 21 years (12,318 male – 41.9%, 17,090 female – 58.1%). The cohort was divided, based on prior ECG age classification following the developmental stages of children and young adults, into the following 9 age-groups: 1) term newborns: 1–6 days old; 2) neonates: 1–4 weeks old; 3) young infants: 1 month to < 6 months old; 4) older infants: 6 months to < 2 years old; 5) toddlers and small children: 2 to < 5 years old; 6) children: 5 to < 9 years old; 7) preteen children: 9 to < 13 years old; 8) teenagers: 13 to < 17 years old; and 9) adolescents to young adults: 17 to < 22 years old. The number of patients in each age group ranged from 304 to 8,178 (Table 1). We attempted to model age with a continuous regression analysis across the entire age range, but the confidence interval and error margin of ± 1.2 years was deemed too large to be meaningful when comparing, for example, 1–6 days old and 2–4 weeks old infants. Therefore, and following previous physiological classification of infants and children into distinct age groups, we performed our age-classification analysis using 9 distinct age groups.

Table 1 Number of study subjects from 1 day to 21 years sorted into 9 age-groups

Age group  Ages                N     Male  Female
1          1–6 days            304   162   142
2          1–4 weeks           684   363   321
3          5 weeks – 5 months  1328  740   588
4          6–23 months         1766  948   818
5          2–4 years           2384  1227  1157
6          5–8 years           3075  1574  1501
7          9–12 years          4323  1896  2427
8          13–16 years         8178  3032  5146
9          17–21 years         7366  2376  4990

Modeling of sex in various ages

Sex-related differences in ECG parameters were subtle and often masked by the more pronounced age-related changes.
To control for the age-related differences, sex classification models were trained and tested separately within each age group. Predictive accuracy varied by age group and the ML model used. The summated area under the ROC curve in distinguishing male and female ECGs ranged from 61% in younger children to 96% in teenagers and young adults. Consistent with the AUROC scores, F1 scores for determining sex ranged from 0.53 to 0.91, depending on the age group and the ML model used. The best sex classification performance was achieved in teenagers (13–17 years) and young adults (18–21 years) with an AUROC of 95% and an F1 score of 0.91, compared to AUROC of 65–82% and F1 scores of 0.53–0.7 in younger children (0–12 years) (Fig. 1, Table 2). Supervised ML models, such as SVM, consistently outperformed semi-supervised deep learning models, such as ResNet, in predicting sex across all age groups, e.g. an F1 score of 0.91 by SVM vs. 0.60 by ResNet, and an AUROC of 96% by SVM vs. 73% by ResNet in the 16–21 years old group (age group 9) (Fig. 2, Table 2).
Table 2 Area Under the Receiver Operating Characteristic Curve and F1 accuracy scores in differentiating sex using various machine learning models and 5-fold cross-validation

Area Under the Receiver Operating Characteristic Curves

Age group  1            2            3            4            5            6            7            8            9
SVM-RBF    0.74 ± 0.12  0.65 ± 0.12  0.67 ± 0.08  0.71 ± 0.05  0.73 ± 0.05  0.74 ± 0.06  0.82 ± 0.03  0.94 ± 0.02  0.95 ± 0.02
AdaBoost   0.76 ± 0.13  0.66 ± 0.08  0.66 ± 0.05  0.72 ± 0.06  0.73 ± 0.09  0.75 ± 0.05  0.82 ± 0.05  0.94 ± 0.02  0.95 ± 0.01
LDA        0.55 ± 0.04  0.60 ± 0.03  0.64 ± 0.03  0.66 ± 0.01  0.71 ± 0.01  0.72 ± 0.01  0.81 ± 0.01  0.93 ± 0.01  0.95 ± 0.01
T-BiGAN    0.72 ± 0.04  0.68 ± 0.03  0.68 ± 0.02  0.68 ± 0.02  0.70 ± 0.01  0.70 ± 0.01  0.78 ± 0.01  0.93 ± 0.01  0.94 ± 0.01
ResNet     0.57 ± 0.04  0.55 ± 0.03  0.51 ± 0.02  0.55 ± 0.02  0.55 ± 0.01  0.50 ± 0.01  0.58 ± 0.01  0.52 ± 0.01  0.67 ± 0.01

F1 accuracy scores

Age group  1            2            3            4            5            6            7            8            9
SVM-RBF    0.70 ± 0.1   0.66 ± 0.11  0.65 ± 0.09  0.65 ± 0.08  0.68 ± 0.06  0.71 ± 0.05  0.75 ± 0.05  0.87 ± 0.02  0.91 ± 0.02
AdaBoost   0.66 ± 0.12  0.61 ± 0.11  0.60 ± 0.08  0.63 ± 0.07  0.64 ± 0.06  0.67 ± 0.05  0.70 ± 0.04  0.85 ± 0.02  0.89 ± 0.01
LDA        0.53 ± 0.03  0.63 ± 0.03  0.61 ± 0.03  0.60 ± 0.02  0.63 ± 0.01  0.63 ± 0.01  0.71 ± 0.01  0.83 ± 0.01  0.83 ± 0.01
T-BiGAN    0.61 ± 0.04  0.60 ± 0.03  0.57 ± 0.02  0.61 ± 0.02  0.64 ± 0.01  0.64 ± 0.01  0.71 ± 0.01  0.85 ± 0.01  0.86 ± 0.01
ResNet     0.57 ± 0.04  0.55 ± 0.03  0.52 ± 0.02  0.51 ± 0.02  0.53 ± 0.01  0.56 ± 0.01  0.58 ± 0.01  0.61 ± 0.01  0.64 ± 0.01

Age groups: 1: 1–6 days old; 2: 1–4 weeks old; 3: 1 month to < 6 months old; 4: 6 months to < 2 years old; 5: 2 to < 5 years old; 6: 5 to < 9 years old; 7: 9 to < 13 years old; 8: 13 to < 17 years old; 9: 17 to < 22 years old; 10: 22 to < 30 years old; and 11: 30–40 years old.
Machine learning models: Support Vector Machines (SVM) with Radial Basis Function (RBF) kernels, Adaptive Boosting (AdaBoost) with decision trees, Linear Discriminant Analysis (LDA), Triplets Bidirectional Generative Adversarial Networks (T-BiGAN) and Residual Networks (ResNet). Modeling of age from infancy to young adulthood Numerous ECG variables showed variations among different age-groups in children and young adults. The following ECG features were identified having a higher importance using the permutation importance method: heart rate, PR interval, QRS duration, QTc interval, and R and T wave amplitudes in V1, V3 and V6 (Fig. 3). These key features were explicitly utilized in the supervised ML models to develop age-group prediction, while the semi-supervised T-BiGAN model processed all features without any discrimination or weight, and transformed to latent space to develop age-group classification. Visual representation of individual data points reflects how the accuracy of age prediction improved when the data was transformed to latent space (Fig. 4). All models successfully discriminated among the predefined 9 age-groups in the multi-class classification model, with a true positive rate (TPR) ranging from 46–72%. The highest accuracy was observed in distinguishing the youngest age groups (1–6 days, 1–4 weeks, 1–5 months old) with TPRs of 64–72%, compared to older age groups (2–21 years), with TPRs of 46–62% (Fig. 5). Traditional supervised models performed better than semi-supervised neural networks. SVM with RBF kernel showed the highest accuracy with an average TPR of 60%, outperforming AdaBoost (49%), and LDA (55%). Younger age-groups with smaller number of subjects had a higher variation of results depending on hyperparameter optimization, reflected by wider confidence intervals, nevertheless, SVM outperformed the other models in multi-class classification despite hyperparameter optimization. 
Among semi-supervised models, T-BiGAN had a higher average TPR (57%) than ResNet (39%) in multi-class classification of various ages. A confusion matrix visually represents the true positive rates using SVM with RBF kernel (Fig. 5). The matrix revealed that misclassification (false negative rate: FNR) predominantly occurred in immediately neighboring age-groups, consistent with the expected overlap in physiological changes. Allowing a single age-group deviation (± 1 age-group error margin) increased the average adjusted age-group detection accuracy (TPR) to 94% with SVM (range 91–99%) and 92% with T-BiGAN (range 88–98%), with corresponding FNRs of 1–9% with SVM and 2–12% with T-BiGAN.

Comparison of various machine learning models

Comparison of traditional supervised ML methods (SVM, AdaBoost, LDA) with semi-supervised neural networks (T-BiGAN, ResNet) demonstrated that when fewer than 300 ECGs were included, supervised methods (SVM with linear or RBF kernel) outperformed semi-supervised methods in predicting both age and sex. When the analyzed data included 1000 or more ECGs, supervised and semi-supervised methods had similar accuracy. Overall, the best results were achieved using SVM with RBF kernel, both in sex-prediction with a binary classification and in age-prediction using a multi-class classification.

Discussion

Our findings demonstrate that machine learning models can accurately classify both age and sex from ECG features in children and young adults. Notably, classification of age and sex was achieved with high precision even in groups containing only a few hundred ECGs, which has not previously been demonstrated with any other method. Our results support the concept that age- and sex-related physiological differences are encoded in the ECG waveform and can be decoded through data-driven methods. Modeling demographic-specific normal ECGs using specific ML models may facilitate the development of more precise, automated ECG interpretation frameworks.
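The one-group tolerance applied to the age-group results (counting a prediction as correct when it falls in the true group or an immediate neighbor) can be computed directly from a confusion matrix. The following is a minimal sketch under stated assumptions: the function name `tolerant_tpr` and the 4-group matrix are hypothetical, rows are taken to be true groups and columns predicted groups, and none of the numbers come from the study.

```python
import numpy as np

def tolerant_tpr(confusion, tolerance=1):
    """Per-group fraction of samples predicted within `tolerance` groups
    of the true group (rows = true group, columns = predicted group)."""
    confusion = np.asarray(confusion, dtype=float)
    idx = np.arange(confusion.shape[0])
    # Boolean mask selecting the diagonal band |true - predicted| <= tolerance.
    within = np.abs(idx[:, None] - idx[None, :]) <= tolerance
    return (confusion * within).sum(axis=1) / confusion.sum(axis=1)

# Hypothetical confusion matrix for 4 ordered groups (not study data).
cm = np.array([
    [60, 30, 10,  0],
    [20, 50, 25,  5],
    [ 5, 25, 55, 15],
    [ 0, 10, 30, 60],
])

exact = tolerant_tpr(cm, tolerance=0)     # ordinary per-group TPR
adjusted = tolerant_tpr(cm, tolerance=1)  # TPR allowing a +/- 1 group deviation

print("exact TPR:   ", exact)     # [0.6  0.5  0.55 0.6 ]
print("adjusted TPR:", adjusted)  # [0.9  0.95 0.95 0.9 ]
```

Because most errors in this toy matrix sit next to the diagonal, the adjusted rate rises sharply, which mirrors the pattern reported for neighboring age groups.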
Age and sex classification

In early childhood, physiologic differences between males and females are minimal, but become more apparent around 10–12 years of age. Our results demonstrate that among adolescents and young adults, when males and females differ in physiologic features, their ECG also changes, reflected by subtle sex-related differences in specific parameters, such as PR interval, QRS duration, QTc interval, and R and S wave voltages in many leads. ML models detected these changes with high accuracy in adolescents and young adults, with AUROC values of 0.94–0.95. Such a high AUROC would serve as a remarkable metric for a screening tool, supporting the idea that the ECG encodes biologically relevant sex-specific signatures.

Similar to sex, age can be determined accurately by ECG in children and young adults. Somatic and physiologic changes occur during the development of children, with the most dramatic changes observed during early childhood (0–5 years), and less pronounced differences in adolescents and young adults (15 to 21 years). Following somatic growth, ECG variables change throughout childhood, with the most dramatic changes in early childhood, detected by specific ECG parameters, such as heart rate, PR, QRS and QTc intervals, and R and T wave voltages. Despite sample size limitations and class imbalance among analyzed groups, both supervised and semi-supervised ML models classified various ages from the ECG with an excellent adjusted true positive rate of 88–99%. This is the first ML-based multi-class model to classify nine distinct pediatric age groups using ECG data alone. This proof of concept of multi-class classification using ML-enhanced ECG analysis is foundational to developing an automated system for ECG analysis and defining normal ECGs for every age and sex.
Machine learning model comparison

Supervised ML models (SVM and AdaBoost) performed more favorably in prediction modeling of datasets with fewer than 1000 data points (ECGs) compared to semi-supervised neural networks (T-BiGAN and ResNet). Supervised and weighted preselection of ECG variables improved the prediction accuracy of supervised ML models. When a dataset (specific age group) contained more than 1000 data points (ECGs), the difference in performance between the supervised and semi-supervised models diminished. While previous ECG-based AI models relied on datasets in excess of 5,000 subjects, our study shows that effective classification and prediction is possible with much smaller subject numbers. Developing ML algorithms for ECG analysis that can model groups with fewer than 1000 subjects, and distinguish these groups with remarkable accuracy, is particularly relevant for rare cardiac disorders, where large datasets are not available.

Clinical significance of sex and age classification in youth

Our results highlight that ECG signatures differ by demographic group and that ML can reliably recognize these patterns. The most significantly affected ECG variables and ML modeling features were PR, QRS and QTc intervals and the R, S and T wave amplitudes. These very same ECG parameters are used to diagnose several heart conditions, including conduction defects, long QT syndrome (LQTS), and ventricular hypertrophy or enlargement associated with cardiomyopathies. The subtle ECG changes caused by these heart conditions are not only affected by, but could easily be masked by, the changes associated with different age-groups and sexes. Without age- and sex-specific reference values, such abnormalities can be overlooked or misinterpreted.
For example, a mildly prolonged QT interval may be normal in a 1-week-old female, but would serve as a suspected diagnosis in a 10-year-old male, and similarly a certain QRS duration and S wave amplitude in lead V2 could be normal in an 8-year-old female, but would be associated with significant ventricular enlargement or hypertrophy in a 2-year-old female. Establishing robust normative ECG standards is crucial for future ML-based models aimed at detecting heart conditions in children. These models can only perform accurately if trained on well-defined demographic baselines with curated normal ECGs. By modeling healthy ECG profiles across pediatric and young adult age groups and sexes, we set the stage for ML tools that can provide demographic-specific diagnosis in clinical practice. It is important to emphasize that the primary aim of our study was not to predict age or sex as diagnostic endpoints, but rather to assess and model the representation of these factors in the ECG data. This distinction is important because accurate identification of age and sex signatures is a prerequisite for developing reliable diagnostic models that avoid confounding by demographic variability. In clinical practice, age and sex are known and recorded variables, but understanding their explicit ECG correlates enhances model transparency and interpretability, will serve as the foundation for future age-and sex-specific ML models for cardiac disease prediction, and provide an important step towards creating AI-enhanced ECG screening. Limitations and strengths Our study had certain limitations. Sex was self-reported, which may not always align with biologic sex – potentially affecting ECG patterns. 
Our ML analysis was limited to selected models, and excluded certain advanced AI methods, such as deep and convolutional neural networks, however, we believe that we have chosen representative models with appropriate optimization, and the exclusion of certain models was because the input signal was not deemed complex enough to warrant them. Strengths of our study include the use of a curated dataset containing only healthy subjects, eliminating patients with cardiac conditions, and enabling the establishment of normal ECG values for accurate ML modeling. We also compared multiple ML algorithms, rather than relying on a single model, which strengthens confidence in our results and helps mitigate bias and overfitting. Achieving consistent findings across different methods suggests robustness and reproducibility of our results. Conclusion In conclusion, this study provides foundational evidence that ML can uncover age- and sex-specific signatures in pediatric ECG data. By establishing reliable age- and sex-specific ECG standards, this work supports future efforts to build ML models capable of identifying conditions with subtle ECG changes affected by age- and sex-specific variations. Our findings move the field beyond static reference standards toward dynamic, ML-informed models that better capture biological variability and will provide personalized, context-aware ECG interpretation. Such AI-enhanced ECG analytic models incorporating demographic variation are poised to improve the accuracy and reliability of ECG interpretation, particularly for rare cardiac conditions in children, where sample sizes are limited and demographic variability is large. Declarations Ethical Approval This retrospective study was deemed exempt from IRB review, as it was conducted using data previously collected under an Institutional Review Board protocol approved by Hawaii Pacific Health, and in accordance with the ethical standards set forth in the 1964 Declaration of Helsinki. 
Competing Interests The authors declare that the research was conducted in the absence of any commercial, financial, or non-financial relationships that could be construed as a potential conflict of interest. Author Contribution HZ – Methodology, Formal Analysis, Writing – original draft. MZ-A – Methodology, Formal Analysis, Writing – review & editing. MA – Methodology, Formal Analysis, Writing – review & editing. NS – Conceptualization, Methodology, Writing – review & editing. JZ – Methodology, Writing – review & editing. AH-M – Conceptualization, Methodology, Writing – review & editing. CK – Methodology, Formal Analysis, Writing – review & editing. JP – Conceptualization, Writing – original draft. AB – Conceptualization, Methodology, Formal Analysis, Writing – original draft. All authors have read and approved the final manuscript. Acknowledgments The research for this manuscript was supported by the following grants: 1R21LM0138818 awarded by the National Library of Medicine, National Institutes of Health; and NRT-AI 2244574 awarded by the National Science Foundation. Data Availability The datasets of electrocardiogram variables for children and young adults will be made partially available upon request. Because of patient privacy and HIPAA regulations on protected personally identifiable information, the entire dataset will not be made openly available: patient-specific ECG parameters may identify an individual patient. References Fisch C (2000) Centennial of the string galvanometer and the electrocardiogram. J Am Coll Cardiol 36:1737–1745. https://doi.org/10.1016/s0735-1097(00)00976-1 Bratincsák A, Kimata C, Limm-Chan BN, Vincent KP, Williams MR, Perry JC (2020) Electrocardiogram standards for children and young adults using Z-scores. Circ Arrhythm Electrophysiol 13:e008253. https://doi.org/10.1161/CIRCEP.119.008253 Feeny A, Chung MK, Madabhushi A, Attia ZI, Cikes M, Firouznia M, et al.
(2019) Artificial intelligence and machine learning in arrhythmias and electrophysiology. Circ Arrhythm Electrophysiol 12:e007952. https://doi.org/10.1161/CIRCEP.119.007952 Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. (2020) Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11:1760. https://doi.org/10.1038/s41467-020-15656-0 Siontis KC, Attia ZI, Friedman PA, Noseworthy PA, Kapa S, Lopez-Jimenez F, et al. (2021) Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol 18:349–360. https://doi.org/10.1038/s41569-020-00503-2 Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. (2019) An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm. Lancet 394:861–867. https://doi.org/10.1016/S0140-6736(19)31721-0 Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. (2019) Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med 25:70–74. https://doi.org/10.1038/s41591-018-0240-2 Ko WY, Siontis KC, Attia ZI, Carter RE, Kapa S, Ommen SR, et al. (2020) Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. J Am Coll Cardiol 75:722–733. https://doi.org/10.1016/j.jacc.2019.12.030 Attia ZI, Friedman PA, Noseworthy PA, Lopez-Jimenez F, Ladewig DJ, Satam G, et al. (2019) Age and sex prediction using an artificial intelligence-enabled electrocardiogram. Circ Arrhythm Electrophysiol 12:e007284. https://doi.org/10.1161/CIRCEP.119.007284 van der Wall EE (2022) International criteria for electrocardiographic interpretation in athletes: Consensus statement. J Electrocardiol 71:1–6. https://doi.org/10.1016/j.jelectrocard.2021.12.001 Siontis KC, Noseworthy PA, Attia ZI, Carter RE, Yao X, Kapa S, et al.
(2021) Detection of hypertrophic cardiomyopathy by an artificial intelligence electrocardiogram in children and adolescents. Int J Cardiol 340:42–47. https://doi.org/10.1016/j.ijcard.2021.08.026 Mayourian J, Kaye D, Chowdhury D, Konerman M, Moghaddam AN, Sambidi P, et al. (2024) Pediatric ECG-based deep learning to predict left ventricular dysfunction and remodeling. Circulation 149:917–931. https://doi.org/10.1161/CIRCULATIONAHA.123.067750 Ghelani SJ, Thatte N, La Cava W, Triedman JK, Mayourian J (2025) Artificial intelligence-enabled ECG to detect congenitally corrected transposition of the great arteries. Pediatr Cardiol (Epub ahead of print). https://doi.org/10.1007/s00246-025-03916-3 Gillette PC, Garson A (1992) Sudden cardiac death in the pediatric population. Circulation 85:I64–I69. https://doi.org/10.1161/01.CIR.85.1_suppl.I64 Rodday AM, Tryka KA, King ME, Goodwin J, Graham D, Parsons SK (2012) Electrocardiogram screening for disorders that cause sudden cardiac death in asymptomatic children: a meta-analysis. Pediatrics 129:e999–e1010. https://doi.org/10.1542/peds.2011-0643 Drezner JA, Ackerman MJ, Anderson J, Ashley E, Asplund CA, Baggish AL, et al. (2017) International criteria for electrocardiographic interpretation in athletes: Consensus statement. Br J Sports Med 51:704–731. https://doi.org/10.1136/bjsports-2016-097331 Sarto P, Zorzi A, Merlo L, Cerrone M, Cipriani A, Mattioli AV, et al. (2023) Value of screening for the risk of sudden cardiac death in young competitive athletes. Eur Heart J 44:1084–1092. https://doi.org/10.1093/eurheartj/ehac015 Additional Declarations No competing interests reported.
Editorial decision: Revision requested 03 Nov, 2025; Reviews received at journal 20 Oct, 2025; Reviewers agreed at journal 14 Oct, 2025; Reviewers invited by journal 06 Oct, 2025; Editor assigned by journal 02 Sep, 2025; Submission checks completed at journal 02 Sep, 2025; First submitted to journal 01 Sep, 2025.
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-7512909","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":529724669,"identity":"e1c58380-0069-4117-9b74-eba2336e87f3","order_by":0,"name":"Honggen Zhang","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"Honggen","middleName":"","lastName":"Zhang","suffix":""},{"id":529724670,"identity":"c03abba0-9ff7-456a-bb2f-d48ce0723e68","order_by":1,"name":"Mohammad Zaeri-Amirani","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"Mohammad","middleName":"","lastName":"Zaeri-Amirani","suffix":""},{"id":529724671,"identity":"5a3f8b8a-ae68-49f0-9128-85f174f60b27","order_by":2,"name":"Mojtaba Abolfazli","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"Mojtaba","middleName":"","lastName":"Abolfazli","suffix":""},{"id":529724672,"identity":"a242d326-56f9-4732-999e-39b891aed166","order_by":3,"name":"Narayana P. 
Santhanam","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"Narayana","middleName":"P.","lastName":"Santhanam","suffix":""},{"id":529724673,"identity":"511a1fbb-f62d-4157-8453-8931864e15bd","order_by":4,"name":"June Zhang","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"June","middleName":"","lastName":"Zhang","suffix":""},{"id":529724674,"identity":"fab6e505-3fd3-4779-9b81-b98fa582d4f9","order_by":5,"name":"Anders Høst-Madsen","email":"","orcid":"","institution":"University of Hawaii","correspondingAuthor":false,"prefix":"","firstName":"Anders","middleName":"","lastName":"Høst-Madsen","suffix":""},{"id":529724675,"identity":"11a8f667-6613-4d4b-a9c0-5c4c091d65ed","order_by":6,"name":"Chieko Kimata","email":"","orcid":"","institution":"Hawaii Pacific Health","correspondingAuthor":false,"prefix":"","firstName":"Chieko","middleName":"","lastName":"Kimata","suffix":""},{"id":529724676,"identity":"78fceacc-66a2-4450-81d4-9824f69f9668","order_by":7,"name":"James C. 
Perry","email":"","orcid":"","institution":"University of California San Diego","correspondingAuthor":false,"prefix":"","firstName":"James","middleName":"C.","lastName":"Perry","suffix":""},{"id":529724677,"identity":"d596c56c-243b-4539-baa8-a8e0e4300345","order_by":8,"name":"Andras Bratincsak","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAzElEQVRIiWNgGAWjYHACAyA+IGfAwEOiFmPStSRuIFoL/+zmbR9+/LmTvl269+jGLww2+fIOBLRI3DlWPLO37Vnuzjnn0m7LMKRZbjxAyJobOcYMvA2HczfcyDG7LcFw2MCwgYAOeaAWxj9/DqcbEK0FqNKYmYftcAJIy80PQC3yhNxleCOtmFm27ZnhhjtnzG4zGKQZGBDSIncjeTPjmz935A1u95jd/FFhYyBPyGEIIMHAwMwDtMLgAClaGH8AaRJsGQWjYBSMghECABQwR6lSILzoAAAAAElFTkSuQmCC","orcid":"","institution":"University of Hawaii","correspondingAuthor":true,"prefix":"","firstName":"Andras","middleName":"","lastName":"Bratincsak","suffix":""}],"badges":[],"createdAt":"2025-09-02 03:38:21","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7512909/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7512909/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s00246-025-04118-7","type":"published","date":"2026-02-18T15:57:33+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":93882615,"identity":"f51a092c-19f8-4cef-b23d-f955455de986","added_by":"auto","created_at":"2025-10-19 16:53:50","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":54399,"visible":true,"origin":"","legend":"","description":"","filename":"AIenhancedECGinyouthfinalPedsCardio.docx","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/93bf7639a75ecf8465b19348.docx"},{"id":93882617,"identity":"cf73d9b6-b300-4e98-9609-d62ee0043120","added_by":"auto","created_at":"2025-10-19 
16:53:50","extension":"tiff","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11092102,"visible":true,"origin":"","legend":"","description":"","filename":"Figure1.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/350a300ef80af8f7587dce84.tiff"},{"id":93883357,"identity":"99fd2791-4717-4357-b7cb-55b6bf1c3a1b","added_by":"auto","created_at":"2025-10-19 17:01:50","extension":"tiff","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":13503246,"visible":true,"origin":"","legend":"","description":"","filename":"Figure2.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/ee6dbeb8e9c2d37b5fa35cbe.tiff"},{"id":93883358,"identity":"a6e0a17f-9c4e-4268-bf41-bcf3459ac1fe","added_by":"auto","created_at":"2025-10-19 17:01:51","extension":"tiff","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11228214,"visible":true,"origin":"","legend":"","description":"","filename":"Figure3.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/b582264c35ba870764afba44.tiff"},{"id":93883778,"identity":"a5997765-70a6-4b20-b77a-f8153fb930e3","added_by":"auto","created_at":"2025-10-19 17:09:51","extension":"tiff","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8411382,"visible":true,"origin":"","legend":"","description":"","filename":"Figure4.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/aa16421fa8026177b0c2955b.tiff"},{"id":93882623,"identity":"c90da9f5-c9a4-48f3-8c8f-5ee789fe8de6","added_by":"auto","created_at":"2025-10-19 
16:53:51","extension":"tiff","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":12228366,"visible":true,"origin":"","legend":"","description":"","filename":"Figure5.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/303c67a0fc80b8fb65ca89d7.tiff"},{"id":93882628,"identity":"7479436e-bd90-4346-9dbb-5f3f84be201d","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"json","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":10696,"visible":true,"origin":"","legend":"","description":"","filename":"f479273741874550b2362bb05605b412.json","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/10354b985fc467191e334298.json"},{"id":93882620,"identity":"b1e368d2-ef56-47fc-ae75-cd24ec3ad7a6","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"xml","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":97747,"visible":true,"origin":"","legend":"","description":"","filename":"f479273741874550b2362bb05605b4121enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/bfa73d5bce0c9f0efd64328a.xml"},{"id":93883359,"identity":"53e2a4d7-5312-406c-bdc0-8be8620dcf80","added_by":"auto","created_at":"2025-10-19 17:01:51","extension":"tiff","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11092102,"visible":true,"origin":"","legend":"","description":"","filename":"Figure1.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/d80badfd2fffd451f27fe019.tiff"},{"id":93882632,"identity":"6eb74a69-88e5-409b-859b-eab4a6e60f23","added_by":"auto","created_at":"2025-10-19 
16:53:51","extension":"tiff","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":13503246,"visible":true,"origin":"","legend":"","description":"","filename":"Figure2.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/406cada9e1b2f78b018db448.tiff"},{"id":93882637,"identity":"d31569b3-312c-4b31-b021-fa9151202e7b","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"tiff","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11228214,"visible":true,"origin":"","legend":"","description":"","filename":"Figure3.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/7a84ab9552b81068a8d6d530.tiff"},{"id":93882627,"identity":"bef0d23e-b9ae-48e5-b63e-bdf5cf49a066","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"tiff","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8411382,"visible":true,"origin":"","legend":"","description":"","filename":"Figure4.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/fe110cfbb756542c926821bf.tiff"},{"id":93883364,"identity":"1aaad160-2591-41c0-b01b-04f8a4956a6c","added_by":"auto","created_at":"2025-10-19 17:01:51","extension":"tiff","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":12228366,"visible":true,"origin":"","legend":"","description":"","filename":"Figure5.tiff","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/1ea99751edc635aad5994e47.tiff"},{"id":93883361,"identity":"e5a42e23-7fce-446a-8cfa-67b82476b1c1","added_by":"auto","created_at":"2025-10-19 
17:01:51","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":108114,"visible":true,"origin":"","legend":"","description":"","filename":"OnlineFigure1.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/8b33d8f3e7d746d9235faff8.png"},{"id":93882636,"identity":"0a683966-43ee-4a39-8103-1109fb497d9e","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":153340,"visible":true,"origin":"","legend":"","description":"","filename":"OnlineFigure2.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/4236e2af89db53f584366227.png"},{"id":93883779,"identity":"5efdf560-c61a-4143-8b76-3a39b41a9a8c","added_by":"auto","created_at":"2025-10-19 17:09:51","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":189704,"visible":true,"origin":"","legend":"","description":"","filename":"OnlineFigure3.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/4120f2842f782843cd86d258.png"},{"id":93882625,"identity":"3e2884fb-b027-4055-a24e-99e63e3a75ae","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":69947,"visible":true,"origin":"","legend":"","description":"","filename":"OnlineFigure4.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/fbde6916bdf6cd734bb93146.png"},{"id":93882633,"identity":"60a7757d-5c30-491c-92a0-a8cd1b1267ea","added_by":"auto","created_at":"2025-10-19 
16:53:51","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":98844,"visible":true,"origin":"","legend":"","description":"","filename":"OnlineFigure5.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/c4d34ba68fb6f204832b6ef3.png"},{"id":93883362,"identity":"57554283-cf9c-46f3-8528-3250bc37d134","added_by":"auto","created_at":"2025-10-19 17:01:51","extension":"xml","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":94729,"visible":true,"origin":"","legend":"","description":"","filename":"f479273741874550b2362bb05605b4121structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/0dbccee5abc094b9df6c3ebf.xml"},{"id":93882635,"identity":"638aa534-fb44-41e5-902e-6d285394e00b","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"html","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":104309,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/7767797989d12832e71111c9.html"},{"id":93882618,"identity":"98c01b7d-b831-4777-b224-f6e52e8a99e8","added_by":"auto","created_at":"2025-10-19 16:53:50","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1342360,"visible":true,"origin":"","legend":"\u003cp\u003eSex determination accuracy in youth: Classification of sex in various age groups displayed as Receiver Operating Characteristic curves using Support Vector Machine supervised machine learning model on 12-lead electrocardiograms. Sex could not be distinguished with high accuracy in age groups 1-6: ages 1 day to 8 years, consistent with minimal difference in the somatic appearance of girls and boys. Sex was classified with acceptable accuracy (82% Area Under the Curve) in age group 7: ages 9 years to 12 years, in the pre-pubertal age. 
Sex was predicted with very high accuracy (94-95% AUROC) in age groups 8-9: in ages 13 years and older, consistent with somatic differences between males and females. Age groups 1: 1-6 days old; 2: 1-4 weeks old; 3: 1 months to \u0026lt;6 months old; 4: 6 months to \u0026lt;2 years old; 5: 2 to \u0026lt;5 years old; 6: 5 to \u0026lt;9 years old; 7: 9 to \u0026lt;13 years old; 8: 13 to \u0026lt;17 years old; 9: 17 to \u0026lt;22 years old\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/89d6532b96d8f2eff562f3c6.png"},{"id":93882613,"identity":"bb74a22e-0ac8-4c13-bafe-4acb66f548b1","added_by":"auto","created_at":"2025-10-19 16:53:50","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1532462,"visible":true,"origin":"","legend":"\u003cp\u003eSex determination accuracy with various ML models: Comparison of various machine learning models in the classification of sex across various ages using selected ML models, such as Support Vector Machines (SVM) with Radial Basis Function (RBF) kernels, Adaptive Boosting (AdaBoost) with decision trees, Linear Discriminant Analysis (LDA), Triplets Bidirectional Generative Adversarial Networks (T-BiGAN) and Residual Networks (ResNet), demonstrated that supervised and semi-supervised models (SVM, AdaBoost, LDA, T-BiGAN) outperformed neural network in the classification of sex in every age-group. SVM with RBF kernel resulted in the highest prediction accuracy. 
All ML models showed higher accuracy in older age-groups, ages 13 years and older\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/534f51fdf28d9e0c0df7ac91.png"},{"id":93882616,"identity":"19126dc6-057e-48d0-86fd-b62cbbe361c8","added_by":"auto","created_at":"2025-10-19 16:53:50","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":2334256,"visible":true,"origin":"","legend":"\u003cp\u003eVisual representation of age classification based on ECG features: Two-dimensional display of the age-distribution of 29,408 children and adolescents (ages 1 day to 21 years, age group 1 – red, age-group 9 – blue) compares a simple regression analysis in raw space (insert A) and after translation into latent space (insert B) using a Bidirectional Generative Adversarial Network semi-supervised machine learning model. Distribution in latent space created a higher resolution and better prediction of age compared to simple regression analysis\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/d5f991eab2b10231c2c4effa.png"},{"id":93882622,"identity":"3e211a10-3f48-49c7-b36b-5152ea43c71e","added_by":"auto","created_at":"2025-10-19 16:53:51","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":482809,"visible":true,"origin":"","legend":"\u003cp\u003eImportant ECG features for age classification: Certain ECG variables had a high impact on age classification. 
These variables, such as PR, RR and QT intervals, QRS duration, and R and T wave amplitudes in precordial leads, appeared to be more important for the determination of age by supervised machine learning models, as demonstrated by their impact on the F1 accuracy score\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/cdebbaebb448b308754771bf.png"},{"id":93883360,"identity":"ed29da9f-52c0-497f-a6a2-d0b43b16e240","added_by":"auto","created_at":"2025-10-19 17:01:51","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":943202,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix of age classification in youth: Accuracy of age classification displayed as a confusion matrix in various age groups using Support Vector Machine supervised machine learning model. Age could be classified with high accuracy in the youngest age groups (age groups 1-3, ages 1 day to 5 months) with true positive rates (TPRs) of 64-72%, compared to older children (age groups 4-6, ages 6 months to 8 years) with TPRs of 51-61%, and adolescents and young adults (age groups 7-9, ages 9-21 years) with TPRs of 46-62%. Adjusting the analysis to allow ±1 age-group deviation improved the TPR to 91-99% in all ages. 
Age groups 1: 1-6 days old; 2: 1-4 weeks old; 3: 1 months to \u0026lt;6 months old; 4: 6 months to \u0026lt;2 years old; 5: 2 to \u0026lt;5 years old; 6: 5 to \u0026lt;9 years old; 7: 9 to \u0026lt;13 years old; 8: 13 to \u0026lt;17 years old; 9: 17 to \u0026lt;22 years old\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/5fde5ddb91395175998ed6ec.png"},{"id":103251365,"identity":"f4626513-b766-4286-a908-bbc3344fa19e","added_by":"auto","created_at":"2026-02-23 16:08:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":7103667,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7512909/v1/10cef4b7-9a93-42b3-b998-a63ad10d07c1.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Artificial Intelligence Enhanced Electrocardiogram Analysis for Age and Sex Classification in Youth","fulltext":[{"header":"Introduction","content":"\u003cp\u003eSince its invention by Einthoven in 1898, the electrocardiogram (ECG) has been one of the most important screening and diagnostic tests for heart problems [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] However, ECG analysis with the ability to indicate underlying cardiac disease states has historically relied upon the inherent biases and errors of physician interpretation based on standards derived from very few healthy subjects. In pediatric and adolescent populations, ECG interpretation is further hindered by the distinct challenges due to the rapid and profound physiological changes that occur from birth through young adulthood. These developmental changes affect ECG waveform morphology and timing, reflecting underlying alterations in heart size, autonomic tone, ion channel function, and conduction pathways. 
Detailed assessment of the largest historical pediatric cohort of normal ECGs recently allowed us to understand the granularity of these changes in the parameters of ECG variables across childhood [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Yet, these standards are primarily based on summary statistics of isolated ECG variables and do not fully capture the complex, multivariate patterns embedded within the high-dimensional ECG signal. Recognizing these dynamic and complex features is essential for accurate clinical interpretation.\u003c/p\u003e\u003cp\u003eRecent advances in machine learning (ML) have ignited the development of a novel field: artificial intelligence (AI) enhanced ECG analysis [\u003cspan additionalcitationids=\"CR4\" citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. These ML models offer the potential to leverage the full complexity of data in ECGs, and can integrate numerous ECG features simultaneously, uncovering latent patterns, and improving the granularity of physiological modeling. Importantly, establishing ML models that can accurately infer age and sex based solely on ECG data is a critical step toward developing adaptive, context-aware diagnostic tools. These models can identify subtle electrophysiological signatures correlated with demographic factors, thus providing a normative framework that enhances detection of pathological deviations. 
AI-enhanced ECG analysis has been used for the detection of ventricular dysfunction and hypertrophic cardiomyopathy [\u003cspan additionalcitationids=\"CR7\" citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] and was also able to model the age and sex of adults and estimate a biological or cardiac age of a person [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. However, these studies were performed in adults. There is a paucity of data on AI-enhanced ECG analysis in children, complicated by the profound changes in the ECG that occur between 0\u0026ndash;21 years of age. Hypertrophic cardiomyopathy (HCM), left ventricular dysfunction and, recently, congenitally corrected transposition of the great arteries have been successfully analyzed by ML models with reasonable detection rates [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. However, other less common heart conditions have not been modeled by AI in children. ECG analysis in children and youth has a unique importance in screening for rare heart conditions, congenital heart defects and inherited arrhythmia syndromes [\u003cspan additionalcitationids=\"CR15 CR16\" citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], but this screening cannot be automated without defined age- and sex-specific normal standards. 
To date, AI-generated age- and sex-specific ECG standards for children and adolescents have not been developed, limiting the application of automated models for pediatric cardiac screening.\u003c/p\u003e\u003cp\u003eThis study aims to evaluate multiple supervised and semi-supervised ML architectures to classify age groups and sex from ECG features of pediatric and young adult individuals. By developing age and sex classification, we aim to establish a foundation for automated AI-enhanced ECG analysis leveraging age- and sex-specific standards for children and young adults.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eECG data source and study population\u003c/h2\u003e\u003cp\u003eThis study used a large, curated cohort of ECGs from subjects with no known heart condition. The cohort, previously validated and published (Bratincsak et al, Circ AE, 2020), included ECGs collected retrospectively from patients aged 1 day to 21 years at Hawaii Pacific Health and Rady Children\u0026rsquo;s Hospital San Diego between 2012 and 2022 (n\u0026thinsp;=\u0026thinsp;70,816). ECGs were obtained for a variety of clinical reasons, including evaluation of heart murmur, irregular heartbeat, syncope, dizziness, bradycardia, tachycardia, fever, screening for certain diseases, and for sports pre-participation screening. ECGs from patients with congenital or acquired heart conditions, history of heart surgery, arrhythmia syndromes, or pacemakers, as well as duplicate ECGs from the same patient within the same age-group, were excluded from the study (n\u0026thinsp;=\u0026thinsp;31,278). The stringent exclusion criteria created a curated cohort of ECGs from children and adolescents with no evidence of heart defect or cardiac anomaly over an average follow-up of more than 7 years. 
All ECGs were performed in the resting supine position using GE MAC 5500 HD ECG systems (General Electrics, Houston, TX) at 500 Hz sampling frequency with standardized voltage (10 mm\u0026thinsp;=\u0026thinsp;1 mV) and speed (25 mm/s). Digitized ECG values were exported from the GE Muse v9 system. ECGs were excluded from the final analysis if they had technical errors, lead reversal, poor baseline, or missing lead information (n\u0026thinsp;=\u0026thinsp;10,130), resulting in a final cohort of 29,408 complete ECGs. The study was approved by the Hawaii Pacific Health Research Institute and deemed exempt from further Institutional Review Board approval due to the retrospective nature of the analysis.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eECG variable selection and processing\u003c/h3\u003e\n\u003cp\u003eECG variables were pre-selected based on expert physician input and prior established standards. We included 177 ECG variables (features) in the analysis of the ECGs, such as: P, QRS, and T axes; frontal QRS-T and spatial QRS-T angles; RR interval; PR interval; QRS duration; QT interval and corrected QT interval (QTc) calculated using the Bazett and Fridericia methods; peak amplitudes of P, Q, R, S and T waves; QRS integral; and T wave integral in all leads (I, II, III, aVL, aVF, aVR, V1, V2, V3, V4, V5, V6).\u003c/p\u003e\n\u003ch3\u003eMachine learning models, training and testing\u003c/h3\u003e\n\u003cp\u003eSelected ML models were applied to standardized digitized values of 177 ECG variables. We compared supervised and semi-supervised ML models for the determination of age and sex. Assessment with supervised ML models included Support Vector Machines (SVM) with linear and Radial Basis Function (RBF) kernels, Adaptive Boosting (AdaBoost) with decision trees, and Linear Discriminant Analysis (LDA) models. 
For semi-supervised ML we used Triplets Bidirectional Generative Adversarial Networks (T-BiGAN) and Residual Networks (ResNet), a deep neural network.\u003c/p\u003e\n\u003ch3\u003eTraditional ML models\u003c/h3\u003e\n\u003cp\u003eSVM is commonly used to solve complex classification problems, with sensitive detection of outliers, by defining boundaries among data points based on labeled (supervised) inputs. We used SVM both in the original data space with a linear kernel and in a new feature space obtained by a non-linear transformation of the data using an RBF kernel. Penalty values of C\u0026thinsp;=\u0026thinsp;0.5, 1, and 1.5 were evaluated; the optimal penalty C\u0026thinsp;=\u0026thinsp;1.5 was selected for age classification, and the default C\u0026thinsp;=\u0026thinsp;1 was used for sex classification.\u003c/p\u003e\u003cp\u003eAdaBoost is a popular ensemble-based method for data classification that can enhance the power of base (weak) classifiers by combining them in a weighted linear combination. We used AdaBoost with a decision tree model as the base estimator, with tree depths of D\u0026thinsp;=\u0026thinsp;1, 3, and 5 and learning rates of L\u0026thinsp;=\u0026thinsp;0.1, 1, and 2 tested. The number of base estimators was set at 200 with the optimal hyperparameter combination of D\u0026thinsp;=\u0026thinsp;5 and L\u0026thinsp;=\u0026thinsp;1 for age classification, while for sex classification the number of base estimators was 50, with D\u0026thinsp;=\u0026thinsp;1 and L\u0026thinsp;=\u0026thinsp;1.\u003c/p\u003e\u003cp\u003eLDA is a multi-class classification model that can be used for supervised learning by maximizing class separation in a low-dimensional space. We used LDA to separate multiple classes with multiple features based on dimensionality reduction that takes into account the entire ellipse of data in each class, not only the data on the boundaries between distinct groups.\u003c/p\u003e\u003cp\u003eAge classification is a multi-class problem, and we used the entire dataset to develop the model. 
Model performance was evaluated by the ratio of predicted to true labels for the determination of age across the 0\u0026ndash;21 years of the cohort. Sex classification is a binary determination. Since the difference in ECG variables is more pronounced among various ages than between the two sexes, we performed binary analysis to determine the sex of the subjects within each age-group. The performance of binary classification for SVM, AdaBoost and LDA was assessed using both the F1 score and receiver operating characteristic (ROC) curves as comparison metrics. We calculated ROC curves from the predicted probability of each sample being assigned to one class. For model classification accuracy we used the F1-score metric, which combines precision and recall scores.\u003c/p\u003e\n\u003ch3\u003eSemi-supervised ML models\u003c/h3\u003e\n\u003cp\u003eT-BiGAN is an ML model that offers improved feature representation through semi-supervised learning, employing a model based on Bidirectional Generative Adversarial Networks (BiGAN). In this approach, semi-supervised data is integrated into the training process via an additional triplet loss term. The BiGAN structure comprises an encoder and a decoder, facilitating data transformation into a latent space, alongside a discriminator tasked with distinguishing genuine data from synthetic data within the GAN framework. In T-BiGAN, auxiliary labels within the dataset are utilized. The choice of triplet loss is deliberate: during training, the model considers the probability that the distance from a query example to a negative example (i.e. one with a label different from the query) is greater than the distance to a positive example (i.e. one with the same label). This use of triplet loss fosters a mapping in the latent space that encourages data with the same label to form distinct clusters, differentiating them from data with other labels. 
This is the underlying rationale for the application of T-BiGAN on ECG data.\u003c/p\u003e\u003cp\u003eIn our T-BiGAN model, we incorporated sex and age groups (2 sex-groups x 9 age-groups\u0026thinsp;=\u0026thinsp;18 categories) as auxiliary labels during the training process. The model uses a latent dimension of 50 (z_dim\u0026thinsp;=\u0026thinsp;50) and comprises three main components: an encoder, a generator, and a discriminator. Each of these neural components consists of two hidden layers of 1024 units, each followed by a leaky Rectified Linear Unit (ReLU) activation layer with a negative slope of 0.2 for non-linearity. Additionally, the layers in the generator are followed by Batch Normalization layers. The optimizer is Adam with a very small initial learning rate of 1e-8 and β₁ = 0.5, making the training stable but very slow to start. The model is regularized using L2 weight decay (2.5e-5) and weights are initialized with small Gaussian noise (stddev\u0026thinsp;=\u0026thinsp;0.02). Training is run for 501 epochs with a batch size of 256.\u003c/p\u003e\u003cp\u003eResidual Networks (ResNet) is a deep neural network architecture designed to facilitate the training of exceptionally deep networks. It achieves this by introducing residual blocks, which utilize skip connections to learn the difference (residual) between input and desired output in each block. ResNet architectures often incorporate batch normalization and global average pooling, and come in various depths.\u003c/p\u003e\u003cp\u003eThe ResNet model comprises an initial convolutional layer that applies 32 filters, batch normalization, ReLU activation, and max-pooling. The ResNet-based classifier is built on top of an ImageNet-pretrained ResNet-50, using it as a feature extractor (with all layers frozen). 
The output of the ResNet backbone is passed through a custom fully connected head, consisting of dense layers of sizes 512, 256, 128, and 64, each followed by ReLU activation and 50% dropout to prevent overfitting. The final layer is a Softmax classifier that outputs class probabilities. The model is compiled with the Adam optimizer, using categorical cross-entropy loss and accuracy as a metric.\u003c/p\u003e\u003cp\u003eSimilarly to the supervised ML models, T-BiGAN and ResNet models were used to characterize the sex of the subjects in a binary classification, while the age-group classification used a multi-class model for the determination of set age-groups.\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eTraining and testing\u003c/h2\u003e\u003cp\u003eFor all supervised (SVM, AdaBoost, LDA) and semi-supervised (T-BiGAN, ResNet) ML models, 75% of data was used for training, 5% for testing and hyperparameter tuning, and 15% for final classification. To minimize overfitting and report performance variance, we used k-fold cross-validation.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003eStatistical analysis\u003c/h2\u003e\u003cp\u003eFor binary classification, simple descriptive statistical measures were calculated (true positive, false positive, true negative, and false negative rates). From those rates, ROC curves were generated and the Area Under the ROC Curve (AUROC) was calculated. For predictive accuracy, precision or positive predictive value (true positives divided by the sum of true and false positives) and recall or sensitivity (true positives divided by the sum of true positives and false negatives) were calculated. F1 scores were then generated as the harmonic mean of precision and recall, being one of the most accurate measures of test predictability. 
For multiple group classification, confusion matrices were generated to assess true and predicted positive rates.\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eStudy population\u003c/h2\u003e\u003cp\u003eAfter exclusion of patients with heart conditions and of duplicate and erroneous ECGs, the final study cohort contained 29,408 curated normal ECGs across ages of 1 day to 21 years (12,318 male \u0026ndash; 41.9%, 17,090 female \u0026ndash; 58.1%). The cohort was divided, based on prior ECG age classification following the developmental stages of children and young adults, into the following 9 age-groups: 1) term newborns: 1\u0026ndash;6 days old; 2) neonates: 1\u0026ndash;4 weeks old; 3) young infants: 1 month to \u0026lt;\u0026thinsp;6 months old; 4) older infants: 6 months to \u0026lt;\u0026thinsp;2 years old; 5) toddlers and small children: 2 to \u0026lt;\u0026thinsp;5 years old; 6) children: 5 to \u0026lt;\u0026thinsp;9 years old; 7) preteen children: 9 to \u0026lt;\u0026thinsp;13 years old; 8) teenagers: 13 to \u0026lt;\u0026thinsp;17 years old; and 9) adolescents to young adults: 17 to \u0026lt;\u0026thinsp;22 years old. The number of patients in each age group ranged from 304 to 8,178 (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). We attempted to model age with a continuous regression analysis across the entire age range, but the confidence interval and error margin of \u0026plusmn;\u0026thinsp;1.2 years was deemed too large to be meaningful when comparing, for example, infants aged 1\u0026ndash;6 days and 2\u0026ndash;4 weeks. 
Therefore, and following previous physiological classification of infants and children into distinct age groups, we performed our age-classification analysis using 9 distinct age groups.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eNumber of study subjects from 1 day to 21 years sorted into 9 age-groups\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"10\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge groups\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e\u003cp\u003e7\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c10\"\u003e\u003cp\u003e9\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAges\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1\u0026ndash;6 days\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e1\u0026ndash;4 weeks\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e5 weeks \u0026minus;\u0026thinsp;5 months\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e6\u0026ndash;23 months\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e2\u0026ndash;4 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e5\u0026ndash;8 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e9\u0026ndash;12 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e13\u0026ndash;16 years\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e17\u0026ndash;21 years\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e304\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e684\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e1328\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e1766\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e2384\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e3075\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e4323\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e8178\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e7366\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003emale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e162\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e363\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e740\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e948\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e1227\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e1574\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e1896\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e3032\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e2376\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003efemale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e142\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e321\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e588\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e818\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e1157\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e1501\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e2427\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e5146\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e4990\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eModeling of sex in various ages\u003c/h2\u003e\u003cp\u003eSex-related differences in ECG parameters were subtle and often masked by the more pronounced age-related changes. To control for the age-related differences, sex classification models were trained and tested separately within each age group. Predictive accuracy varied by age group and the ML model used. The summated area under the ROC curve in distinguishing male and female ECGs ranged from 61% in younger children to 96% in teenagers and young adults. Consistent with the AUROC scores, F1 scores for determining sex ranged from 0.53 to 0.91, depending on the age group and the ML model used. The best sex classification performance was achieved in teenagers (13\u0026ndash;16 years) and young adults (17\u0026ndash;21 years) with an AUROC of 95% and an F1 score of 0.91, compared to an AUROC of 65\u0026ndash;82% and F1 scores of 0.53\u0026ndash;0.70 in younger children (0\u0026ndash;12 years) (Fig.\u0026nbsp;1, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Supervised ML models, such as SVM, consistently outperformed semi-supervised deep learning models, such as ResNet, in predicting sex across all age groups, e.g. an F1 score of 0.91 by SVM vs. 0.60 by ResNet, and an AUROC of 96% by SVM vs. 
73% by ResNet in the 17\u0026ndash;21 years old group (age group 9) (Fig.\u0026nbsp;2, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eArea Under the Receiver Operating Characteristic Curve and F1 accuracy scores in differentiating sex using various machine learning models and 5-fold cross-validation\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"10\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"9\" nameend=\"c10\" namest=\"c2\"\u003e\u003cp\u003eArea Under the Receiver Operating Characteristic Curves\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eAge groups\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e9\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSVM-RBF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;0.12\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;0.12\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c9\"\u003e\u003cp\u003e0.94\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.76\u0026thinsp;\u0026plusmn;\u0026thinsp;0.13\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.73\u0026thinsp;\u0026plusmn;\u0026thinsp;0.09\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.82\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.94\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLDA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.81\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.93\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.95\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eT-BiGAN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.72\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.68\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.68\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.68\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.78\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c9\"\u003e\u003cp\u003e0.93\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.94\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eResnet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.50\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colspan=\"9\" nameend=\"c10\" namest=\"c2\"\u003e\u003cp\u003eF1 accuracy scores\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge groups\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e5\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e9\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSVM-RBF\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.11\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;0.09\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.65\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.68\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.75\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.87\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c10\"\u003e\u003cp\u003e0.91\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdaBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.66\u0026thinsp;\u0026plusmn;\u0026thinsp;0.12\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.11\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;0.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;0.07\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.67\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.89\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLDA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.63\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.83\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eT-BiGAN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.85\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c10\"\u003e\u003cp\u003e0.86\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eResNet\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.57\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.55\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.52\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.51\u0026thinsp;\u0026plusmn;\u0026thinsp;0.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.53\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.56\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c8\"\u003e\u003cp\u003e0.58\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c9\"\u003e\u003cp\u003e0.61\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e\u003cp\u003e0.64\u0026thinsp;\u0026plusmn;\u0026thinsp;0.01\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"10\"\u003eAge groups: 1: 1\u0026ndash;6 days old; 2: 1\u0026ndash;4 weeks old; 3: 1 month to \u0026lt;\u0026thinsp;6 months old; 4: 6 months to \u0026lt;\u0026thinsp;2 years old; 5: 2 to \u0026lt;\u0026thinsp;5 years old; 6: 5 to \u0026lt;\u0026thinsp;9 years old; 7: 9 to \u0026lt;\u0026thinsp;13 years old; 8: 13 to \u0026lt;\u0026thinsp;17 years old; 9: 17 to \u0026lt;\u0026thinsp;22 years old; 10: 22 to \u0026lt;\u0026thinsp;30 years old; and 11: 
30\u0026ndash;40 years old. Machine learning models: Support Vector Machines (SVM) with Radial Basis Function (RBF) kernels, Adaptive Boosting (AdaBoost) with decision trees, Linear Discriminant Analysis (LDA), Triplets Bidirectional Generative Adversarial Networks (T-BiGAN) and Residual Networks (ResNet).\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003eModeling of age from infancy to young adulthood\u003c/h2\u003e\u003cp\u003eNumerous ECG variables varied among age-groups in children and young adults. The permutation importance method identified the following ECG features as having higher importance: heart rate, PR interval, QRS duration, QTc interval, and R and T wave amplitudes in V1, V3 and V6 (Fig.\u0026nbsp;3). These key features were used explicitly in the supervised ML models for age-group prediction, whereas the semi-supervised T-BiGAN model processed all features without selection or weighting and transformed them to a latent space for age-group classification. A visual representation of individual data points shows how the accuracy of age prediction improved when the data were transformed to latent space (Fig.\u0026nbsp;4).\u003c/p\u003e\u003cp\u003eAll models successfully discriminated among the predefined 9 age-groups in the multi-class classification model, with a true positive rate (TPR) ranging from 46% to 72%. The highest accuracy was observed in distinguishing the youngest age groups (1\u0026ndash;6 days, 1\u0026ndash;4 weeks, 1\u0026ndash;5 months old), with TPRs of 64\u0026ndash;72%, compared with older age groups (2\u0026ndash;21 years), with TPRs of 46\u0026ndash;62% (Fig.\u0026nbsp;5). Traditional supervised models performed better than semi-supervised neural networks. 
SVM with RBF kernel showed the highest accuracy, with an average TPR of 60%, outperforming AdaBoost (49%) and LDA (55%). Younger age-groups with smaller numbers of subjects showed greater variation in results depending on hyperparameter optimization, reflected by wider confidence intervals. Nevertheless, SVM outperformed the other models in multi-class classification regardless of hyperparameter optimization. Among semi-supervised models, T-BiGAN had a higher average TPR (57%) than ResNet (39%) in multi-class classification of age groups.\u003c/p\u003e\u003cp\u003eA confusion matrix visually represents the true positive rates using SVM with RBF kernel (Fig.\u0026nbsp;5). The matrix revealed that misclassification (false negative rate: FNR) predominantly occurred in immediately neighboring age-groups, consistent with the expected overlap in physiological changes. Allowing a single age-group deviation (\u0026plusmn;\u0026thinsp;1 age-group error margin) increased the average adjusted age-group detection accuracy (TPR) to 94% for SVM (range 91\u0026ndash;99%) and 92% for T-BiGAN (range 88\u0026ndash;98%), with an FNR of 1\u0026ndash;9% for SVM and 2\u0026ndash;12% for T-BiGAN.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eComparison of various machine learning models\u003c/h2\u003e\u003cp\u003eComparing traditional supervised ML methods (SVM, AdaBoost, LDA) with semi-supervised neural networks (T-BiGAN, ResNet) demonstrated that, when fewer than 300 ECGs were included, supervised methods (SVM with linear or RBF kernel) outperformed semi-supervised methods in predicting both age and sex. When the analyzed data included 1000 or more ECGs, supervised and semi-supervised methods had similar accuracy. 
Overall, the best results were achieved using SVM with RBF kernel, both for sex prediction (binary classification) and for age prediction (multi-class classification).\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur findings demonstrate that machine learning models can accurately classify both age and sex from ECG features in children and young adults. Notably, classification of age and sex was achieved with high precision even in groups containing only a few hundred ECGs, which has not previously been demonstrated with any other method. Our results support the concept that age- and sex-related physiological differences are encoded in the ECG waveform and can be decoded through data-driven methods. Modeling demographic-specific normal ECGs using specific ML models may facilitate the development of more precise, automated ECG interpretation frameworks.\u003c/p\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003eAge and sex classification\u003c/h2\u003e\u003cp\u003eIn early childhood, physiologic differences between males and females are minimal, but they become more apparent at around 10\u0026ndash;12 years of age. Our results demonstrate that in adolescents and young adults, when males and females differ in physiologic features, their ECGs also differ, as reflected by subtle sex-related differences in specific parameters such as PR interval, QRS duration, QTc interval, and R and S wave voltages in many leads. ML models detected these differences with high accuracy in adolescents and young adults, with AUROC values of 0.94\u0026ndash;0.95. An AUROC this high would be remarkable for a screening tool and supports the idea that the ECG encodes biologically relevant sex-specific signatures.\u003c/p\u003e\u003cp\u003eSimilar to sex, age can be determined accurately by ECG in children and young adults. 
Somatic and physiologic changes occur throughout child development, with the most dramatic changes observed in early childhood (0\u0026ndash;5 years) and less pronounced differences in adolescents and young adults (15\u0026ndash;21 years). ECG variables follow somatic growth and change throughout childhood, most dramatically in early childhood, as seen in specific ECG parameters such as heart rate, PR, QRS and QTc intervals, and R and T wave voltages. Despite sample size limitations and class imbalance among the analyzed groups, both supervised and semi-supervised ML models classified age groups from the ECG with an excellent adjusted true positive rate of 88\u0026ndash;99%. To our knowledge, this is the first ML-based multi-class model to classify nine distinct pediatric age groups using ECG data alone. This proof of concept for multi-class classification using ML-enhanced ECG analysis is foundational for developing an automated system for ECG analysis and for defining normal ECGs for every age and sex.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003eMachine learning model comparison\u003c/h2\u003e\u003cp\u003eSupervised ML models (SVM and AdaBoost) outperformed semi-supervised neural networks (T-BiGAN and ResNet) in prediction modeling of datasets with fewer than 1000 data points (ECGs). Supervised and weighted preselection of ECG variables improved the prediction accuracy of supervised ML models. When a dataset (specific age group) contained more than 1000 data points (ECGs), the difference in performance between the supervised and semi-supervised models diminished. While previous ECG-based AI models relied on datasets in excess of 5,000 subjects, our study shows that effective classification and prediction are possible with much smaller subject numbers. 
Developing ML algorithms for ECG analysis that can model groups of fewer than 1000 subjects and distinguish them with remarkable accuracy is particularly relevant for rare cardiac disorders, where large datasets are not available.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003eClinical significance of sex and age classification in youth\u003c/h2\u003e\u003cp\u003eOur results highlight that ECG signatures differ by demographic group and that ML can reliably recognize these patterns. The most significantly affected ECG variables and ML modeling features were PR, QRS and QTc intervals and the R, S and T wave amplitudes. These very same ECG parameters are used to diagnose several heart conditions, including conduction defects, long QT syndrome (LQTS), and ventricular hypertrophy or enlargement associated with cardiomyopathies. The subtle ECG changes caused by these heart conditions are not only affected by, but could easily be masked by, the variation associated with age and sex. Without age- and sex-specific reference values, such abnormalities can be overlooked or misinterpreted. For example, a mildly prolonged QT interval may be normal in a 1-week-old female but would raise diagnostic suspicion in a 10-year-old male. Similarly, a certain QRS duration and S wave amplitude in lead V2 could be normal in an 8-year-old female but would be associated with significant ventricular enlargement or hypertrophy in a 2-year-old female.\u003c/p\u003e\u003cp\u003eEstablishing robust normative ECG standards is crucial for future ML-based models aimed at detecting heart conditions in children. These models can only perform accurately if trained on well-defined demographic baselines with curated normal ECGs. By modeling healthy ECG profiles across pediatric and young adult age groups and sexes, we set the stage for ML tools that can provide demographic-specific diagnosis in clinical practice. 
It is important to emphasize that the primary aim of our study was not to predict age or sex as diagnostic endpoints, but rather to assess and model the representation of these factors in the ECG data. This distinction matters because accurate identification of age and sex signatures is a prerequisite for developing reliable diagnostic models that avoid confounding by demographic variability. In clinical practice, age and sex are known and recorded variables, but understanding their explicit ECG correlates enhances model transparency and interpretability, will serve as the foundation for future age- and sex-specific ML models for cardiac disease prediction, and provides an important step towards creating AI-enhanced ECG screening.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\u003ch2\u003eLimitations and strengths\u003c/h2\u003e\u003cp\u003eOur study had certain limitations. Sex was self-reported, which may not always align with biologic sex and could potentially affect ECG patterns. Our ML analysis was limited to selected models and excluded certain advanced AI methods, such as deep and convolutional neural networks; however, we believe that we have chosen representative models with appropriate optimization, and certain models were excluded because the input signal was not deemed complex enough to warrant them.\u003c/p\u003e\u003cp\u003eStrengths of our study include the use of a curated dataset containing only healthy subjects, which eliminated patients with cardiac conditions and enabled the establishment of normal ECG values for accurate ML modeling. We also compared multiple ML algorithms, rather than relying on a single model, which strengthens confidence in our results and helps mitigate bias and overfitting. 
Achieving consistent findings across different methods suggests robustness and reproducibility of our results.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn conclusion, this study provides foundational evidence that ML can uncover age- and sex-specific signatures in pediatric ECG data. By establishing reliable age- and sex-specific ECG standards, this work supports future efforts to build ML models capable of identifying conditions with subtle ECG changes affected by age- and sex-specific variations. Our findings move the field beyond static reference standards toward dynamic, ML-informed models that better capture biological variability and will provide personalized, context-aware ECG interpretation. Such AI-enhanced ECG analytic models incorporating demographic variation are poised to improve the accuracy and reliability of ECG interpretation, particularly for rare cardiac conditions in children, where sample sizes are limited and demographic variability is large.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthical Approval\u003c/strong\u003e\u003cp\u003eThis retrospective study was deemed exempt from IRB review, as it was conducted using data previously collected under an Institutional Review Board protocol approved by Hawaii Pacific Health, and in accordance with the ethical standards set forth in the 1964 Declaration of Helsinki.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003ch2\u003eCompeting Interests\u003c/h2\u003e\u003cp\u003eThe authors declare that the research was conducted in the absence of any commercial, financial or non-financial relationships that could be construed as a potential conflict of interest.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eHZ \u0026ndash; Methodology, Formal Analysis, Writing \u0026ndash; original draft. MZ-A \u0026ndash; Methodology, Formal Analysis, Writing \u0026ndash; review \u0026amp; editing. 
MA \u0026ndash; Methodology, Formal Analysis, Writing \u0026ndash; review \u0026amp; editing. NS \u0026ndash; Conceptualization, Methodology, Writing \u0026ndash; review \u0026amp; editing. JZ \u0026ndash; Methodology, Writing \u0026ndash; review \u0026amp; editing. AH-M \u0026ndash; Conceptualization, Methodology, Writing \u0026ndash; review \u0026amp; editing. CK \u0026ndash; Methodology, Formal Analysis, Writing \u0026ndash; review \u0026amp; editing. JP \u0026ndash; Conceptualization, Writing \u0026ndash; original draft. AB \u0026ndash; Conceptualization, Methodology, Formal Analysis, Writing \u0026ndash; original draft. All authors have read and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eAcknowledgments\u003c/h2\u003e\u003cp\u003eThe research for this manuscript was supported by the following grants: 1R21LM0138818 awarded by the National Library of Medicine, National Institutes of Health; and NRT-AI 2244574 awarded by the National Science Foundation.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets of electrocardiogram variables for children and young adults will be made partially available upon request. Owing to patient privacy and HIPAA regulations on protected personally identifiable information, the entire dataset will not be made openly available, as patient-specific ECG parameters may identify an individual patient.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eFisch C (2000) Centennial of the string galvanometer and the electrocardiogram. \u003cem\u003eJ Am Coll Cardiol\u003c/em\u003e 36:1737\u0026ndash;1745. https://doi.org/10.1016/s0735-1097(00)00976-1\u003c/li\u003e\n\u003cli\u003eBratincs\u0026aacute;k A, Kimata C, Limm-Chan BN, Vincent KP, Williams MR, Perry JC (2020) Electrocardiogram standards for children and young adults using Z-scores. \u003cem\u003eCirc Arrhythm Electrophysiol\u003c/em\u003e 13:e008253. 
https://doi.org/10.1161/CIRCEP.119.008253\u003c/li\u003e\n\u003cli\u003eFeeny A, Chung MK, Madabhushi A, Attia ZI, Cikes M, Firouznia M, et al. (2019) Artificial intelligence and machine learning in arrhythmias and electrophysiology. \u003cem\u003eCirc Arrhythm Electrophysiol\u003c/em\u003e 12:e007952. https://doi.org/10.1161/CIRCEP.119.007952\u003c/li\u003e\n\u003cli\u003eRibeiro AH, Ribeiro MH, Paix\u0026atilde;o GMM, Oliveira DM, Gomes PR, Canazart JA, et al. (2020) Automatic diagnosis of the 12-lead ECG using a deep neural network. \u003cem\u003eNat Commun\u003c/em\u003e 11:1760. https://doi.org/10.1038/s41467-020-15656-0\u003c/li\u003e\n\u003cli\u003eSiontis KC, Attia ZI, Friedman PA, Noseworthy PA, Kapa S, Lopez-Jimenez F, et al. (2021) Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. \u003cem\u003eNat Rev Cardiol\u003c/em\u003e 18:349\u0026ndash;360. https://doi.org/10.1038/s41569-020-00503-2\u003c/li\u003e\n\u003cli\u003eAttia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. (2019) An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm. \u003cem\u003eLancet\u003c/em\u003e 394:861\u0026ndash;867. https://doi.org/10.1016/S0140-6736(19)31721-0\u003c/li\u003e\n\u003cli\u003eAttia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. (2019) Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. \u003cem\u003eNat Med\u003c/em\u003e 25:70\u0026ndash;74. https://doi.org/10.1038/s41591-018-0240-2\u003c/li\u003e\n\u003cli\u003eKo WY, Siontis KC, Attia ZI, Carter RE, Kapa S, Ommen SR, et al. (2020) Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram. \u003cem\u003eJ Am Coll Cardiol\u003c/em\u003e 75:722\u0026ndash;733. 
https://doi.org/10.1016/j.jacc.2019.12.030\u003c/li\u003e\n\u003cli\u003eAttia ZI, Friedman PA, Noseworthy PA, Lopez-Jimenez F, Ladewig DJ, Satam G, et al. (2019) Age and sex prediction using an artificial intelligence-enabled electrocardiogram. \u003cem\u003eCirc Arrhythm Electrophysiol\u003c/em\u003e 12:e007284. https://doi.org/10.1161/CIRCEP.119.007284\u003c/li\u003e\n\u003cli\u003evan der Wall EE (2022) International criteria for electrocardiographic interpretation in athletes: Consensus statement. \u003cem\u003eJ Electrocardiol\u003c/em\u003e 71:1\u0026ndash;6. https://doi.org/10.1016/j.jelectrocard.2021.12.001\u003c/li\u003e\n\u003cli\u003eSiontis KC, Noseworthy PA, Attia ZI, Carter RE, Yao X, Kapa S, et al. (2021) Detection of hypertrophic cardiomyopathy by an artificial intelligence electrocardiogram in children and adolescents. \u003cem\u003eInt J Cardiol\u003c/em\u003e 340:42\u0026ndash;47. https://doi.org/10.1016/j.ijcard.2021.08.026\u003c/li\u003e\n\u003cli\u003eMayourian J, Kaye D, Chowdhury D, Konerman M, Moghaddam AN, Sambidi P, et al. (2024) Pediatric ECG-based deep learning to predict left ventricular dysfunction and remodeling. \u003cem\u003eCirculation\u003c/em\u003e 149:917\u0026ndash;931. https://doi.org/10.1161/CIRCULATIONAHA.123.067750\u003c/li\u003e\n\u003cli\u003eGhelani SJ, Thatte N, La Cava W, Triedman JK, Mayourian J (2025) Artificial intelligence-enabled ECG to detect congenitally corrected transposition of the great arteries. \u003cem\u003ePediatr Cardiol\u003c/em\u003e (Epub ahead of print, 16 Jun 2025). https://doi.org/10.1007/s00246-025-03916-3. PMID: 40523997\u003c/li\u003e\n\u003cli\u003eGillette PC, Garson A (1992) Sudden cardiac death in the pediatric population. \u003cem\u003eCirculation\u003c/em\u003e 85:I64\u0026ndash;I69. 
https://doi.org/10.1161/01.CIR.85.1_suppl.I64\u003c/li\u003e\n\u003cli\u003eRodday AM, Tryka KA, King ME, Goodwin J, Graham D, Parsons SK (2012) Electrocardiogram screening for disorders that cause sudden cardiac death in asymptomatic children: a meta-analysis. \u003cem\u003ePediatrics\u003c/em\u003e 129:e999\u0026ndash;e1010. https://doi.org/10.1542/peds.2011-0643\u003c/li\u003e\n\u003cli\u003eDrezner JA, Ackerman MJ, Anderson J, Ashley E, Asplund CA, Baggish AL, et al. (2017) International criteria for electrocardiographic interpretation in athletes: Consensus statement. \u003cem\u003eBr J Sports Med\u003c/em\u003e 51:704\u0026ndash;731. https://doi.org/10.1136/bjsports-2016-097331\u003c/li\u003e\n\u003cli\u003eSarto P, Zorzi A, Merlo L, Cerrone M, Cipriani A, Mattioli AV, et al. (2023) Value of screening for the risk of sudden cardiac death in young competitive athletes. \u003cem\u003eEur Heart J\u003c/em\u003e 44:1084\u0026ndash;1092. https://doi.org/10.1093/eurheartj/ehac015\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"pediatric-cardiology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pedc","sideBox":"Learn more about [Pediatric Cardiology](http://link.springer.com/journal/246)","snPcode":"246","submissionUrl":"https://submission.nature.com/new-submission/246/3","title":"Pediatric Cardiology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"artificial intelligence, machine learning, electrocardiogram, standards, pediatric, screening","lastPublishedDoi":"10.21203/rs.3.rs-7512909/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7512909/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eIntroduction\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eElectrocardiogram (ECG) values vary significantly across age and sex, particularly during childhood and adolescence. While age- and sex-specific ECG standards exist, they often fail to capture complex multi-dimensional relationships and have not been applied in machine learning (ML) enhanced ECG analysis. Accuracy of automated ECG analysis in clinical practice improved significantly by applying ML models, however there is a paucity of such studies in the pediatric population. Our aim was to create age- and sex-specific standards for children by ML modeling.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe analyzed 29,408 curated resting 12-lead ECGs from healthy subjects aged 0-21 years using 177 digitized ECG variables combined with various ML models including regression and classification analyses and semi-supervised neural networks. Primary outcome variables were age and sex. 
Model performance was evaluated using F1-score, AUROC, and confusion matrices across repeated train-test splits.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSupport vector machine (SVM) achieved the highest accuracy in modeling both age and sex. Key predictive features included heart rate, PR interval, QRS duration, and T-wave amplitude. Age-group classification achieved an average true positive rate of 60% with SVM, improving to 94% when allowing one-group misclassification. Sex classification reached F1-scores of 0.91 and AUROC of 0.95 in adolescents and young adults, and moderate accuracy in younger children.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDiscussion\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTraditional supervised ML models can accurately model physiologic ECG changes related to age and sex, outperforming semi-supervised models, particularly in smaller subgroups. These findings support the development of age- and sex-specific ML-enhanced ECG standards to aid future research and clinical applications in pediatric cardiology.\u003c/p\u003e","manuscriptTitle":"Artificial Intelligence Enhanced Electrocardiogram Analysis for Age and Sex Classification in Youth","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-19 16:53:42","doi":"10.21203/rs.3.rs-7512909/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-11-03T07:12:58+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-20T13:16:22+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"135431005525199794520539151026339502256","date":"2025-10-15T00:43:36+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-10-07T00:59:04+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-09-02T12:09:56+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-09-02T12:07:35+00:00","index":"","fulltext":""},{"type":"submitted","content":"Pediatric Cardiology","date":"2025-09-02T03:36:06+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"pediatric-cardiology","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"pedc","sideBox":"Learn more about [Pediatric Cardiology](http://link.springer.com/journal/246)","snPcode":"246","submissionUrl":"https://submission.nature.com/new-submission/246/3","title":"Pediatric Cardiology","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"3930120b-e045-4e5d-aa7c-1cce82a24cad","owner":[],"postedDate":"October 19th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-02-23T16:05:43+00:00","versionOfRecord":{"articleIdentity":"rs-7512909","link":"https://doi.org/10.1007/s00246-025-04118-7","journal":{"identity":"pediatric-cardiology","isVorOnly":false,"title":"Pediatric Cardiology"},"publishedOn":"2026-02-18 15:57:33","publishedOnDateReadable":"February 18th, 2026"},"versionCreatedAt":"2025-10-19 16:53:42","video":"","vorDoi":"10.1007/s00246-025-04118-7","vorDoiUrl":"https://doi.org/10.1007/s00246-025-04118-7","workflowStages":[]},"version":"v1","identity":"rs-7512909","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7512909","identity":"rs-7512909","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
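The ±1 age-group adjusted true positive rate reported in the Results can be illustrated with a short sketch. It assumes a confusion matrix whose rows are the true, ordered age groups and whose columns are the predicted groups, and it counts a prediction as correct when it falls within a chosen number of neighboring groups. The function name `tolerant_tpr`, the `tolerance` parameter, and the toy 4-group matrix are hypothetical and for illustration only; they are not the authors' code or data.

```python
import numpy as np

def tolerant_tpr(confusion, tolerance=1):
    """Per-class true positive rate, counting predictions within
    `tolerance` ordinal classes of the true class as correct.

    confusion[i, j] = number of samples of true class i predicted as class j.
    Hypothetical helper for illustration; not taken from the paper.
    """
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.shape[0]
    idx = np.arange(n)
    # Boolean band mask: True where |true - predicted| <= tolerance.
    band = np.abs(idx[:, None] - idx[None, :]) <= tolerance
    row_totals = confusion.sum(axis=1)
    return (confusion * band).sum(axis=1) / row_totals

# Toy confusion matrix for 4 ordered age groups (made-up counts).
cm = np.array([
    [70, 25,  5,  0],
    [20, 60, 15,  5],
    [ 5, 25, 50, 20],
    [ 0, 10, 30, 60],
])

strict = tolerant_tpr(cm, tolerance=0)    # exact-match TPR per group
adjusted = tolerant_tpr(cm, tolerance=1)  # +/-1 group TPR per group
print("strict TPR:  ", np.round(strict, 2))
print("adjusted TPR:", np.round(adjusted, 2))
```

Because the groups are ordinal, a band around the diagonal is a natural way to separate near-misses between adjacent age groups from gross errors. With `tolerance=0` the function reduces to the ordinary per-class TPR.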